Wikipedia hoaxes draw media attention: Bicholim conflict, Legolas2186
“
Up until a week ago, here is something you could have learned from Wikipedia:
From 1640 to 1641 the might of colonial Portugal clashed with India's massive Maratha Empire in an undeclared war that would later be known as the Bicholim Conflict. Named after the northern Indian region where most of the fighting took place, the conflict ended with a peace treaty that would later help cement Goa as an independent Indian state.
Except none of this ever actually happened. The Bicholim Conflict is a figment of a creative Wikipedian's imagination. It's a huge, laborious, 4,500 word hoax. And it fooled Wikipedia editors for more than 5 years.
”
This is how, on New Year's Day, the Daily Dot reported that a "massive Wikipedia hoax" had been exposed after more than five years. The article on the Bicholim conflict had been listed as a "Good Article" for the past half-decade, yet turned out to be an ingenious hoax.
Created in July 2007 by User:A-b-a-a-a-a-a-a-b-a, the meticulously detailed piece was approved as a GA in October 2007. A subsequent FA nomination was unsuccessful, but its reviewers failed to discover that the article's key sources were made up. While the User:A-b-a-a-a-a-a-a-b-a account then stopped editing, the hoax remained listed as a Good Article for five years, receiving in the region of 150 to 250 page views a month in 2012. It was finally nominated for deletion on 29 December 2012 by User:ShelfSkewed, who had discovered the hoax while doing work on Category:Articles with invalid ISBNs, and deleted the same day. Of course, the Internet and Wikipedia being what they are, the article is still present on dozens of websites that had copied it from Wikipedia. It also remains included in a number of Wikipedia-based books available from Barnes & Noble.
The Daily Dot's report was quickly picked up by other publications: PC World, Yahoo News, then The Daily Mail, UPI and TechCrunch. Over the first two weeks of 2013, the story spread from publication to publication, from country to country, reaching all the way back to South Asia, where it was reported by the Times of India and the Indian Express, as well as Republika in Indonesia. Last of all, it arrived in Japan, with the Japanese TechCrunch site carrying a translation of the story.
The original article in the Daily Dot, written by chief reporter Kevin Morris, has to date received close to 1,000 tweets. On 18 January 2013, Morris followed up with another, far longer piece; titled "How vandals are destroying Wikipedia from the inside", it began with a review of the recent indefinite block of User:Legolas2186.
Legolas2186 was indefinitely blocked by administrator Georgewilliamherbert in the wake of the Bicholim conflict story, following a discussion at AN/I. Inactive since February 2012, he had in previous years written close to 100 GAs along with several FAs, including the Featured Article on Madonna. Subsequent sourcing investigations initiated by User:Binksternet however showed that Legolas2186 had an alarming tendency to falsify or invent quotes and sources, and the Madonna FA (promoted in 2010) was demoted as a result in 2012. It may be significant that Legolas2186 had received multiple warnings about adding unsourced information in 2009. As Morris said in the Daily Dot,
“
... like his parallels in news media, Jayson Blair and Stephen Glass, Legolas was weaving together a portfolio of success with a web of cleverly constructed lies, false sources, and invented quotes. An investigation led by an enterprising team of Wikipedia editors dug up dozens of fabrications perpetrated by Legolas, who was later banished from the site.
Legolas2186 is hardly the first hoaxster to fool Wikipedia. But his case shows the urgency with which the encyclopedia needs to modernize and adapt, as the editorial core it relies upon to fend off the Internet's unrelenting wave of trolls and liars grows ever smaller.
”
Morris consulted Doctor Charles Ford, a professor of psychology at the University of Alabama, to find out what might motivate a person to lie repeatedly. Emphasising that he was speaking generally, rather than about this specific editor, whom he did not know, Ford stated that compulsive lying is usually due to a learning disability, or narcissism. The ability to fool people might give a person an enhanced sense of power. Others, Ford said, genuinely feel that they are at the centre of the universe: "They then define what is real and not real."
Morris argues that Wikipedia's internal structure and communications tools are too decentralised and outdated, and that this "doesn't just slow down the discovery of hoaxes, it scares people away. And meanwhile, pranksters like Legolas strain the time the site's editors do have—all of which only exacerbates Wikipedia's unprecedented editorial crisis." While the number of articles has risen, the number of editors has dropped.
William Henderson on the Telegraph website chimed in on 23 January, explaining "Why we're about to discover more Wikipedia hoaxes". Henderson drew particular attention to the "tens or even hundreds of thousands of articles that no one is keeping an eye on".
Wikipedia, the people's encyclopedia, by Sue Gardner
The Los Angeles Times published an upbeat op-ed by Sue Gardner on 13 January 2013. Titled "Wikipedia, the people's encyclopedia", the piece celebrated the first 12 years of Wikipedia's existence, and the diversity of the more than 1.5 million people who have contributed to the Wikipedia project:
“
... An encyclopedia is one of humankind's grandest displays of collaborative effort, and Wikipedia takes that collaboration to new levels, with contributors from pretty much every ethnicity, nationality, socioeconomic background, political ideology, religion, sexual orientation and gender. The youngest Wikipedian I've met was 7, a boy in Tel Aviv who makes small edits to articles about animals and children's books. The oldest I've met was 73, a retired engineer who writes about the history of Philadelphia, where he's lived for half a century.
”
Gardner characterised Wikipedians as, "almost without exception, ... ridiculously smart, as you might expect of people who, for fun, write an encyclopedia in their spare time." Many of them are very young: "There's a recurring motif inside Wikipedia of preteen editors who've spent their lives so far having their opinions and ideas discounted because of their age, but who have nonetheless worked their way into positions of real authority on Wikipedia. They love Wikipedia fiercely because it's a meritocracy: the only place in their lives where their age doesn't matter."
Wikipedians are geeky, she said, and nine out of ten of them are male—Gardner's theory is it's because "some of the constellation of characteristics that combine to create a Wikipedian—geeky, tech-centric, intellectually confident, thick-skinned and argumentative, with the willingness and ability to indulge in a solitary hobby—tend to skew male." They also tend to live in affluent parts of the world.
Reviewing Wikipedia's strengths and weaknesses, Gardner stated that Wikipedia's fundamental ideals—neutrality, lack of judgment, verifiability—and many attentive eyes had made well-visited articles like the one on Obama neutral and accurate, while Wikipedia's articles on obscure topics were weakest—places "where subtle bias and small mistakes can sometimes persist for months or even years."
“
Since it was founded 12 years ago this week, Wikipedia has become an indispensable part of the world's information infrastructure. It's a kind of public utility: You turn on the faucet and water comes out; you do an Internet search and Wikipedia answers your question. People don't think much about who creates it, but you should. We do it for you, with love.
”
“
Wikipedia, the collaborative encyclopedia that's edited by you (if you're a dude), me (if I were a dude), and all the dudes you know, launched in 2001 and quickly became the place to find quick info on pretty much any topic under the sun. Remember writing research papers before Wikipedia? Man, we were all such chumps with our "books."
Despite being one of the most heavily visited sites on the web, women comprise just 9 percent of all Wikipedia editors.
”
The Daily Dot commented that according to researchers, Wikipedia's well-known gender gap is a "byproduct of established gender biases in society, the male-oriented aesthetics of technology, and Wikipedia's sometimes-abrasive culture. These factors have all coalesced to reinstitute a familiar pattern." This is all the more remarkable as there are many social media where women are actually in the majority.
Sarah Stierch said it's partly due to Wikipedia's software design, and its "cold, technical and argumentative" atmosphere: "It's aesthetically very masculine in its design. Its community, like so much of the early Internet, has been male dominated, and I think when a lot of people—men or women—look at Wikipedia these days, they see it as a source for information but have little interest or excitement in contributing to it." The traditional gender gap in higher education might also play a role, she added. "The average Wikipedia editor is a well-educated white male. Well-educated white males have been writing history and the story of the world since ancient times." Efforts to create a more inclusive community in Wikipedia would be helped if more women "came out" as women on the site, rather than staying gender-anonymous.
Joseph Reagle, whose study "'Free as in sexist?' Free culture and the gender gap" appeared recently in First Monday, warned, "The ideas of freedom and openness can be used to dismiss concerns and rationalize the gender gap as a matter of preference and choice. That is, 'if there are no women in our project, it must simply be their choice.' Women may have made a choice, but it was not based on whether they find the project interesting or have a contribution to make, but by the 'brogrammer' locker-room type of environment." According to Reagle, reducing the gap is important for Wikipedia as a whole: a male-dominated culture leads to more biased articles, and research has shown that the "collective intelligence of a group goes up with increased social sensitivity, conversational turn-taking, and female participation."
In brief
Presidential library gets Wikipedian in Residence: On 17 January 2013, the Chronicle of Higher Education announced the appointment of a University of Michigan student as the Wikipedian in Residence at the Gerald R. Ford Presidential Library: "Michael Barera, a master's student in Michigan's School of Information, has been selected for the new internship position and charged with increasing and enhancing the library's presence on Wikipedia." Michigan website AnnArbor.com initially stated that the position was paid, but later corrected its entry to state that "Barera is not being compensated for his work." The university's news service published its own announcement of the collaboration on 17 January, pointing to the WikiProject's page on Wikipedia, Wikipedia:WikiProject Gerald Ford.
Looking back at the SOPA blackout: The Boston Review published a retrospective on the 2012 SOPA blackout on 18 January 2013, "The Day Wikipedia Went Dark—Did It Save Internet Freedom?", arguing that "What the Wikipedia blackout teaches is that the preservation of the free Internet will rise or fall on the involvement and ingenuity of the people, not on courts or lawmakers. Wikipedia is a Web site, but it is also a community of thousands of volunteer writers around the world—so-called Wikipedians—who decided a year ago today to take a political stand for the first time in its existence. The decision ran counter to the site's apolitical stance and was in considerable tension with its overriding mission to spread free knowledge to the world. Nevertheless, they risked Wikipedia's position of neutrality, not to mention its reputation, to fight for the freedoms on the Internet they hold dear." The article concluded, "Even if the Internet blackout of 2012 is never repeated, it stands as an important lesson for generations to come: the Internet can't stop the next SOPA, but people can."
Mormonism Wikipedia articles: The Latter-Day Saints website Meridian Magazine—sporting a yellow fundraising banner remarkably similar to Wikipedia's own—published a set of two articles critiquing Wikipedia's biography of Mormon figure Martin Harris on 22 and 23 January 2013, titled "Wikipedia's Deconstruction of Martin Harris" and "Wikipedia Attacks Martin Harris' Faith". The author, Roger Nicholson, stated, "In an attempt to abide by the Wikipedia guidelines to be unbiased and represent all sides of a story, the representation of Martin Harris has gone awry. An unbalanced mixture of facts and details taken out of context, have painted a picture of a man almost unrecognizable to Mormons. A better understanding of the misused quotes and the history of the region, as well as a desire to see the bigger picture, brings Martin Harris back into focus." The articles followed up on an earlier piece by Nicholson, "The Gospel Online: Who Should Define Mormonism on Wikipedia?", published last month.
Lance Armstrong: On 24 January 2013, the Huffington Post published an article about the editing of the Lance Armstrong biography in the wake of the disgraced cyclist's interview with Oprah Winfrey, discussing his use of performance-enhancing drugs. Reviewing the to and fro the article had undergone since the doping scandal first hit the press, Huffington Post writer Sam Oakley concluded, "I find it quite comforting that this is the new way that history is being written. The number of edits to Armstrong's entry is slowing down and it seems that the apologists and the hardliners are reaching some sort of uneasy consensus that chronicles both his rise to, and fall from grace. In the place of a model where one or two historians decide how Armstrong is remembered we have what looks to have been a pretty well informed debate resulting in what looks to have been a pretty equitable solution."
Deletion nomination receives attention: The Sydney Morning Herald and the Age were among Australian newspapers reporting on 27 January 2013 that the article Death of Jill Meagher was at risk of deletion, noting that "Ms Meagher's death created global headlines and resulted in two peaceful protests in Brunswick that attracted more than 30,000 people." (The AfD has since been closed as keep.)
Consistent patterns found in Wikipedia and other open collaborations: In the introductory piece[1], researchers Andrea Forte and Cliff Lampe give an overview of this field, defined as the study of "distributed, collaborative efforts made possible because of changes in information and communication technology that facilitate cooperative activities" - with open source projects and Wikipedia among the most prominent examples. They point out that "[b]y now, thousands of scholars have written about open collaboration systems, many hundreds of thousands of people have participated in them, and millions of people use products of open collaboration every day." Among their "lessons from the literature", they name three "consistent patterns" found by researchers of open collaborations:
"Participation Is Unequal" (meaning that some participants contribute vastly more than others: "In Wikipedia, for example, it has long been shown that a few editors provide the bulk of contributions to the site.")
"There Are Special Requirements for Socializing New Users"
"Users Are Massively Heterogeneous in Both How and Why They Participate"
"Ignore All Rules" as "tension release mechanism": The abstract of a paper titled "Rules and Roles vs. Consensus: Self-Governed Deliberative Mass Collaboration Bureaucracies"[2] explains "Wikipedia’s unusual policy, ignore all rules (IAR)" as a "tension release mechanism" that is "reconciling the tension between individual agency and collective goals" by "[supporting] individual agency when positions taken by participants might conflict with those reflected in established rules. Hypotheses are tested with Wikipedia data regarding individual agency, bureaucratic processes, and IAR invocation during the content exclusion process. Findings indicate that in Wikipedia each utterance matters in deliberations, rules matter in deliberations, and IAR citation magnifies individual influence but also reinforces bureaucracy."
Collaboration on articles about breaking news matures more quickly: "Hot Off the Wiki: Structures and Dynamics of Wikipedia's Coverage of Breaking News Events"[3] analyzes "Wikipedia articles about over 3,000 breaking news events, [investigating] the structure of interactions between editors and articles", finding that "breaking articles emerge into well-connected collaborations more rapidly than nonbreaking articles, suggesting early contributors play a crucial role in supporting these high-tempo collaborations." (see also our earlier review of a similarly-themed paper by the same team: "High-tempo contributions: Who edits breaking news articles?")
A fourth paper in this special issue, titled "The Rise and Decline of an Open Collaboration System: How Wikipedia’s Reaction to Popularity Is Causing Its Decline", received considerable media attention this month, starting with an article in USA Today. It was already reviewed in the September issue of the research report.
Mathematical model for attention to the promoted Wikipedia articles
While the size and growth rate, editorial workflow, and topical coverage of Wikipedia have been studied extensively, little work has been done on understanding public attention to Wikipedia articles. A working paper by a team from the Barcelona Media Foundation and the University of Twente, placed on arXiv just before Christmas,[4] analyses and models the number of clicks on the featured articles promoted to the Wikipedia Main page.
A total of 684 featured articles are considered, and their page view statistics are rescaled by the average circadian view rate extracted from a larger set of 871,395 articles over a period of 844 days. The 4-day lifetime of a promoted article on the Main page is characterised by four phases: first, a very rapid growth in the number of clicks just after the article appears on the Main page; second, a rather homogeneous period covering the first day of the promotion; third, a dramatic drop in the click rate as the article is replaced by a new featured article and moved to the "recently featured" part of the Main page; and finally, a flat phase during the remaining 3 days at this location.
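The rescaling step can be illustrated with a short sketch. This is only a minimal illustration, assuming hourly view counts are available as plain Python lists; the function names `circadian_profile` and `rescale_views` and the input layout are hypothetical and not taken from the paper, whose normalisation may differ in detail.

```python
def circadian_profile(samples):
    """Build 24 hourly multipliers (mean 1.0) from (hour_of_day, views) pairs.

    `samples` stands in for the large reference set of articles; every hour
    of the day is assumed to occur at least once with a positive mean.
    """
    totals = [0.0] * 24
    counts = [0] * 24
    for hour, views in samples:
        totals[hour % 24] += views
        counts[hour % 24] += 1
    if 0 in counts:
        raise ValueError("every hour of the day needs at least one sample")
    means = [t / c for t, c in zip(totals, counts)]
    if min(means) <= 0:
        raise ValueError("hourly means must be positive")
    overall = sum(means) / 24
    return [m / overall for m in means]


def rescale_views(views_by_hour, start_hour, profile):
    """Divide each hourly count by the multiplier for its hour of day."""
    return [
        views / profile[(start_hour + offset) % 24]
        for offset, views in enumerate(views_by_hour)
    ]
```

After rescaling, a count recorded during a typically quiet hour and a count recorded during a typically busy hour become comparable, so what remains reflects the effect of the Main page promotion rather than the time of day.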
In the next step, the authors introduce a rather intuitive model, based on a few parameters, that describes the full 4-day cycle in a mathematical framework. The model is tuned on the data of a set of 100 featured articles and then used to predict the number of page hits for the rest of the sample, given the number of clicks after the first hour of promotion for each article. The model is relatively accurate in predicting the number of clicks, and its accuracy could be improved further by feeding it the number of clicks at the end of the first day instead of after the first hour of promotion. While the paper is very clear in describing the methodology, it does not discuss or provide a deeper understanding of the social mechanisms of popularity and public attention, even though the authors mention them repeatedly.
The featured article icon and other heuristics for students to judge article credibility
A paper in Information Processing and Management titled "College students’ credibility judgments and heuristics concerning Wikipedia"[5] used the theory of bounded rationality and a heuristic-systematic model to analyze American college students’ credibility judgments and heuristics concerning Wikipedia. Not surprisingly, the authors observe that students used heuristics (mental shortcuts, such as "an article with a long list of references is more credible than one with a short list") in assessing the credibility of Wikipedia. Students (regardless of their knowledge) were much more likely to focus on the number of references than on their quality, and the same article would be seen as more or less credible depending on how many references it had. The authors conclude that educators need to teach students to judge the quality of Wikipedia articles in ways that go beyond checking whether the article has references (and how many). The authors recommend that Wikipedia make its own assessments (such as the Featured Article star, currently visible only as a small bronze star icon in the top right-hand corner of the article’s page) much more prominent. (This reviewer strongly agrees with the conclusion, but unfortunately the last community discussion appears to have achieved little.)
More interestingly, the authors also find that people with more knowledge found Wikipedia more credible, suggesting that people with low knowledge may be more uneasy with Wikipedia. The authors suggest that the reliability of Wikipedia would be increased if more professional associations implemented programs such as the Association for Psychological Science Wikipedia Initiative. In addition to getting the experts more involved in Wikipedia content creation, the authors suggest that a good idea may be for "professional associations themselves [to] provide their own endorsement for the quality of articles in their fields."
The authors also note that peer endorsement is an important factor in credibility, and that the Wikipedia:Article Feedback Tool is a step in the right direction, as it provides another credibility assessment for readers. They note, however, that compared to similar tools implemented on other sites (such as Amazon), "Wikipedia readers need to click on ‘View Page Rating,’ which requires one more step to find out that information. The average reader may not be inclined to do so. It would be useful to display ratings without clicking".
Briefly
“Free as in sexist?”: In a paper in this month's First Monday,[6] Joseph Reagle discusses the gender gap in free culture and free and open source software communities. Wikipedia is one of the case studies discussed, and Reagle observes that in this wider context it is not so much the exception as the rule.
Further criticism of "most influential people" infographic: On Ethnography Matters, a blog run by four ethnographers, one of the authors, Heather Ford, discussed[7] a Wired infographic on "History's most influential people, ranked by Wikipedia reach". Like the reviewer in our December issue, the author criticizes the infographic and the accompanying article for lacking any serious description of methodology. She notes that given those shortcomings, the claims made by the article are rather dubious, and the cited research might well have been misquoted. The author further notes that any research that attempts to draw conclusions about "national culture" from analyzing different language Wikipedias runs into a major issue, which is that languages don't always map easily onto national cultures (consider: what is the national culture of Portuguese or English?). She further illustrates this by discussing how often African-language Wikipedias are edited primarily by individuals living outside the country most often associated with a given language.
"Sustainability of Open Collaborative Communities": In an article[8] published in the Technology Innovation Management Review (based on a similar work presented at HICSS 2013 and reviewed in the October 2012 edition of WRN), Kevin Crowston, Nicolas Jullien, and Felipe Ortega present a preliminary comparison of the recruitment efficiency of 36 of the largest Wikipedias. The concept of recruitment efficiency refers to the ability of these wikis to recruit editors from the total population of readers and potential contributors. The authors estimate this quantity using aggregated data on the number of Internet users and tertiary (college) educated speakers of each of the 36 languages that correspond to the Wikipedias included in their analysis. They find suggestive patterns in the results of this comparison, including: (1) Wikipedias of moderate and smaller size exhibit great variations in terms of their recruitment efficiency; and (2) larger Wikipedias appear less efficient in recruiting new members, suggesting a pattern of decreasing returns to scale. The authors conclude that these findings warrant further investigation and analysis, but that they provide preliminary support for the idea that larger Wikipedias face distinct conditions of community sustainability compared to smaller ones. See the extended summary from the October 2012 Wikimedia Research Newsletter for further details.
Language comparison algorithm finds new ghosts in England and Scotland: A paper by four Japanese researchers titled "Extracting lack of information on Wikipedia by comparing multilingual articles"[9] proposes "a method for extracting information that exists in one language version, but which does not exist in another language version." Their method uses various steps, starting from a user's search query in their native language Wikipedia, which is automatically translated (using a dictionary) to other "non-native" Wikipedias, and involves use of the link structure between articles, the section structure within one article, and finally the cosine similarity between the nouns of different articles - a low similarity score indicating that information from one article is missing from the other. A small-scale test brought some successes, e.g. the detection of examples in Black dog (ghost) from England and Scotland that were not present in the corresponding article on the Japanese Wikipedia, but also showed problems with the proposed algorithm. The four authors previously published a related paper titled "Extracting Difference Information from Multilingual Wikipedia", covered in the April edition of this research report.
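The final step, comparing the nouns of two texts by cosine similarity, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the nouns have already been extracted and mapped into a common vocabulary (the paper relies on dictionary translation for this), and the function name `noun_cosine_similarity` is hypothetical.

```python
import math
from collections import Counter


def noun_cosine_similarity(nouns_a, nouns_b):
    """Cosine similarity between two bags of nouns (0.0 to 1.0).

    Each argument is a list of noun strings; repeated nouns count as
    higher term frequencies. An empty list yields a similarity of 0.0.
    """
    freq_a, freq_b = Counter(nouns_a), Counter(nouns_b)
    # The dot product only needs the nouns that occur in both texts.
    dot = sum(freq_a[n] * freq_b[n] for n in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A section in one language version whose score against every section of the other version stays low would then be a candidate for information missing from that other version; where to set the cut-off is a tuning decision that this sketch leaves open.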
Sentiment analysis of articles about politicians: Researcher Finn Årup Nielsen, who works on a project funded to do "Wikipedia sentiment analysis for companies", blogged about applying sentiment analysis to articles about politicians on the Danish Wikipedia.[10]
"Clustering Wikipedia infoboxes to discover their types": A paper[11] presented at the CIKM’12 conference describes a method to use infoboxes to detect the entity type of an article (e.g. "movie" for Avatar (2009 film)). The authors explain that Wikipedia's existing category system is not sufficient for this: "Because Wikipedia category names are folksonomic, i.e., they are created by a group of people without the control of a central authority, they are also an unreliable source for inferring the conceptual entity type." As an example, the authors cite the article about (the 1981 film) Chariots of Fire, and argue that based on the "categories, a system like Yago would assign to the infobox concepts like film, winner, olympics, culture, university, and sport. However, only film corresponds to the entity described in the infobox." On the other hand, the naming of infoboxes (as templates) is not consistent enough either: "For example, the entity type Film is associated with template names Infobox Film, Infobox Movie, Television Film Infobox, TV film, James Bond film, Chinese film, infobox Korean film, etc." The algorithm described in the paper measures the similarity of different infoboxes based on their set of attributes: "For example, the attribute cluster discovered for the entity type Movie includes the attributes {Directed by, Produced by, Written by, Starring}." The authors report that their clustering algorithm, "WIClust", performed successfully on a sample of "48,000 infoboxes spanning 862 infobox templates", and that in some cases it corrects shortcomings of DBpedia, e.g. by discovering "that the templates Infobox Movie, Bond film, Japanese film, Chinese film, and Korean film belong to the same group as Infobox Film."
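The idea of grouping templates by the overlap of their attribute sets can be illustrated with a small sketch. This is not the WIClust algorithm, whose details are not given here; it swaps in plain Jaccard similarity with a greedy single-pass grouping, and the names `jaccard` and `group_templates` and the threshold value are assumptions made for illustration.

```python
def jaccard(attrs_a, attrs_b):
    """Jaccard similarity of two attribute collections (0.0 to 1.0)."""
    a, b = set(attrs_a), set(attrs_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_templates(templates, threshold=0.5):
    """Greedily group template names whose attribute sets overlap.

    `templates` maps a template name to its attribute names. A template
    joins the first group whose first member is at least `threshold`
    similar to it; otherwise it starts a new group.
    """
    groups = []
    for name, attrs in templates.items():
        for group in groups:
            if jaccard(attrs, templates[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Attribute sets below are illustrative, not real template definitions.
example = {
    "Infobox Film": {"Directed by", "Produced by", "Written by", "Starring"},
    "Infobox Movie": {"Directed by", "Produced by", "Starring", "Music by"},
    "Infobox Person": {"Born", "Occupation"},
}
groups = group_templates(example)
```

With these made-up inputs, the two film templates share three of five distinct attributes and end up in one group, while the person template forms its own. A greedy pass depends on input order, which is one reason a real system would likely use a more robust clustering method.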
How Indic language Wikipedias fared in 2012: In a blog post, Indian Wikimedian Shiju Alex compared the article numbers, user activity levels and pageviews of Wikipedias in Indic languages between December 2011 and December 2012.[12]
Students detect vandalism: Three student projects in a course on machine learning at Stanford University concerned the automatic detection of vandalism edits on Wikipedia.[13]
"Algorithmic governance" in the German Wikipedia: Leonhard Dobusch, an assistant professor for organization theory at FU Berlin, blogged about an ongoing research project on the sighted revisions on the German Wikipedia as a case of "algorithmic governance".[14]
Map visualization of links between geotagged articles: The "Collaborative Cybernetics" blog published maps visualizing the links between Wikipedia articles containing geocoordinates[15] and the geographic distribution of certain topics[16] (example: skiing-related terms).
Map of sister cities extracted from Wikipedia: Four researchers from the Barcelona Media Foundation (three of whom also co-authored the paper on featured article pageviews reviewed above) published a preprint[17] where they "extracted the network of sister cites[sic] as reported on the English Wikipedia, as far as we know the most extensive but certainly not complete collection of this kind of relationships", and analyze the resulting social network, including a map visualization of worldwide twin city pairings.
New pageview files available: On his personal blog[18], Wikimedia Foundation data analyst Erik Zachte announced the release of "Monthly page requests, new archives and reports".
Upcoming book on "Global Wikipedia": A call for chapters has been issued for an upcoming book titled "Global Wikipedia: International and cross-cultural issues in online collaboration".
Is Wikipedia built on "good faith collaboration" or "destructive editing"?: In his 2010 MIT Press book about Wikipedia (now available online under a free license), Joseph Reagle posited that Wikipedia is based on a culture of “Good Faith Collaboration”. In his 2012 thesis at the University of Cambridge, titled "Destructive Editing and Habitus in the Imaginative Construction of Wikipedia",[20] User:Thedarkfourth argues against this, highlighting the importance of conflicts instead. This month, Reagle responded to the criticism on his blog,[21] asserting that it was based on "a new scholasticism. In this view, a work's contribution consists exclusively of interpreting an interesting phenomenon in the light of dead philosophers". Reagle argues that this view holds that scholars should look at what came before prior to explaining a new phenomenon; they should first refer to libraries and bibliographies before drawing a new hypothesis onto the whiteboard. He defends his position by arguing that his book is not guilty of ahistoricism, as Wallis seems to imply.
Notes
^Andrea Forte, Cliff Lampe: Defining, Understanding, and Supporting Open Collaboration: Lessons From the Literature doi:10.1177/0002764212469362 American Behavioral Scientist January 11, 2013
^Brian Keegan, Darren Gergle and Noshir Contractor: Hot Off the Wiki: Structures and Dynamics of Wikipedia's Coverage of Breaking News Events. American Behavioral Scientist. doi:10.1177/0002764212469367 PDF
^Marijn ten Thij, Yana Volkovich, David Laniado, Andreas Kaltenbrunner: Modeling and predicting page-view dynamics on Wikipedia PDF
^Sook Lim: "College students’ credibility judgments and heuristics concerning Wikipedia". Information Processing & Management, Volume 49, Issue 2, March 2013, Pages 405–419 doi:10.1016/j.ipm.2012.10.004
^Fujiwara, Yuya and Konishi, Yukio and Suzuki, Yu and Nadamoto, Akiyo: "Extracting lack of information on Wikipedia by comparing multilingual articles". Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services doi:10.1145/2428736.2428808
^Nguyen, Thanh Hoang and Nguyen, Huong Dieu and Moreira, Viviane and Freire, Juliana: Clustering Wikipedia infoboxes to discover their types. Proceedings of the 21st ACM international conference on Information and knowledge management, CIKM '12 doi:10.1145/2396761.2398588
^Brett Kuprel. Detecting Bad Wikipedia Edits. PDF;
Mudit Jain, Murugan Ayyappan, Nikhil Agarwal. Detecting vandalisms in Wikipedia edits. PDF;
Tony Jin, Lynnelle Ye, Hanzhi Zhu. Detecting Wikipedia Vandalism. PDF
What motivated you to join WikiProject Chess? Do you play competitively? Have you contributed to any of the project's Featured or Good Articles?
I started working on chess articles shortly after I joined Wikipedia in 2004/2005. At the time, there were a number of good chess-related articles, but many were underdeveloped and many significant topics lacked articles altogether.
I am an active tournament player, of middling strength, although with a rather high (possibly inflated) FIDE rating of 1944. I am also quite active in the local chess club on Karmøy.
I have unfortunately not made any major contributions up to Good or Featured status on the English Wikipedia. It is editors such as User:Krakatoa and User:SyG who have done an admirable job in achieving that.
WikiProject Chess's primary article, chess, is one of the longest-serving Featured Articles, originally designated "Refreshing Brilliant Prose" back in 2002 and never being demoted, even as it underwent multiple Featured Article Reviews. How difficult has it been to keep this article up to Wikipedia's changing standards over the years?
I haven't participated in this process, but I remember that there were serious challenges to the FA statuses in 2006 and 2010 where some work had to be done with it. There have also been people who want to add much more, and in most editors' views excessive, coverage of India's role in the history section.
The chess article is reasonably stable because the rules, strategy, and cultural aspects are fairly constant. What does need updating from time to time is who is in the World top right now. For instance there was a milestone this month when Magnus Carlsen became the World's highest rated player in the history of the rating system.
Are there any gaps in Wikipedia's coverage of chess's history, strategy, or notable players? Are any countries or generations better represented than others?
We are missing articles on a number of grandmasters. These players have always been considered notable enough for articles. I have tried to make sure that we have articles for all the Norwegian grandmasters (and for the more recent Norwegian national champions as well), and I think we have complete coverage of all the American and English grandmasters too. But we are missing articles on many Russian grandmasters, for example. It really is a function of whether any editor wants to take the time to create the biographies. For the top flight of players, those over 2700 in rating, I think our coverage is quite comprehensive. This does not mean the articles couldn't be improved, but at least the players at that level have biographies.
The chess literature on strategy is overwhelmingly devoted to openings, and we have a large number of articles related to different chess openings. Sometimes an excited user decides to write up an article on an extremely obscure opening, but there the line is usually drawn. There are some articles that arguably could be split.
The middlegame is relatively poorly covered in literature, and that may account for the rather sparse coverage we have there. It is actually a very broad subject, but with a multitude of disparate items, and the subject itself has diffuse edges with openings and endgames.
How difficult has it been to acquire images for chess articles? Aside from photography, what options do editors have when trying to illustrate chess articles?
I think we face many of the same challenges in acquiring images as many other editors face. For many grandmasters we do have portraits, though. When reading about chess, however, most people are interested in the moves and the positions rather than the faces of the players. The {{Chess diagram}} template for illustrating positions is very heavily used in our articles and very important for illustrating them.
How would you describe the sense of community at WikiProject Chess? Do editors tend to work in concerted efforts or alone in their own niche?
The contributors at the WikiProject have generally been friendly and helpful, willing to discuss things. We do sometimes discuss how to coordinate articles and sometimes concerns over notability are aired there before an AFD nomination is made. I am generally an individual editor on articles, but I know others have worked together for improvements. We are individual editors, and sometimes there are disagreements between us, but in the vast majority of cases those differences are settled in an atmosphere of mutual respect.
At times we have faced editors who decide to make war with the members of the Chess WikiProject. One of them became a side issue (the non-chess related DreamHost article was the main issue) in an arbitration case in 2009, in that case ending with a one-year ban of the editor in question for rather extreme personal attacks and other misconduct. Those were unpleasant cases, but I am glad that the members of the WikiProject supported each other through it.
What are WikiProject Chess's most urgent needs? How can a new contributor help today?
The most essential topics have articles now, but there are many articles that are stubs and in need of improvement. As mentioned above, and as seen on the Articles to Create section of WP:CHESS we are missing articles on a number of notable players. Any new contributor who wants to make one or more of these will be a most welcome addition to the project.
Anything else you'd like to add?
I must admit that I am only semi-active with chess articles right now and perhaps not all that up to date with things. Chess is probably the most studied game ever, and there is a wealth of literature for those who want to research it.
Next week, we'll take the northern way through Europe's fjords in search of a great place to live. Until then, search for Viking loot in the archive.
A change to the policy of resysopping former administrators is under discussion. This discussion hopes to clear up cases of when to resysop admins and when they should go through the RFA process again.
Should a section be added to the article titles guideline to recognize how titles are currently styled, in order to avoid arguments about the relationship between WP:MOS and WP:TITLE?
After being successfully proposed, there are still unanswered questions regarding "Today's article for improvement". Questions still needing answers include how many are displayed per day, how the articles are chosen and what the edit notice on the article page will look like.
To many Wikimedians, the Khan Academy would seem like a close cousin: the academy is a non-profit educational website and a development of the massive open online course concept that has delivered over 227 million lessons in 22 different languages. Its mission is to give "a free, world-class education to anyone, anywhere." This complements Wikipedia's stated goal to "imagine a world in which every single person on the planet is given free access to the sum of all human knowledge", then go and create that world.
It should come as no surprise, then, that the highly successful GLAM-Wiki (galleries, libraries, archives, museums) initiative has partnered with the Khan Academy's Smarthistory project to further both its and Wikipedia's goals.
Smarthistory started as a separate website that aimed to "emphasize the experience of looking at art by using unscripted conversations recorded in front of the work of art whenever possible, by incorporating numerous images and video, and by curating links to high-quality resources on the web." They joined the Khan Academy in October 2011 as a natural extension of their mission.
Collaboration between Smarthistory and Wikipedia was stimulated through the most unlikely forum, a Twitter conversation. According to Beth Harris, one of Smarthistory's co-founders, it started when she tweeted about the general public's tendency to almost exclusively go to Wikipedia instead of museums, if museums do not put their content online. Liam Wyatt, who is a pioneer in the GLAM-Wiki initiative, told the Signpost that he replied saying that the two could work together, even if Smarthistory's non-commercial Creative Commons license could not be changed to suit Wikipedia's more permissive CC-by-SA. Through subsequent emails, the idea of a limited collaboration developed.
Smarthistory shared with Liam a spreadsheet that juxtaposed Smarthistory's videos against their relevant Wikipedia articles. When this was posted on the cultural-partnerships-l mailing list, Peter Weis jumped in to wikify the spreadsheet and put it on-wiki, thus beginning the public partnership.
The Signpost asked Smarthistory's co-founders (Beth Harris and Steven Zucker) and Smallbones, who also played a major role in forming the project, what the goals of the Smarthistory–Wikipedia collaboration are, and where the results of this might be applied in other areas. From the Smarthistory side, Harris and Zucker are looking for a complement to the "critical, interpretive method that is central to our work"; they point out that Wikipedia's style and content meet that perfectly, fulfilling their goal. They said they are far from the only ones providing these sorts of open educational resources, and that there are many GLAMs out there with excellent text and video content that are ripe for collaboration. The problem, of course, is that not all GLAMs see the value in distributing their content around the Internet, though Harris and Zucker are confident that they will come around.
Smallbones, replying from the Wikipedian perspective, said that the project was about improving Wikipedia by using the reliable content uploaded by Smarthistory, and he hopes the project will move beyond simple {{external media}} links and will use Smarthistory as a reference within articles. To Smallbones, this is an especially important area in which to collaborate:
“
Reliability and expert opinion are especially important in art-related articles, where opinion and interpretation are always needed, but where ["no original research"] prevents an "ordinary editor" from offering his own interpretation. For example, while I may know that Grant Wood's American Gothic is a humorous, even mocking, painting, it can sometimes be amazingly difficult to go through the available academic literature (generally written for other academics) for a reference and find something so obvious to be clearly stated. Smarthistory gives us expert opinion with video that starts from the basics and moves through context, history, and interesting detail, all in 5 or 10 minutes.
”
There are pitfalls, though. Both Liam Wyatt and Smallbones told the Signpost that the challenge is to avoid the appearance of a conflict of interest. Wyatt said he made clear that there was "no 'deal' or promise to link" to any Smarthistory videos, and Smallbones said that even the potential of being seen as "spamming" the videos is something to watch. He believes that the way around this is to encourage participants to consider including a Smarthistory video as a citation in place of a [citation needed] tag, or as the referenced cornerstone of a new article.
Looking forward, Smallbones said that there is much relatively easy work left to do, and "anybody can take the basic concept of using Smarthistory resources and run with it as far as they want to go." Harris and Zucker expressed similar feelings, as they look for the collaboration to grow as they continue to produce new videos and essays each week. As this content is being translated into many different languages, they are hopeful that other Wikipedias will join the Smarthistory–Wikipedia collaboration.
In brief
Wikimania scholarships: Applications for scholarships to Wikimania 2013 in Hong Kong are now being accepted. Both full and partial scholarships are available—covering airfare, lodging, and registration; and up to half of the estimated airfare, respectively. Applicants will be rated on their Wikimedia activity (both on- and off-wiki), their open-source activity more broadly, their interest in both Wikimania and the Wikimedia movement, and their grasp of English. Applications will be accepted until 23:59 UTC on 22 February.
Chapters association: The job of Secretary-General of the chapters association is now open for applications. The Secretary-General will be in charge of running the operational side of the association. In an unrelated development, the proposed name of the planned association ("Wikimedia Chapters Association") has run into problems, since the use of the name Wikimedia was inconsistent with the Wikimedia Foundation's trademark policy.
IdeaLab: The Wikimedia Foundation has opened up its IdeaLab, which in the words of Individual Engagement Grants Siko Bouterse, "is an incubator on meta for people to collaboratively build grant proposals, and get help and input from the community on ideas for new projects that could be developed into grant proposals." The community feedback aspect is strongly emphasized, as the foundation plans to take it into account in every proposal.
Wikipedia Zero: The Wikimedia Foundation's Wikipedia Zero initiative has expanded again with a partnership with VimpelCom. The agreement means that at least 100 million new mobile users will be able to access Wikipedia for free.
Wikimedian-in-Residence opportunity: The British Natural History Museum and Science Museum, in company with Wikimedia UK, have jointly announced an opening for a Wikimedian-in-Residence.
English Wikipedia
New administrator: The Signpost welcomes the newest administrator, Lord Roem, who passed with 138 in support to 13 opposed. One request for adminship remains open, with 80% in support as of publishing time.
Adminship reform: The newest request for comment (RfC) on the request for adminship process is in full swing, with the first round open for votes.
Article feedback: The RfC on the article feedback tool is still open, with the majority view being (as of publishing time) that the tool should be removed from the English Wikipedia.
Popular pages: The newest Top 25 report, a curated list of the most popular articles on the English Wikipedia, is available. In addition, a statistical analysis of the data behind the most popular pages will come out in the Signpost next week.
This week, the Signpost's featured content section continues its recap of 2012 by looking at featured lists. We interviewed FLC directors Giants2008 and The Rambling Man as well as active reviewer and writer PresN.
Giants2008
Many high-quality lists came through the featured list process in 2012, and I had the opportunity to review many of them. My personal favorite among 2012's promotions is Boden Professor of Sanskrit, a unique list on a University of Oxford position. The list is not one of the longest to be promoted last year, but the research behind it is impeccable and the backstory behind the creation of the position is interesting even to those who know nothing about the Sanskrit language, like myself. In addition, the work on this list led directly to the creation of a spin-off article that achieved featured article status. Boden Professor of Sanskrit shows that featured lists can be every bit as well-researched as longer articles, and that lists can be more than just a bunch of data-filled tables.
The Rambling Man
We have a very diverse set of subjects nominated at WP:FLC, but in 2012, within the nearly 240 promotions, I found a pair of lists that were just a little bit "more" diverse than our usual fare. Firstly, the list of chronometers on HMS Beagle, nominated by Spinningspark in May was a really engaging piece of work combining nice prose, with useful tables, great images and gave prominence to a really niche subject matter. My second choice, nominated by a group of enthusiastic editors led by Serendipodous, is the timeline of the far future, successful in August on its third attempt at FLC, just going to show that you should "try, try and try again". Another really unusual but very welcome subject for our featured lists.
PresN
It was with great difficulty that I tried to narrow all of the Featured List promotions of the year down to a bare few, and in the end rather than just one I came up with four that I found particularly interesting in their subject matter and impressive in their construction. These are: List of Olympic medalists in art competitions, for capturing a realm of competition I think most people do not know existed; List of chronometers on HMS Beagle, for a fascinating deep dive into a niche area that was of great importance at the time; List of battlecruisers, as one of the capstones of the incredibly massive and long-running Operation Majestic Titan project; and my personal favorite for the year, Timeline of the far future, for the sheer scope of the far-flung events it covers.
Featured articles
Four featured articles were promoted this week:
Neville Cardus (nom) by Brianboulton and Tim riley. Cardus (1888–1975) was an English writer and critic. Born to a poor family, the self-educated man became cricket correspondent of The Manchester Guardian in 1919; he would stay with that newspaper, as a cricket and music critic, until his death. He is credited with revolutionising cricket writing with his vivid description and criticism, although he considered music criticism his prime vocation.
Laevistrombus canarium (nom) by Daniel Cavallari. L. canarium, commonly known as the dog conch, is a commercially important, edible sea snail native to the Indian and Pacific Oceans. First described in 1758, the snail lives on muddy and sandy bottoms, grazing on algae and detritus, and is preyed on by larger snails and vertebrates. Its heavy shell is valued as an ornament.
Lady Saigō (nom) by Boneyard90. Lady Saigō (1552–1589) was consort and confidante to the Japanese shogun Tokugawa Ieyasu. Born to a family which was vassal to the Tokugawa clan, she began a relationship with Ieyasu after being widowed. Lady Saigō influenced Ieyasu's policies and bore the shogun two children, one of whom went on to be shogun himself.
Sawtooth National Forest (nom) by Fredlyfish4. Sawtooth National Forest is a federally protected area in the US states of Idaho and Utah covering 2,102,461 acres (850,836 ha). Named for the Sawtooth Mountains, it was established in 1905 and contains several mountain ranges, sagebrush steppe, spruce-fir forests, alpine tundra, and over 1,100 lakes and 3,500 miles (5,600 km) of rivers and streams. The area is now a major tourist site, with more than a million visitors annually.
Featured lists
Six featured lists were promoted this week:
List of hill forts and ancient settlements in Somerset (nom) by Rodw. The English county of Somerset is home to numerous hill forts and ancient settlements, some dating back over three thousand years. Most are scheduled monuments, affording them legal protection.
Air discography (nom) by Holiday56. The French duo Air have released twelve albums, twenty-two singles and sixteen music videos since their debut in 1995. Their biggest hit was Talkie Walkie in 2004, which topped out at number 3 on the French charts.
Theodore Sturgeon Award (nom) by PresN. The Theodore Sturgeon Memorial Award is presented annually to the author of the best short science fiction story published in English in the preceding calendar year. Established in 1987, the award has had over 150 nominees.
List of literary works published in Asia Raja (nom) by Crisco 1492. The Japanese-run newspaper Asia Raja, based in what is now Indonesia, published sixty-nine poems, sixty short stories, and three serials during its three and a half years of operation.
List of Prime Ministers of Pakistan (nom) by Sahara4u. Pakistan has seen seventeen prime ministers since the position was established in 1947; an additional five people have held the position as caretakers. None have served longer than six years.
Featured pictures
Nineteen featured pictures were promoted this week:
Adansonia (nom; related article), by Muhammad Mahdi Karim. Adansonia is a genus of trees, commonly known as the baobabs, which contains eight species; most of these are native to Madagascar. They can grow quite large.
Vipera xanthina (nom; related article), created by Benny Trapp and nominated by Tomer T. V. xanthina is a venomous viper species found in northeastern Greece and Turkey. It can grow to over 1 metre (3 ft 3 in) long and prefers rocky areas with heavy vegetation.
David Dixon Porter (nom; related article), created by Mathew Brady, restored and nominated by Adam Cuerden. Porter (1813–1891) was an admiral in the US Navy who entered the Navy at the age of ten. During the Civil War he saw significant service, later becoming Navy Supervisor.
Poster for Don Quichotte (nom; related article), created by Georges Rochegrosse, restored and nominated by Adam Cuerden. Don Quichotte is an opera in five acts by Jules Massenet to a French libretto by Henri Caïn. It was based on the novel Don Quixote by Miguel de Cervantes.
Taj Mahal Mosque (nom; related article), by Muhammad Mahdi Karim. The Taj Mahal Mosque is a mosque at the Taj Mahal complex in India. It was completed in 1643 and is balanced by another building, or jawab, to its east.
Galerie des Batailles (nom; related article), created by -donald- and nominated by Ceranthor. The Galerie des Batailles is a long gallery on the first floor of the aile du midi of Versailles, in France. It is home to more than a hundred busts and paintings.
The Splatters (nom; related article), created by SpikySnail Games and nominated by Sven Manguard. The Splatters is a physics-based puzzle video game for the Xbox Live Arcade which was developed by SpikySnail Games and released in April 2012.
Ivory soap ad (nom; related article), created by Strobridge Lith. Co., restored and nominated by Adam Cuerden. Ivory is a brand of soap developed by Procter & Gamble in the late 1870s. The soap is known for floating in the bath, and the slogan "It floats!" gained currency beginning in 1891.
Painted Stork (nom; related article), by JJ Harrison. The Painted Stork (Mycteria leucocephala) is a large wading bird in the stork family found in parts of Asia. A wader which rarely migrates, it feeds on small fish it finds by touch.
Spotted Redshank (nom; related article), by JJ Harrison. The Spotted Redshank (Tringa erythropus) is a migratory wader which was first described in 1764. It feeds on small invertebrates.
Streaked Spiderhunter (nom; related article), by JJ Harrison. The Streaked Spiderhunter (Arachnothera magna) is a species of bird in the Nectariniidae family which is found in South and Southeast Asia.
Machu Picchu (nom; related article), created by S23678 and nominated by Hahc21. Machu Picchu is a pre-Columbian 15th-century Inca site located in the Peruvian Andes. Although isolated, since its rediscovery in 1911 it has become a popular tourist destination.
Programme for Woman Suffrage Parade of 1913 (nom; related article), created by Benjamin Moran Dale, restored and nominated by Adam Cuerden. The Woman Suffrage Parade of 1913 was a march in Washington, D.C., US, in which over 5,000 women protested against the "political organization of society, from which women are excluded".
The case was filed in an attempt to remedy what was perceived as Doncram's continued unconstructive editing after the failure of previous community discussions and long-term blocks. (cf. SarekOfVulcan's statement)
In his response, Doncram stated that it was a combination of past encounters with other users, a host of ANI and AFD submissions, and perceived uncivil and bully-like behaviour directed at him that contributed to what he perceived as a "battleground atmosphere". He believes that the only way to remedy these issues is by addressing them through an arbitration case.
In light of the evidence presented, among the remedies proposed in the workshop are those calling for a topic-ban or outright ban for Doncram, an interaction ban between Doncram and Orlady and SarekOfVulcan, and SarekOfVulcan's desysopping for "gross edit-warring". Workshop submissions and the posting of proposed decisions have been postponed until February 11. Evidence submissions have closed to all uninvolved parties.
As reported in last week's "Technology Report", the WMF's data centre in Ashburn, Virginia ("eqiad") took over responsibility for almost all of the remaining functions that had previously been handled by their old facility in Tampa, Florida ("pmtpa") on 22 January. The Signpost reported then that few problems had arisen since handover. Unfortunately that was not to remain the case, with reports of caching problems (which typically only affect anonymous users) starting to come in.
The main bug driving anonymous users' difficulties, bug #44391 ("old revisions of pages are shown when not logged in and also revision history is outdated"), was finally declared fixed at around 05:00 UTC on 28 January, although only time will tell if further fixes will be needed. After the migration, other miscellaneous problems with the cache for images and other uploads (both originals and thumbnails) appeared to worsen, and new ones emerged alongside them. WMF Director of Platform Engineering Rob Lanphier shared an update on the current situation.
The data centre in Tampa will continue to be maintained as a "hot failover", with servers in standby mode, ready to take over should the primary site experience an outage. Additionally, the Signpost understands that the Tampa data centre will continue to be used for image scaling in the short term, before that too is migrated to Ashburn.
MediaWiki and Wikimedia developers prepare for conference
At least a dozen volunteer and staff developers and technically-inclined Wikimedians are making their way to European conference FOSDEM this weekend, records show. The Belgian-led conference brings together open-source developers and advocates from around the world.
Right after that, the WMF Language engineering team will be flying to India for a two-week marathon of MediaWiki development and internationalization outreach, including attendance at the 2013 GNUnify conference. WMF developers will also be staging their own workshops at the Quark '13 conference on February 1 and 2 and at the Pune LanguageSummit on February 12 and 13, aiming to better take advantage of the rapidly growing Indian software development scene, which is already one of the largest in the world.
In brief
Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for several weeks.
SVN set to read-only: Most of the WMF-operated Subversion (SVN) repository was set to read-only this week, marking the penultimate step in the switchover to Git initiated in March last year. Internationalisation support for the extensions which used the repository had already been discontinued; virtually all extension developers have now switched over to Git or found alternative hosting, though a number of unmaintained extensions, none of which are deployed on Wikimedia wikis, are yet to be transferred. Some areas targeting non-extension-related projects such as pywikipedia remain writeable and there are no plans to delete any files for the foreseeable future (wikitech-l, server admin log).
Re-enabling of disabled SpecialPages trialled: As of time of writing, tests are being run to evaluate the feasibility of the various solutions for bug #15434 ("Periodical run of currently disabled special pages"). Particularly observant readers may therefore come across occasional updates to reports, the data for which was previously between 4 and 9 years out of date.
Job queue still imperfect: After weeks of wild variation, the job queue (which handles the processing of low priority tasks – "jobs" – such as updating category membership lists) grew once this week, up to a few million jobs for all wikis (rather than the usual few thousand), but may be getting better. Unfortunately no data can be shown because access to the site tracking portal ganglia.wikimedia.org is restricted for security reasons.