Wikipedia:Wikipedia Signpost/Single/2016-03-02

The Signpost
Single-page Edition
WP:POST/1
2 March 2016


2016-03-02

Tretikov resigns, WMF in transition

The February WMF Metrics and Activities Meeting

Tretikov's resignation comes after months of public controversy, including the removal of Board member James Heilman (Doc James), the resignation of newly appointed Board member Arnnon Geshuri following a community outcry, and revelations about the mysterious Knowledge Engine project. It follows an even longer period of internal turmoil that has prompted a series of employee departures—events that have mostly remained out of the public eye until recently.

Among WMF staffers, the news of Tretikov's departure was greeted with a sense of relief rather than glee, at least publicly. The February metrics and activities meeting, held only 15 minutes after Tretikov's announcement, was almost jubilant—not about her departure, but as though the staff felt they could celebrate their work and accomplishments, especially those concerning Wikipedia's 15th anniversary, without the pall of recent months cast over them. Those events were not entirely absent, however. A photograph of Siko Bouterse, a widely respected WMF staffer whose departure was one of the flashpoints for other employees, received a standing ovation.

Still, even after the ED's announcement, employee exits from the WMF continue. On March 4, the WMF announced that Boryana Dineva, the VP of human resources, who went on leave on February 9, will depart.

Jimmy Wales announced he would be visiting San Francisco from February 27 to March 2 to personally meet with WMF staffers. Wales also quietly filled Tretikov's place on the schedule for an already planned March 13 event with Board member Guy Kawasaki at SXSW Interactive.

Speculation remains about who will work with WMF employees as interim executive director now that Wales has left San Francisco. Due to the exodus of employees, few high-level staffers remain to assume that role, with the most likely possibilities being chief communications officer Katherine Maher and general counsel Geoff Brigham. Finding an external candidate in the longer term may be difficult given the negative press coverage following the resignations of Geshuri and Tretikov, on top of the need to find someone with an unusual combination of skills to lead an extremely complex organization. In a surprise announcement that Andrew Lih (Fuzheado) describes as "rather astonishing", Trustee Alice Wiegand posted to the Wikimedia mailing list that:

Opinions on the wisdom of this approach have been divided; critics have condemned it as an abdication of responsibility by the Board, while others have welcomed it as an effort by the Board to be more responsive to staff concerns and input.

Also unresolved is the composition of the Board itself. On February 27, Heilman announced his willingness to resume his seat on the Board and his intention to run again in the next community election. Wales, who described Heilman's account of Board conflicts as "utter fucking bullshit" in January, responded by writing that should Heilman win another election "then I will support his joining the board". Wales reiterated his disagreement with Heilman about how they perceived the circumstances of the dismissal, but the facts leading up to it remain unknown.




2016-03-02

This week's featured content

Diagram showing uranium isotope separation in the calutron.

This Signpost "Featured content" report covers material promoted from 21 to 27 February.
Text may be adapted from the respective articles and lists; see their page histories for attribution.
Persoonia terminalis at the Australian National Botanic Gardens
Arshad Warsi at the Star Parivaar Awards

Four featured articles were promoted this week.

  • Calutron (nominated by Hawkeye7) is a mass spectrometer originally designed and used for separating the isotopes of uranium. It was developed by Ernest O. Lawrence during the Manhattan Project and was based on his earlier invention, the cyclotron. Calutrons were used in the industrial-scale Y-12 uranium enrichment plant at the Clinton Engineer Works in Oak Ridge, Tennessee. The uranium-235 produced there was used in the Little Boy atomic bomb that was detonated over Hiroshima on 6 August 1945.
  • The House of Plantagenet (nominated by Norfolkbigfish) was a royal house which originated from the lands of Anjou in France. The family held the English throne from 1154, with the accession of Henry II, until 1485, when Richard III died. The rivalry between the House of Plantagenet's two cadet branches of York and Lancaster brought about the Wars of the Roses, a decades-long fight for the English succession, culminating in the Battle of Bosworth Field in 1485, when the reign of the Plantagenets ended.
  • Jumping Flash! (nominated by Jaguar) is a platform video game co-developed by Exact and Ultra and published by Sony Computer Entertainment. The first installment in the eponymous series, it was first released for the PlayStation on 28 April 1995 in Japan. Presented in a first-person perspective, the game follows a robotic rabbit named "Robbit" as he searches for missing jet pods that have been scattered by the game's antagonist, the astrophysicist Baron Aloha. Robbit must explore each section of Crater Planet to retrieve all of the jet pods, stop Aloha and save the world from being destroyed. The game has been described as an ancestor of, as well as an early showcase for, 3D graphics in console gaming, and was generally well received by critics.
  • Persoonia terminalis (nominated by Casliber and Checkingfax) is a rare shrub belonging to the family Proteaceae, and native to northern New South Wales and southern Queensland. It grows to 1.5 metres (5 ft) with an upright or spreading habit and narrow short leaves up to 1 centimetre (0.4 in) in length. The yellow flowers mainly appear in December and January and are followed by purple-striped green drupes. The fruit of persoonias are edible to and dispersed by wild vertebrates.

One featured list was promoted this week.

  • Arshad Warsi (born 1968) is an Indian film actor, choreographer and dancer. This list (nominated by Skr15081997) presents all his film credits, awards and nominations. He began his career as an assistant director, before dancing in the music video Aag Se Khelenge and choreographing Roop Ki Rani Choron Ka Raja. He made his acting debut in Tere Mere Sapne, and has played in 49 more films since then. He is currently filming two films, and The Legend of Michael Mishra is in post-production. During his career he has been nominated for 25 awards, and has won 11 of them.

Four featured pictures were promoted this week.




2016-03-02

Brawling

One night in June 1852, while the Whig National Convention was going awry, American politician Alvan E. Bovay and newspaperman Horace Greeley dined at Lovejoy's Hotel in New York City, where Bovay first suggested the founding of what became the United States Republican Party. In 2016, with the improbable rise to frontrunner status of Donald Trump (#1 once again this week), it's a safe bet that secret dinners are now being had by politicians foreseeing the end of that political party.

This week's Top 25 is dominated by political brawling, with Mr. Trump at the top, and physical brawling, with a number of wrestling-related entries down the chart starting with Fastlane (2016) at #7. Deadpool (film) (#2) rides high for another week as the most popular pop-culture entry.

For the full top-25 list, see WP:TOP25. See this section for an explanation of any exclusions. For a list of the most edited articles of the week, see here.

For the week of February 21 to 27, 2016, the ten most popular articles on Wikipedia, as determined from the report of the most viewed pages, were:

Rank Article Class Views Image Notes
1 Donald Trump B-Class 1,938,436
Donald Trump is likely to be the Republican nominee in this year's United States presidential election, barring something crazy happening. He won the South Carolina primary on February 20, smashed his opponents in the February 23 Nevada primary, and won in 7 of 11 states on March 1's "Super Tuesday" primaries.
2 Deadpool (film) Start-Class 1,340,113
Down from #1 and 2.8 million views last week. The Marvel Comics antihero film starring Ryan Reynolds (pictured) was released on February 12 to a stellar reception. Regarded as a risk by its makers, 20th Century Fox, the film has earned over $600 million as of February 29.
3 O. J. Simpson B-Class 980,872
As predicted by the co-author of this report (and not wished for by yours truly), the former football player, Leslie Nielsen costar and alleged murderer has become a fixture of this list, thanks to the first season of American Crime Story, the true-crime spinoff of American Horror Story, which focuses on his controversial trial.
4 Neerja Bhanot C-Class 947,988
On September 5, 1986, just two days before her 23rd birthday, this Pan Am flight attendant was shot dead by terrorists affiliated with Abu Nidal as she spearheaded an escape from the hijacked Pan Am Flight 73 that ultimately saved over 300 lives. She was posthumously awarded India's highest peacetime bravery award, the Ashoka Chakra. Her life and death became the subject of a Bollywood biopic this week, Neerja, starring Sonam Kapoor (pictured) in the title role. Up from #10 and 800K views last week.
5 Deadpool C-Class 797,094
Marvel may have disavowed their X-Men franchise until Fox gives it back to them, but their fourth wall-tickling, chimichanga-chomping, bullet-spraying loony toon obviously remains a potent force, whether they like it or not. Down from #3 and 1.7 million views last week.
6 Robert Kardashian Start-Class 678,065
The now-deceased patriarch of the mediavorous Kardashian clan was a close friend of O. J. Simpson (#3) and played a role in the controversial trial that engulfed American pop culture in the 1990s. Of course, the sudden resurgence of interest in the case following the premiere of American Crime Story has led the less scrupulous end of the media to dredge up colorful supposed links between it and the current generation.
7 Fastlane (2016) Start-Class 664,591
Wrestler Roman Reigns prevailed in the main event at this professional wrestling event held in Cleveland, Ohio on February 21.
8 Fuller House (TV Series) Start-Class 659,077
This sequel series to the 1987-95 American sitcom Full House debuted on Netflix on February 26, 2016. Pictured is actress Candace Cameron Bure, one of the returning cast members. I hope historians of the future realize that the reason ridiculous TV shows could return after 20 years was the changing ways we watch TV, with on-demand, niche-driven options on channels like Netflix meeting needs no one could believe existed. To go back to a prior generation of silly TV, The Brady Bunch (1969-1974) milked all it could out of sequels and movies after its original run, but a complete sequel series after twenty years would not have been viable.
9 Melania Trump Start-Class 633,777
The Donald's wife has generally kept a lower profile on the campaign trail compared to Trump's children. But the recent growth of TV appearances and brief comments at podiums, feeding the apparently insatiable demand of the news media for Trump news, puts Melania onto this chart for the first time.
10 Kesha Good article 613,154
Interest in this singer's article started to rise on February 19, and remained high through this week. This interest arises out of her recent loss of a sexual harassment lawsuit against music producer Dr. Luke (#25) seeking to void her contracts.

Just missing the WP:TOP25: Love (TV series) (#26, new Netflix show); Ted Cruz (#27, if you can't make the Top 25, you're not going to be the nominee); Robert Shapiro (lawyer) (#28, more O.J.); The Walking Dead (season 6) (#29); and List of Bollywood films of 2016 (#30).




2016-03-02

Wikipedia and paid labour; Swedish gender gap; how verifiable is "verifiable"?

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

"Monetary materialities of peer-produced knowledge: the case of Wikipedia and Its Tensions with Paid Labour"

Reviewed by Nicolas Jullien

This article[1] discusses the links between paid efforts and voluntary efforts in the development of Wikipedia, focusing on the question of paid editing. It stresses the fact that Wikipedia is a mixed economy that results partly from paid labor (the technostructure and the people in charge of maintaining it, and those who defend the project in court, i.e. the paid employees of the WMF).

The core of the article discusses the debate about Wiki-PR (a company which was paid by firms to "edit" their EN-Wikipedia pages) and the impact it had on Wikipedia policy. It sheds light on the discussion between the Foundation, which expressed a stricter interpretation of the rules, and the contributors, especially from non-English Wikipedias, who took a more "pragmatic" approach. Paid editors provided help to the smaller projects in terms of creation of knowledge. The analysis, which views Wikipedia as a sort of communist organization, is less convincing; another weakness is that the authors did not compare this debate with what happens in FLOSS (free-libre open-source software) or in the non-digital world (the Foundation, or the local community groups), which are other examples of the co-existence of voluntary and paid work.

"The Swedish Wikipedia gender gap"

Reviewed by Piotr Konieczny

This master's thesis[2] focuses on the Swedish Wikipedia and its gender gap. It quantifies data and provides information about why Swedish women are not contributing to the project. The author collected data through a questionnaire advertised in December 2014 on the Swedish Wikipedia through a project-wide banner (promotion that an average researcher can only dream about when it comes to English Wikipedia). The paper estimates the Swedish Wikipedia gender gap in the form of the percentage of female editors at between 13% and 19%, based on the self-reported data from Wikipedia account profiles and answers to the questionnaire. More interesting is the analysis of the activity of the accounts: the self-declared male accounts are several times more active than the female accounts, with the author estimating that only about 5% of the site's content is written by women. Contrary to some prior research (most of which focused on the English Wikipedia), the Swedish Wikipedia's editors and readers do not perceive Wikipedia as a place where sexist comments are significant, though about a third agree that general conflicts between editors do take place. Nonetheless, women are less likely than men to think (1) that Wikipedia is welcoming to beginners; (2) that everyone gets treated equally, regardless of gender; (3) that editing means taking on conflicts. Women are more likely than men to acknowledge the existence of sexist comments. In the author's own words, "women have more concerns about the community being sexist and not welcoming, and do not expect conflict as part of editing to the same degree as men", though the author also notes that statistical tests suggest that "the differences in opinion between gender groups do not differ [sic] greatly".

The author concludes that there is no evidence that the Swedish Wikipedia's readers have any preconceived negative notions about the Wikipedia community (such as "it is sexist") that should inhibit potential women contributors from editing and thus contribute to the gender gap. He states: "Significant differences in perceived competence were found. Women report 'I’m not competent enough' as a strong contributing factor to them not editing more than twice as often as men." The author suggests that because women often perceive, whether correctly or not, that they have lower computer skills than men, and see Wikipedia as a website which requires above-average computer skills, this (rather than an unfriendly, sexist community) may be the most significant factor affecting their lack of contributions. (Cf. related coverage: "Mind the skills gap: the role of Internet know-how and gender in differentiated contributions to Wikipedia", "Does advertising the gender gap help or hurt Wikipedia?")

Test of 300k citations: how verifiable is "verifiable" in practice?

Reviewed by Tilman Bayer

Four researchers from Dartmouth College have taken the requirement of "verifiability", one of Wikipedia's core content policies, literally. Their preprint[3] examines 295,800 citations from the 5000 most viewed articles on the English Wikipedia (out of a larger set of 23 million citations extracted from a July 2014 dump). These comprised both inline citations (footnotes) and "free citations" (those not related to any particular part of the article). The authors conclude that

"while the quality of references in the overall sample is reasonably high, verifiability varies significantly by article, particularly when emphasizing the use of standard digital identifiers and taking into account the practical availability of referenced sources."

Unsurprisingly, the study did not examine whether the cited documents actually match the information in the articles. Rather, it concerns the question of whether the citation enables the reader to carry out this verification. The authors argue that

"simply providing citations and references does not automatically guarantee verifiability. Whether or not provided references and citations are accessible ... is just as important as providing the reference or citation in the first place. There are many ways that an online information source might provide citations and references and still be difficult to verify."

They divide these difficulties into two categories: "technical verifiability" and "practical verifiability."

Technical verifiability is defined as "the extent to which a reference provides supporting information that permits automated technical validation of the existence of the referenced material, based on existing technical standards or conventions," concretely ISBNs, DOIs and Google Books IDs. The study found that:

  • "Out of 37,269 book citations, 29,736 book citations (79.8%) had valid ISBNs, while 3,145 (8.4%) of book citations had invalid ISBNs, and 4,388 book citations (11.8%) contained no ISBN information."
  • "Out of 14,081 Google Books-containing citations, 3,159 (22.4%) contained invalid Google Books IDs."
  • "presence or absence of a Digital Object Identifier (DOI) was noted for any reference tagged as ‘journal’, ‘study’, ‘dissertation’, ‘paper’, ‘document’, or similar. Out of 41,244 of these citations, only 5,337 (12.9%) contained neither a DOI or a link to a known open access journal."
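The review does not say how the authors checked ISBNs, but the kind of "automated technical validation" that the definition describes can be illustrated with the standard ISBN check-digit rules. The following is a minimal sketch under that assumption; the function names and the three labels returned by `classify_isbn` are hypothetical and merely mirror the study's valid / invalid / no-ISBN breakdown.

```python
DIGITS = "0123456789"


def _strip_separators(raw: str) -> str:
    """Drop the hyphens and spaces commonly used to group ISBN digits."""
    return raw.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(raw: str) -> bool:
    """ISBN-10 rule: weights 10 down to 1, weighted sum divisible by 11.

    'X' stands for the value 10 and is only allowed as the final character.
    """
    chars = _strip_separators(raw)
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char in DIGITS:
            value = int(char)
        elif char == "X" and position == 9:
            value = 10
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def is_valid_isbn13(raw: str) -> bool:
    """ISBN-13 rule: alternating weights 1 and 3, weighted sum divisible by 10."""
    chars = _strip_separators(raw)
    if len(chars) != 13 or any(c not in DIGITS for c in chars):
        return False
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(chars))
    return total % 10 == 0


def classify_isbn(raw):
    """Sort a citation's ISBN field into 'valid', 'invalid' or 'missing' (hypothetical labels)."""
    if raw is None or not raw.strip():
        return "missing"
    if is_valid_isbn10(raw) or is_valid_isbn13(raw):
        return "valid"
    return "invalid"
```

For example, `classify_isbn("0-306-40615-2")` returns `"valid"`, while changing the final digit to 3 yields `"invalid"`. A check of this kind only shows that an identifier is well formed; it says nothing about whether the book exists or can be obtained, which is what the authors' second category addresses.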

Practical verifiability is defined as "the extent to which referenced material is accessible to someone encountering the reference." In particular, the authors point out that information supported by a paywalled journal article "is practically unverifiable to someone without the additional means to access the supporting journal article. Similarly, if an ISBN is present but refers to a book that only has one extant copy in a library thousands of miles away, then the information it supports is practically unverifiable to someone without the additional means to access the supporting book." Apparently the authors found it difficult to translate these notions into criteria that would lend themselves to a large-scale quantitative analysis, and settled for two rather narrowly defined but still interesting aspects:

  • "Journal citations linking to ‘arXiv' and 'PubMed Central (PMC)' were taken to be open access, while all others were marked unconfirmed. 5,275 of the journal citations out of 41,244 (12.8%) belonged to this confirmed open access category, while 30,632 (74.3%) contained some digital identifier but were not confirmed to be open."
  • "Out of the 10,922 working Google Books links, most (7,749, or 71.0%) are partially viewable with samples, while 1,359 (12.4%) are fully viewable and 1,814 (16.6%) are not viewable at all."
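The counts quoted above can be cross-checked with simple arithmetic: the three ISBN categories add up to the 37,269 book citations, and the three journal figures quoted in this review add up to the 41,244 journal-type citations. The short sketch below recomputes the shares from those counts; the category labels are paraphrases of the quoted passages and the helper name `shares` is hypothetical.

```python
def shares(counts, total):
    """Return each category's percentage of `total`, rounded to one decimal place."""
    if sum(counts.values()) != total:
        raise ValueError("categories do not add up to the stated total")
    return {label: round(100 * count / total, 1) for label, count in counts.items()}


# Counts as quoted from the preprint in this review.
book_citations = {
    "valid ISBN": 29_736,
    "invalid ISBN": 3_145,
    "no ISBN": 4_388,
}
journal_citations = {
    "confirmed open access (arXiv or PMC)": 5_275,
    "digital identifier, openness unconfirmed": 30_632,
    "neither DOI nor known open-access link": 5_337,
}

book_shares = shares(book_citations, 37_269)
journal_shares = shares(journal_citations, 41_244)

for label, percent in {**book_shares, **journal_shares}.items():
    print(f"{label}: {percent}%")
```

The recomputed values match the percentages given in the quotations (79.8%, 8.4% and 11.8% for books; 12.8%, 74.3% and 12.9% for journal-type citations).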

The preprint also contains a literature overview about information quality on Wikipedia, which does the topic scant justice (e.g. of the only three mentioned systematic studies of article accuracy, one is the well-known but over a decade old Nature study, another is a 2014 article whose methodology and conclusions have been described as very questionable, see also below).

With some caveats, e.g. that the quality of the 5000 most-viewed English Wikipedia articles might differ from the quality of the average article, the authors conclude that "from the perspective of overall quality of references in Wikipedia, these findings might seem encouraging", but are concerned that many citations are not practically verifiable.

Twelve years of Wikipedia research

Reviewed by Tilman Bayer

This short (two-page) paper[4] presents "preliminary results that characterize the research done on and using Wikipedia since 2002". It is based on a dataset of 3582 results of a Scopus search in November 2013 (for the term "Wikipedia" in title, abstract and keywords), largely relying on the abstracts of these publications. 641 of them were discarded as unrelated. Of the remaining 2968, the relevance for Wikipedia was judged as "major" for 2301 and as "minor" for 667.

Examining a dichotomy that is familiar to the editors of this newsletter too (which, for example, usually does not cover papers that merely rely on Wikipedia as a text corpus, even though these are numerous in fields such as computational linguistics), the authors write:

"In terms of topic, there were almost an equal number of items about Wikipedia (1431, 48%) as there were using Wikipedia (1537, 52%)",

defining the latter as employing "Wikipedia either as a source/resource for other research or used Wikipedia to test the feasibility and applicability of tools or methods developed for purposes not directly related to Wikipedia". Those papers only began appearing in 2005, but overtook the "about" category in 2009 and have remained in the majority since. (See also coverage of a presentation at Wikimania 2013 that likewise traced publication numbers over the years – based on Google Scholar instead of Scopus – and dated the first appearance of "Wikipedia as a corpus" research to 2005, too: "Keynote on applicable Wikipedia research")

The researchers classified publications by their methodology, into "social/theoretical" (including "analyses and visualizations of Wikipedia") and "technological" (in the "about" category, this classification was reserved for "tools developed for improving Wikipedia"), and found that:

"the technological approach was considerably more popular (1856 items, 63%) compared to the social approach (1112 items, 37%). ... we see that at first the social aspects were emphasized, but since 2007 papers on technological aspects are much more frequent."

The authors extended their search beyond Scopus to Web of Science and the ACM Digital Library for an examination of how the overall volume of published Wikipedia research has developed over time. The resulting chart indicates that the fast growth of earlier years leveled off, with even some decrease in 2013, the last year examined.

Further criticism of study that had criticized accuracy of medical Wikipedia articles

Reviewed by Tilman Bayer

Three letters to the editor of the Journal of the American Osteopathic Association add to the criticism of an article[supp 1] by Hasty et al. that had appeared in the same journal earlier and was widely covered in the media with headline phrases such as "90% of [Wikipedia's] medical entries are inaccurate".

Like editors from WikiProject Medicine at the time, the writers of the first letter[5] lament that the paper's authors "have not made their dataset public, so it is impossible to confirm the veracity of their conclusions"; however, "they did share with us a small subset of their dataset on major depressive disorder. We closely examined two statements from Wikipedia that the researchers identified as inaccurate." After outlining that the peer-reviewed literature on these two issues is "rife with debate", and pointing out that some of it supports rather than contradicts the information on Wikipedia, they state that "It seems problematic to conclude that statements made in Wikipedia are wrong based on peer-reviewed literature", also quoting the editors of Nature observing that "peer review per se provides only a minimal assurance of quality". (On another occasion, the lead author had revealed a third Wikipedia statement that according to the study contradicted the peer-reviewed literature and which he described as dangerously wrong; however, it was in agreement with the hypertension guidelines of the UK National Institute for Health and Care Excellence (NICE).[supp 2])

The letter writers highlight the fact that the study relied on "third-year residents with no specific expertise [to] correctly ascertain the accuracy of claims made on Wikipedia" in this way. In a response[6], Hasty et al. acknowledged that the peer-reviewed literature contained diverging viewpoints on the topic, but held that "if Wikipedia articles are considered review articles, then it would be expected that major controversial points would be discussed rather than presented from one perspective."

The second letter[7] objects that "Because Hasty et al did not identify a specified number of assertions for each condition and did not measure whether Wikipedia and peer-reviewed literature were correct or not, respectively, their use of the McNemar test to compare Wikipedia vs peer-reviewed medical literature was inappropriate." A third letter also criticized the usage of this statistical test, adding that "I believe that the study here was incorrectly analyzed and inappropriately published through the same peer-review process that Hasty et al are holding to such high esteem."[8] In their response[6], Hasty et al. defended their method, while acknowledging that "for greater clarity" some tables should have been labeled differently.

With such severe criticism from several independent sources, it is hard not to see this 2014 paper by Hasty et al. as discredited. Unfortunately, it continues to be occasionally cited in the literature (as mentioned in the review of the "verifiability" paper above) and in the media.

Briefly

The attention economy of Wikipedia articles on news topics

Reviewed by Tilman Bayer
Comparison of topic attention (red and gold lines: average and median pageview numbers to neighboring pages, black line: traffic to the page itself) and creation of new pages linked to the topic (vertical black segments) for an expected event (2012 Summer Olympics, top) and an unexpected event (Hurricane Sandy, bottom). In the graphs on the right, "white nodes represent the neighbor articles predating 2012; colored nodes correspond to neighbors created in 2012. The size of the nodes is proportional to their yearly traffic volume; ... New articles tend to be peripheral to these networks."

A paper[9] in Scientific Reports examined how public attention to a news topic relates to the pageviews of the Wikipedia article about that topic and to the creation dates of related articles. As a proxy for the general attention to the topic, the authors use traffic to pages "neighboring" the main article about the topic itself (i.e. linking to and linked from it), including the time before it was created. From the (CC BY licensed) paper:

"Our analysis is focused on the year 2012. We collected the neighbors of 93,491 pages created during that year. ... Which kinds of articles precede or follow demand for information? In Table 1 we list a few articles with the largest positive and negative bursts. Topics that precede demand (ΔV/V > 0) tend to be about current and possibly unexpected events, such as a military operation in the Middle East and the killing of the US ambassador to Libya. These articles are created almost instantaneously with the event, to meet the subsequent demand. Articles that follow demand (ΔV/V < 0) tend to be created in the context of topics that already attract significant attention, such as elections, sport competitions, and anniversaries. For example, the page about Titanic survivor Rhoda Abbott was created in the wake of the 100th anniversary of the sinking."


A Swiss perspective on Wikipedia and academia

Reviewed by Piotr Konieczny

This conference paper[10] states in its abstract an intent to broadly analyze and present all aspects of Wikipedia use in education. Unfortunately, it fails to do so. For the first four and a half pages, the paper explains what Wikipedia is, with next to no discussion of the extensive literature on the use of Wikipedia in education or its perceptions in academia. There is a single paragraph of original research, based on interviews with three Swiss Wikipedians; there is little explanation of why those people were interviewed, nor are there any findings beyond a description of their brief editing history. The paper ends with some general conclusions. Given the semi-formal style of the paper, this reviewer finds that it resembles an undergraduate student paper of some kind, and it unfortunately adds nothing substantial to the existing literature on Wikipedia, education and academia.

Other recent publications

A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.

  • "The success and failure of quality improvement projects in peer production communities"[11] From the abstract: "Mining data from five quality improvement projects in the English Wikipedia [Collaboration of the Week (CotW), WikiCup, Wikipedia Education Program (WEP), Wikipedia:Community Portal (CP) and Today's Article for Improvement (TAFI)] we show that certain types of strategies (e.g. creating artefacts from scratch) have better quality outcomes than others (e.g. improving existing artefacts), even if both are done by a similar cohort of participants."
  • "Passing on: reader-sourcing gender diversity in Wikipedia"[12] From the abstract: "We present the Passing On system, that reader-sources the creation and expansion of Wikipedia articles about women [using a database of New York Times obituaries], aiming to support frame changes on women's representation and offer a counter-public for novice Wikipedians." (See also http://passingon.natematias.com/)
From Joseph Priestley's A Chart of Biography (1765), referenced in this paper about biography networks on Wikidata
  • "Quantifying cultural histories via person networks in Wikipedia"[13] (also presented as conference poster at NetSci 2015) From the abstract: "At least since Priestley's 1765 Chart of Biography, large numbers of individual person records have been used to illustrate aggregate patterns of cultural history. Wikidata, the structured database sister of Wikipedia, currently contains about 2.7 million explicit person records, across all language versions of the encyclopedia. ... This situation provides us with the chance to go beyond the illustration of an idiosyncratic subset of individuals, as in the case of Priestly [sic]. ... We construct networks of co-occurring nationalities and occupations, provide insights into their respective community structure, and apply the results to select and color chronologically structured subsets of a large network of individuals, connected by Wikipedia hyperlinks." (See also coverage of earlier related work by the same authors: "The history of art mapped using Wikipedia")
  • "Wikiometrics: a Wikipedia-based ranking system"[14] From the abstract: "We demonstrate an innovative mining methodology, where different elements of Wikipedia – content, structure, editorial actions and reader reviews – are used to rank items in a manner which is by no means inferior to rankings produced by experts or other methods. We test our proposed method by applying it to two real-world ranking problems: top world universities and academic journals." (Cf. coverage of related papers coauthored by these authors: "'Do Famous People Live Longer?' Yes for academics, no for artists and athletes")
  • "Enabling complex Wikipedia queries – technical report"[15] From the abstract: "... we present a database schema used to store Wikipedia so it can be easily used in query-intensive applications. In addition to storing the information in a way that makes it highly accessible, our schema enables users to easily formulate complex queries using information such as the anchor-text of links and their location in the page, the titles and number of redirect pages for each page and the paragraph structure of entity pages." (Coauthored by one of the authors of the above-mentioned ranking paper)
  • "Wisdom of the crowd: Wikipedia controversies and coordinating policies"[16] From the abstract: "Focusing on the years 2003–2006 of Wikipedia, this article discusses Wikipedia’s institutionalization process, which involved policy-setting with respect to two factors: the coordination of volunteer editors and external controversies."
  • "Wikipedia and history: a worthwhile partnership in the digital era?"[17] From the abstract: "... this paper examines Wikipedia as a mode of historical expression in the context of a project on the history of the Australian Paralympic Movement. Wikipedia’s key core content policies of verification, no original research, and neutral point of view (NPOV) as well as the collaborative premise that underpins the online encyclopaedia are the focal points of analysis. [...] the history of the Australian Paralympic Movement shows that Wikipedia can be important to history-making in the digital age in at least two ways. Wikipedia provides a mode of historical expression that is complementary to the narratives of traditional books, and the online encyclopaedia generates a community which has produced articles that have enhanced knowledge about the history of disability sport."
  • "Translating the Swedish Wikipedia into Danish"[18] From the abstract: "This paper presents a Swedish-Danish automatic translation system for Wikipedia articles (WikiTrans). Translated articles are indexed for both title and content, and integrated with original Danish articles where they exist. Changed or added articles in the Swedish Wikipedia are monitored and added on a daily basis. The translation approach uses a grammar-based machine translation system with a deep source-language structural analysis." (See also http://wikitrans.net/)
  • "Drawing questions from Wikidata"[19] From the abstract: "We introduce Wikidata Quiz, an application that accesses the structured data set of knowledge base Wikidata. We construct a graph by querying multiple Wikidata items originating from any chosen topic."
  • "Exploiting Wikipedia for information retrieval tasks"[20] From the abstract: "This tutorial aims to provide a holistic view of Wikipedia's different features – text, links, categories, page views, editing history etc. – and explore the different ways they can be utilized in a machine learning framework. By presenting and contrasting the latest works that utilize Wikipedia in multiple domains, this tutorial aims to increase the awareness among researchers and practitioners in these fields to the benefits of utilizing Wikipedia in their respective domains ..."
  • "Relation between Wikipedia edits and news published"[21] From the abstract: "This research looks at the relation between the number of Wikipedia edits on corporate pages and the number of news published by English newspapers over a specified time span. ... The new insights could help companies generate and keep a good corporate image that helps with sales, customer loyalty or customer acquisition. Through showing that some scandals did affect the corporate Wikipedia pages, it can be stated that the site can act as a source of information for users or news outlets and that companies need to take Wikipedia as a public relations tool into account."
  • "Getting a 'quick fix': first-year college students' use of Wikipedia"[22] From the abstract: "This study found that first-year students are uncertain about the variety of ways to use information sources like Wikipedia, and that a direct and balanced approach to this area from instructors may lead to better outcomes than strict prohibition or silence."

References

  1. ^ Lund, Arwid; Venäläinen, Juhana (17 February 2016). "Monetary materialities of peer-produced knowledge: the case of Wikipedia and its tensions with paid labour". TripleC: Communication, Capitalism & Critique. Open Access Journal for a Global Sustainable Information Society. 14 (1): 78–98. doi:10.31269/triplec.v14i1.694. ISSN 1726-670X.
  2. ^ Helgeson, Björn (2015). "The Swedish Wikipedia gender gap". Stockholm, Sweden: Royal Institute of Technology.
  3. ^ Harder, Reed H.; Velasco, Alfredo J.; Evans, Michael S.; Rockmore, Daniel N. (18 September 2015). "Measuring Verifiability in Online Information". arXiv:1509.05631 [cs.SI].
  4. ^ Bar-Ilan, Judit; Noa Aharony (2014). "Twelve years of Wikipedia research". Proceedings of the 2014 ACM Conference on Web Science. WebSci '14. New York, NY, USA: ACM. pp. 243–244. doi:10.1145/2615569.2615643. ISBN 978-1-4503-2622-3.
  5. ^ Leo, Jonathan; Lacasse, Jeffrey (October 2014). "Wikipedia vs Peer-Reviewed Medical Literature for Information About the 10 Most Costly Medical Conditions II". The Journal of the American Osteopathic Association. 114 (10): 761–764. doi:10.7556/jaoa.2014.147. ISSN 0098-6151. PMID 25288708.
  6. ^ a b Hasty, Robert; Garbalosa, Ryan; Suciu, Gabriel (October 2014). "Wikipedia vs Peer-Reviewed Medical Literature for Information About the 10 Most Costly Medical Condition [Response]". The Journal of the American Osteopathic Association. 114 (10): 766–767. doi:10.7556/jaoa.2014.150. ISSN 0098-6151. PMID 25288711.
  7. ^ Chen, George; Xiong, Yi (October 2014). "Wikipedia vs Peer-Reviewed Medical Literature for Information About the 10 Most Costly Medical Conditions III". The Journal of the American Osteopathic Association. 114 (10): 764–765. doi:10.7556/jaoa.2014.148. ISSN 0098-6151. PMID 25288709.
  8. ^ Gurzell, Eric (October 2014). "Wikipedia vs Peer-Reviewed Medical Literature for Information About the 10 Most Costly Medical Conditions IV". The Journal of the American Osteopathic Association. 114 (10): 765–766. doi:10.7556/jaoa.2014.149. ISSN 0098-6151. PMID 25288710.
  9. ^ Ciampaglia, Giovanni Luca; Flammini, Alessandro; Menczer, Filippo (19 May 2015). "The production of information in the attention economy". Scientific Reports. 5: 9452. arXiv:1409.4450. Bibcode:2015NatSR...5E9452C. doi:10.1038/srep09452. ISSN 2045-2322. PMC 4437024. PMID 25989177.
  10. ^ Timo Staub, Thomas Hodel (2015). "Wikipedia Vs. Academia an Investigation into the Role of the Internet in Education, with a Special Focus on Collaborative Editing Tools Such as Wikipedia" (PDF). 11th International Conference eLearning and Software for Education. Vol. 1. pp. 13–19. doi:10.12753/2066-026X-15-001.
  11. ^ Warncke-Wang, Morten; Ayukaev, Vladislav R.; Hecht, Brent; Terveen, Loren G. (2015). "The success and failure of quality improvement projects in peer production communities". Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. CSCW '15. New York, NY, USA: ACM. pp. 743–756. doi:10.1145/2675133.2675241. ISBN 978-1-4503-2922-4. (Author's copy)
  12. ^ Matias, J. Nathan; Diehl, Sophie; Zuckerman, Ethan (2015). "Passing on: reader-sourcing gender diversity in Wikipedia". Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems. CHI EA '15. New York, NY, USA: ACM. pp. 1073–1078. doi:10.1145/2702613.2732907. ISBN 978-1-4503-3146-3. (Author's copy)
  13. ^ Goldfarb, Doron; Merkl, Dieter; Schich, Maximilian (22 June 2015). "Quantifying cultural histories via person networks in Wikipedia". arXiv:1506.06580 [cs.SI].
  14. ^ Katz, Gilad; Rokach, Lior (5 January 2016). "Wikiometrics: a Wikipedia-based ranking system". arXiv:1601.01058 [cs.DL].
  15. ^ Katz, Gilad; Shapira, Bracha (13 August 2015). "Enabling complex Wikipedia queries – technical report". arXiv:1508.03298 [cs.IR].
  16. ^ Yam, Shing-Chung Jonathan (2015). "Wisdom of the crowd: Wikipedia controversies and coordinating policies" (PDF). Journal for the Liberal Arts and Sciences. 20 (1). ISSN 2167-3756.
  17. ^ Phillips, Murray G. (7 October 2015). "Wikipedia and history: a worthwhile partnership in the digital era?". Rethinking History. 20 (4): 523–543. doi:10.1080/13642529.2015.1091566. ISSN 1364-2529.
  18. ^ Bick, Eckhard (2014). "Translating the Swedish Wikipedia into Danish" (PDF). Swedish Language Technology Conference 2014.
  19. ^ Bissig, Fabian (22 October 2015). "Drawing questions from Wikidata" (PDF). Zurich, Switzerland: Distributed Computing Group; Computer Engineering and Networks Laboratory – ETH Zurich.
  20. ^ Shapira, Bracha; Ofek, Nir; Makarenkov, Victor (2015). "Exploiting Wikipedia for information retrieval tasks". Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '15. New York, NY, USA: ACM. pp. 1137–1140. doi:10.1145/2766462.2767879. ISBN 978-1-4503-3621-5.
  21. ^ Henkes, D. (2015). "Relation between Wikipedia edits and news published" (bachelor's thesis). (student essay)
  22. ^ Garrison, John C. (5 October 2015). "Getting a "quick fix": first-year college students' use of Wikipedia". First Monday. 20 (10). doi:10.5210/fm.v20i10.5401. ISSN 1396-0466.
Supplementary references:
  1. ^ Hasty, Robert T.; Garbalosa, Ryan C.; Barbato, Vincenzo A.; Valdes, Pedro J.; Powers, David W.; Hernandez, Emmanuel; John, Jones S.; Suciu, Gabriel; Qureshi, Farheen; Popa-Radu, Matei; Jose, Sergio San; Drexler, Nathaniel; Patankar, Rohan; Paz, Jose R.; King, Christopher W.; Gerber, Hilary N.; Valladares, Michael G.; Somji, Alyaz A. (1 May 2014). "Wikipedia vs Peer-Reviewed Medical Literature for Information About the 10 Most Costly Medical Conditions". JAOA: Journal of the American Osteopathic Association. 114 (5): 368–373. doi:10.7556/jaoa.2014.035. ISSN 0098-6151. PMID 24778001.
  2. ^ Anwesh Chatterjee, Robin M.T. Cooke, Ian Furst, James Heilman: Is Wikipedia’s medical content really 90% wrong? Cochrane blog, 23 June 2014




2016-03-02

Wikimedia Foundation details requests to alter or remove content in new transparency report


The following content has been republished from the Wikimedia Blog. Any views expressed in this piece are not necessarily shared by the Signpost; responses and critical commentary are invited in the comments. For more information on this partnership, see our content guidelines.



In November 2015, the Wikimedia Foundation's legal team received an email seeking control of an article on English Wikipedia about a dance group. The writers said that they were former members of the group, and argued that edits made by other dancers infringed their trademark. We explained that writing an article about a notable topic is not infringement, and suggested that they work with Wikipedia editors if they’d like to improve the article.

Every year, the Foundation receives hundreds of emails and phone calls requesting changes to Wikipedia, Wikimedia Commons and the other Wikimedia projects. A politician may want a friendlier article, or an entertainer may want a more flattering one. Perhaps a business wants to control what is written about its product. In the past six months, the Foundation received 220 such requests—and we didn’t grant a single one, because we believe that our user community should determine the content within the projects.

Transparency and openness are important cornerstones of the Wikimedia movement. One expression of these values is our biannual transparency report, in which we provide information about the requests we receive to remove content from the projects or disclose user data. We published our first transparency report in August 2014; the most recent update covers July–December 2015.

Mention of the monkey selfies in the WMF's first transparency report brought widespread attention to a controversial copyright matter.

The report tracks five data points:

Content alteration and takedown requests. None of 220 requests to alter or remove content was granted. Seven came from government entities. Compared with other major web properties, we receive few content removal requests, because the Wikimedia community works diligently to comply with project policies concerning accuracy and neutrality, and to address any potential concerns. The Foundation believes that the community should determine what content belongs on the projects, and we push back on alteration and takedown requests we do receive in order to ensure that the Wikimedia projects remain neutral, uncensored platforms for sharing free knowledge.

Copyright takedown requests. Between July and December 2015, we received 20 Digital Millennium Copyright Act (DMCA) requests, of which nine were granted. We receive very few DMCA notices, because Wikimedia users are careful to ensure copyright compliance. Our legal team evaluates the DMCA notices that we do receive very carefully, to investigate whether or not the content is infringing, and determine whether any legal exceptions (such as fair use) may apply.

Right to be forgotten. Wikimedia received four requests for content removal based upon the “right to be forgotten”. We did not grant any of these requests. Our concerns about the relevant European Court opinion, and its implications for free knowledge, have not lessened since we first expressed them in August 2014.

Requests for user data. Wikimedia is committed to protecting the privacy of our users. In the last six months, we received 25 requests—including informal requests by governments, informal non-government requests, court orders and civil and criminal subpoenas—to disclose nonpublic user data. Only one of these requests resulted in the disclosure of such information. When we receive such a request, we evaluate it very carefully to ensure that it complies with both the law and our stringent standards. Even when a request is valid and enforceable, we often do not have the information sought. We collect little nonpublic data from our users, and only retain it for a short time.

Voluntary disclosure. On extremely rare occasions, the Foundation becomes aware of concerning information via the projects, such as a suicide or bomb threat. In such cases, consistent with our Privacy Policy, we may voluntarily provide information to the proper authorities to resolve the issue and ensure safety. Between July and December 2015, we made 12 such disclosures.

Our newest report features not only updated numbers, but also new and interesting stories from the past six months. We answer frequently asked questions about the report itself and about our commitment to transparency. The Foundation invites you to read the full report to learn more about our efforts to protect user privacy and keep the content on the Wikimedia projects accurate, neutral, and in the hands of the community.

Aeryn Palmer is legal counsel and Jim Buatti is a legal fellow at the Foundation. The transparency report was produced with the help of many individuals, including Michelle Paulson, Geoff Brigham, Prateek Saxena, Moiz Syed, Jacob Rogers, James Alexander, Anisha Mangalick, Jane Pardini, Kevin Jacobsen, and the entire WMF Communications Team.



