Wikipedia:Wikipedia Signpost/Single/2012-07-30

The Signpost
Single-page Edition
WP:POST/1
30 July 2012

Featured content
One of a kind
 

2012-07-30

Conflict dynamics, collaboration and emotions; digitization vs. copyright; WikiProject field notes; quality of medical articles; role of readers; Best Wiki Paper Award

Modeling social dynamics in a collaborative environment

A draft of a letter, submitted for publication, has been posted on ArXiv.[1] The letter reports research on modeling the process of collaborative editing in Wikipedia and similar open-collaboration writing projects. The work builds on previous research by some of its authors on conflict detection in Wikipedia. The authors explore a simple agent-based model of opinion dynamics, in which editors influence each other either by direct communication or by successively editing a shared medium, such as a Wikipedia page. According to the authors, the model, although highly idealized, exhibits a rich behavior that can reproduce, albeit only qualitatively, some key characteristics of conflicts over real-world Wikipedia pages. The authors show that, for a fixed editorial pool with one "mainstream" and two opposing "extremist" groups, consensus is always reached. However, depending on the values of the model's input parameters, achieving consensus may take an extremely long time, and the consensus does not always conform to the initial mainstream view. In the case of a dynamic group, where new editors replace existing ones, consensus may be achieved through a phase of conflict, depending on the rate of new editors joining the editorial pool and on the degree of controversy over the article's topic.
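The interplay described above, in which editors are influenced by a shared page and in turn rewrite it, can be illustrated with a toy bounded-confidence simulation. This is a minimal sketch and not the authors' model: the update rule, the function name `simulate`, and the parameters `tolerance` and `mu` are all assumptions chosen for illustration.

```python
import random

def simulate(opinions, article=0.5, tolerance=0.3, mu=0.1, steps=20000, seed=0):
    """Toy opinion dynamics around a shared article (hypothetical rule and parameters).

    Opinions and the article state are numbers in [0, 1].
    """
    rng = random.Random(seed)
    opinions = list(opinions)
    for _ in range(steps):
        i = rng.randrange(len(opinions))  # pick a random editor
        gap = article - opinions[i]
        if abs(gap) <= tolerance:
            # The article is close to the editor's view: the editor is persuaded a little.
            opinions[i] += mu * gap
        else:
            # The article is too far from the editor's view: the editor rewrites it
            # a little toward their own opinion.
            article -= mu * gap
    return opinions, article

# A fixed pool: a mainstream group near 0.5 and two small extremist groups.
pool = [0.5] * 6 + [0.0] * 2 + [1.0] * 2
final_opinions, final_article = simulate(pool)
```

Varying `tolerance` and `mu` in this sketch changes how long the opinions take to settle, which is the kind of parameter dependence the letter reports for its own model.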

How Wikipedia articles benefit from the availability of public domain resources

In a copyright panel at this month's Wikimania, Abhishek Nagaraj – a PhD student and economist from the MIT Sloan School of Management – presented early results from an econometric study of copyright law. The study used data from the English Wikipedia's WikiProject Baseball to examine how gains from digitization are moderated by the effects of copyright. Previous work on the economics of copyright has struggled to disentangle the effects of copyright from the effects of the increased access that often coincides with content entering the public domain.

The paper takes advantage of the fact that in 2008, Google digitized and published a large number of magazines as part of the Google Books project. Among the magazines published were 70 years of back issues of Baseball Digest, a magazine that publishes baseball stories, statistics, and photographs. Measuring the effect of digitization, Nagaraj found that the articles on baseball All-Stars from between 1944 and 1984 saw large increases in size (5,200) around the period that the digital Google Books version of Baseball Digest became available. However, because of the law governing copyright expiration, all the issues of Baseball Digest published before 1964 were in the public domain, while issues published after were not. Using the econometric difference in differences technique, Nagaraj compared the effects of digitization for (1) players who began their professional baseball career after 1964 and as a result had no new digitized public-domain material and (2) players who had played before 1964 and were thus more likely to have digitized material about them enter the public domain.

In terms of the effect of copyright, Nagaraj found that public domain status had no effect on the length of Wikipedia articles but a strong effect on images. Wikipedia writers could, presumably, simply rewrite copyrighted material or may not have found the Baseball Digest form appropriate for the encyclopedia. However, the availability of public domain material in Baseball Digest led to a strong increase in the number of images. Before Google Books published the material, articles in the pre-1964 group had an average of 0.183 pictures and those in the post-1964 group about 0.158. In the period after digitization, both groups increased, but the older group increased more: to 1.15 pictures per article, as opposed to 0.667 for the more recent players whose Baseball Digest material was still under copyright. Nagaraj also found that articles on players with public domain material receive more traffic. The study controls for a large number of variables related to players, their performance and talent, and their potential popularity, as well as for trends in Wikipedia editing.
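The difference in differences logic can be made concrete with the four image averages quoted above. The sketch below uses only those raw group means; it ignores the controls the study applies, so the result is an illustration of the technique and not Nagaraj's reported estimate. The function name `diff_in_diff` is ours.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    return (treated_after - treated_before) - (control_after - control_before)

# Treated: players from before 1964 (public domain material became available).
# Control: players from after 1964 (material still under copyright).
estimate = diff_in_diff(
    treated_before=0.183, treated_after=1.15,
    control_before=0.158, control_after=0.667,
)
print(round(estimate, 3))  # 0.458 additional images per article
```

Subtracting the control group's change removes the growth that both groups shared, leaving the part associated with public domain status.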

The presentation slides are available on the Wikimania conference website[2] and a nice journalistic write-up was published by The Atlantic.

Annotating field notes via Wikisource

Extraction of location, date and taxon data from Field Notes of Junius Henderson on Wikisource
User:Aubrey's diagram of a future Wikisource, which combines text with additional layers of transcription, hypertext, annotations and comments.

Field notes can be a valuable source of information about meteorological, geological and ecological aspects of the past, and making them accessible by way of Wikisource-based semantic annotation was the focus of a recent study[3] published in ZooKeys as part of a special issue on the digitization of natural history collections. The paper described how the field notes of Junius Henderson from the years 1905–1910 have been transcribed on Wikisource and then semantically annotated, as illustrated in the screenshot. Henderson was an avid collector of molluscs and, while trained as a judge, served as the first curator of the University of Colorado Museum of Natural History. His notebooks are rich in species occurrence records, but also contain occasional gems like this one from September 3, 1905:

The article provides a detailed introduction to the workflows on the English Wikisource in general and to WikiProject Field Notes in particular, which is home to transcriptions of other field notes as well. The data resulting from annotation of the field notes are available in Darwin Core format under a Creative Commons Public Domain Dedication (CC0). This work ties in with discussions that took place at Wikimania about the future of Wikisource, the technical prerequisites and existing tools and initiatives.

Quality of medical information in Wikipedia

The quality of medical information in Wikipedia could be vastly improved, based on the results of a recent study of 24 articles in pediatric otolaryngology[4] (more commonly referred to as "ear, nose, and throat" or ENT). The study compared results on common ENT diagnoses from Wikipedia, eMedicine, and MedlinePlus (the three most popular websites, by the authors' determination). It found that Wikipedia's articles on ENT were the least accurate and had the most errors of the three, and that they ranked between the other two in readability.

While one of the most referenced sources in this area, Wikipedia had poor content accuracy (46%) compared to the two other frequent sources. MedlinePlus had comparable accuracy (49%) but was missing 7 topics. The clear leader in accuracy, eMedicine, suffered from a higher reading level. The study provides specific criteria, in section 2.3, which could be considered for evaluation of existing articles. One limitation of the study is that, while suggesting that Wikipedia "suffers from the lack of understanding that a physician-editor may offer", it does not point to information on how to get involved with Wikipedia. Engagement with the pediatric medicine community would be beneficial, especially since about 25% of parents made decisions about their children's care in part based on online information.

Emotions and dialogue

A forthcoming paper at this year's WikiSym conference investigates the emotions expressed in article and user talk pages.[5] "Administrators tend to be more positive than regular users", and the paper suggests that "as women gain experience in Wikipedia they tend to adopt the emotional tone of administrators", for instance linking to policy at more than twice the rate of men. Because women are more likely to interact with other women, the authors suggest gender-aware recruiting to address the gender gap.

The authors point out the utility of positive emotion in keeping discussions on track, and suggest that experienced editors should be encouraged to maintain a positive climate. To determine users' gender, they used a crowd-sourced study through Crowdflower. Emotions are determined using the ANEW wordlist, which distinguishes the range of emotional variability based on valence, arousal, and dominance. The paper notes that policy mentions tend to have "a remarkably positive and dominant tone, and with stronger emotional load than in the rest of the discussion".

Editor collaboration patterns

A paper from the University of Alberta addresses the difficulty of analyzing edit histories and finding conflict in particular.[6] They use terms indicating content-based agreement (e.g. "add", "fix", "spellcheck", "copy", and "move") and disagreement ("uncited", "fact", "is not", "bias", "claim", "revert", and "see talk page"). They define conflicting interactions as those that revert, or delete content, or use more negative terms than positive terms. They find that this is a useful way to identify controversial articles.
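The rule described above can be sketched as a small classifier. This is a simplified illustration and not the authors' implementation: it assumes the terms are matched against edit summaries (which the summary above does not state), uses naive substring counting, and takes the revert and deletion flags as given inputs. The function names are hypothetical.

```python
# Term lists as quoted in the summary above.
AGREEMENT_TERMS = ("add", "fix", "spellcheck", "copy", "move")
DISAGREEMENT_TERMS = ("uncited", "fact", "is not", "bias", "claim",
                      "revert", "see talk page")

def count_terms(comment, terms):
    """Count case-insensitive substring occurrences of each term (naive matching)."""
    text = comment.lower()
    return sum(text.count(term) for term in terms)

def is_conflicting(comment, is_revert=False, deletes_content=False):
    """Flag an interaction as conflicting under the three criteria in the text."""
    if is_revert or deletes_content:
        return True
    negative = count_terms(comment, DISAGREEMENT_TERMS)
    positive = count_terms(comment, AGREEMENT_TERMS)
    return negative > positive

print(is_conflicting("revert: uncited claim"))       # True
print(is_conflicting("fix typo and add reference"))  # False
```

Substring matching is crude (for example, "add" also matches "address"); a more careful version would tokenize the comment first.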

Why does the number of Wikipedia readers rise while the number of editors doesn't?

A student paper for a course on "Project in Mining Massive Data Sets" at Stanford University, titled "Wikipedia Mathematical Models and Reversion Prediction",[7] tries to use mathematical models "to explain why the amount of [editors on the English Wikipedia] stops increasing, whereas the amount of viewers keeps increase", and "to predict if an edit will be reverted." The researchers used Elastic MapReduce on Amazon's servers to carry out this research. The paper is a bit confused since the researchers are more interested in models and validation than in explaining the phenomena.

The first part of the paper includes two models for examining the relation of visitors to editors in Wikipedia's community. The first model makes the assumption that editors act as predators and articles have the role of prey. However, this model did not fit the data. The second model used a linear regression on a number of factors, which allows the authors to model the community's statistics over time. The model is then tested using simulation and seems to present accurate results.
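For readers unfamiliar with predator-prey models, the classic Lotka-Volterra equations give a sense of what such a model looks like. The sketch below is a generic textbook version integrated with a simple Euler step; the paper's exact equations and parameters are not given in this summary, so the function name, the parameter values and the starting populations are all assumptions.

```python
def lotka_volterra(prey, predators, alpha, beta, delta, gamma,
                   dt=0.001, steps=10000):
    """Euler integration of the classic predator-prey equations.

    d(prey)/dt      = alpha * prey - beta * prey * predators
    d(predators)/dt = delta * prey * predators - gamma * predators
    """
    history = [(prey, predators)]
    for _ in range(steps):
        d_prey = (alpha * prey - beta * prey * predators) * dt
        d_pred = (delta * prey * predators - gamma * predators) * dt
        prey, predators = prey + d_prey, predators + d_pred
        history.append((prey, predators))
    return history

# Hypothetical values: "articles" as prey, "editors" as predators.
trajectory = lotka_volterra(prey=2.0, predators=1.0,
                            alpha=1.0, beta=1.0, delta=1.0, gamma=1.0)
```

In this family of models the two populations cycle around an equilibrium, which gives one intuition for why such a model might fail to fit steadily growing readership alongside a flat editor count.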

In the second part of the paper, three models were used to predict which edits will get reverted. The models were trained using 24 features, classified as edit-, editor- or article-based: for example, an article's age, its edit count, the number of editors participating in editing it, the number of articles the editor has edited, and the change in information compared to the previous status. The predictions, made with three machine learning algorithms, achieved about 75% accuracy; another interesting conclusion was that the ability to detect reversion has not changed much over time.

Briefly

  • What was the most influential paper ever about Wikipedia and related topics?: Wikimedia France is currently seeking nominations for its Research Award (which comes with a grant of €2500), which aims "to reward the most influential research paper on Wikimedia projects" published between 2003 and 2011. In the coming years, the scope is to be widened to include free knowledge projects more generally. Submission deadline for paper nominations is August 7. The winner shall be announced in November.
  • Retrieving information missing from Wikipedia articles: A paper presented at the 6th International Conference on Ubiquitous Information Management and Communication presents a technique developed by researchers at Kyoto University to compare Wikipedia articles with matching sources retrieved via search engines and identify, via topic modeling, to what extent the external source includes complementary information not covered in the article.[8] The paper then proposes a method to extract sentences from these sources and rank them to facilitate editorial work. Two case studies are discussed analyzing the Yutaka Taniyama and Influvac articles from the English Wikipedia.
  • Mining Wikipedia for common traits of notable individuals: Researcher Pauline C. Ng presented a paper at ICWSM '12 showcasing the potential of using Wikipedia as a corpus of data to study the common characteristics of "notable individuals".[9] Names and birth locations of a list of 40,250 people born in the United States from 1940–1989 and with a Wikipedia article were compared against census data. The analysis reveals interesting patterns such as the fact that "people with rare names [are] more than 2x likely to appear in Wikipedia" or that "people with nicknames are more likely to be in Wikipedia", but with a significantly more pronounced effect for male than female individuals. The author suggests that mining Wikipedia biographies may help "discover novel characteristics associated with positive life outcomes". The main findings of the paper are summarized in this blog post.
  • 2012 Aurora shooting: Brian Keegan, who has published a series of previous articles on coverage of breaking news topics in Wikipedia (see e.g. our past coverage: "High-tempo contributions: Who edits breaking news articles?"), published several analyses and graphs of the first several days of responses and article writing on the English Wikipedia covering the 2012 Aurora shootings.[10] Several participants responded to Keegan in comments on his blog. Taha Yasseri published a graph of the increase in the number of articles on the shootings in different languages.[11]
  • Detecting featured articles using fuzzy logic: A paper[12] by two Bangkok-based computer scientists constructed a fuzzy logic ruleset to discern the featured articles on the Thai Wikipedia (88 at the time of the study) from non-featured articles (100 in the examined sample). Using 26 rules, from unsurprising ones such as the assumption that an article with few footnotes probably does not have featured status, to more complicated criteria involving the most frequent and second most frequent editor of the article, they achieved 100% recall (i.e. detecting all featured articles) and 86% precision (i.e. of the articles detected as having featured quality, 86% actually had featured article status). This compared favorably to a different detection method (which clustered articles according to their distance in a similarity measure that the authors do not specify), supporting the authors' thesis that fuzzy logic is a better approach to the problem, because "the quality of Wikipedia articles should be graded [by] more than two values (good or not good)". (See also coverage of an earlier paper with similar goal: "Lexical clues" predict article quality)
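The precision and recall figures in the fuzzy logic item above can be turned back into approximate article counts. This is a back-of-the-envelope sketch that assumes the reported percentages refer to the full sample of 88 featured and 100 non-featured articles, which the summary does not state explicitly; the function name is ours.

```python
def counts_from_precision_recall(actual_positives, precision, recall):
    """Recover true and false positive counts from precision and recall.

    recall    = TP / actual_positives
    precision = TP / (TP + FP)
    """
    true_positives = recall * actual_positives
    predicted_positives = true_positives / precision
    false_positives = predicted_positives - true_positives
    return true_positives, false_positives

tp, fp = counts_from_precision_recall(actual_positives=88, precision=0.86, recall=1.0)
print(tp, round(fp, 1))  # 88.0 14.3
```

Under that assumption, roughly 14 of the 100 non-featured articles would have been flagged as featured, which is the cost of detecting all 88 featured ones.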

References

  1. ^ Török, J.; Iñiguez, G.; Yasseri, T.; San Miguel, M.; Kaski, K.; Kertész, J. (2012) "Opinions, Conflicts and Consensus: Modeling Social Dynamics in a Collaborative Environment". ArXiv.
  2. ^ Nagaraj, Abhishek. (2012) "The effect of copyright law on the reuse of digital content". Wikimania 2012, July 12–15 2012, George Washington University.
  3. ^ Thomer, A.; Vaidya, G.; Guralnick, R.; Bloom, D.; Russell, L. (2012). "From documents to datasets: A MediaWiki-based method of annotating and extracting species observations in century-old field notebooks". ZooKeys (209): 235–53. Bibcode:2012ZooK..209..235T. doi:10.3897/zookeys.209.3247. PMC 3406479. PMID 22859891.
  4. ^ Volsky, P. G.; Baldassari, C. M.; Mushti, S.; Derkay, C. S. (2012). "Quality of Internet information in pediatric otolaryngology: A comparison of three most referenced websites". International Journal of Pediatric Otorhinolaryngology. 76 (9): 1312–6. doi:10.1016/j.ijporl.2012.05.026. PMID 22770592.
  5. ^ Laniado, David; Castillo, Carlos; Kaltenbrunner, Andreas; Fuster Morell, Mayo. (submitted) "Emotions and dialogue in a peer-production community: the case of Wikipedia". WikiSym’12, August 27–29, 2012, Linz, Austria.
  6. ^ Sepehri-Rad, Hoda; Makazhanov, Aibek; Rafiei, Davood; Barbosa, Denilson. (2012) "Leveraging Editor Collaboration Patterns in Wikipedia" (PDF). In Proceedings of the 23rd ACM conference on Hypertext and Social Media, pp. 13–22. doi:10.1145/2309996.2310001
  7. ^ Jia Ji; Bing Han; Dingyi Li. (2012) "Wikipedia Mathematical Models and Reversion Prediction" (PDF).
  8. ^ Eklou, D., Asano, Y., & Yoshikawa, M. (2012). How the web can help Wikipedia: a study on information complementation of Wikipedia by the web. Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication – ICUIMC ’12 (p. 1). New York, New York, USA: ACM Press. doi:10.1145/2184751.2184763
  9. ^ Ng, P. C. (2012). "What Kobe Bryant and Britney Spears Have in Common: Mining Wikipedia for Characteristics of Notable Individuals". Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media.
  10. ^ Keegan, Brian. (July 21, 2012) "Aurora shootings."
  11. ^ Yasseri, Taha. (2012) "Number of covering WPs vs. time".
  12. ^ Saengthongpattana, Kanchana; Soonthornphisaj, Nuanwan. (2012) "Thai Wikipedia Quality Measurement using Fuzzy Logic" (PDF). 26th Annual Conference of the Japanese Society for Artificial Intelligence, June 12–15, 2012, Yamaguchi, Japan.



2012-07-30

Wikimedians and London 2012; WMF budget – staffing, engineering, editor retention effort, the global South; Telegraph's cheap shot at WP and the first Punjabi Wikipedia Workshop

Wikimedians work with Olympic and Paralympic photography restrictions

The license-contested photograph of Usain Bolt by Richard Giles, taken seconds after Bolt's 100 m victory at the Beijing Olympics 2008. Giles released the image under a Creative Commons Attribution-ShareAlike license on Flickr so it can be used on Wikipedia.

The 2012 Summer Olympics under way in London are the focus of discussions about how they should be covered on Wikimedia projects in the face of tight restrictions on photography. These restrictions became an issue during the Beijing Olympics in 2008 (Signpost coverage) due to legal threats from the International Olympic Committee (IOC) against Richard Giles. The photographer licensed his Flickr image of Usain Bolt – taken seconds after the Jamaican sprinter broke the world record – under a free Creative Commons license so it could be used on Wikipedia. Responding to discussions over volunteer photographer accreditation to the games in 2012 on the Wikimedia mailing list, Richard Symonds of Wikimedia UK said his chapter has pursued the matter of allowing more flexibility for Commons licensing, only to receive a resounding 'no' from all corners, even from the UK government.

2012 German Olympian trap-shooter Sonja Scheibl – photographed by Ralf Roletschek during a press gathering.

However, Wikimedia Germany supported a community effort to produce photos for Wikimedia articles on members of the German Olympic team, by piggybacking on the press event at which the team's clothing for London 2012 was presented to the public. Five volunteers managed to take several hundred pictures of the team and the event.

The summer Paralympics, which will start shortly after the finish of the able-bodied Olympics, is locked into the same restrictions on photography and licensing. However, Wikimedia Australia has been working closely with the Australian Paralympic Committee to enable Wikinews coverage by two Wikimedians, Laura Hale and Hawkeye7, the only Wikimedians to have been granted press accreditation at the 2012 Paralympics. This will give them access to Paralympians and other personnel after they finish their events, to ask questions during press conferences, and to conduct interviews. But Wikimedians have to accept that the Olympics are now among the most highly commercialised events in the world. Laura Hale told the Signpost that "rights holders, the ones that pay big money, get the first chance to interview people. Then we're granted a few minutes for interviews if the athletes are amenable." One small hope is to photograph athletes outside the village, she says, which is allowable without commercial restrictions on licensing.

Hale said this is a great opportunity to improve women's content on both Wikipedia and Wikinews, and coverage of people with disabilities, particularly Asian and African Paralympians. She and Hawkeye7 will be working to take and upload pictures under the non-commercial licenses used by Wikinews – which is in line with the International Paralympic Committee's regulations. Non-commercial licenses are incompatible with the licensing policy of Commons. The Australian Paralympic Committee will upload some of their own images under a Creative Commons license, specifically to make them easier for use on Wikinews; however, these images face the same problems as those that will be taken by Laura Hale and Hawkeye7. While images and video are a problem, there are no such restrictions for audio files, which means interviews can be uploaded to Commons under a compatible license.


WMF annual plan

The Wikimedia Foundation has published its 2012–13 Annual Plan, focusing on technical improvements, editor retention, and structural reforms over the coming year. The movement's total revenue, including almost all chapter funding, is slated to rise by 35%, from $34.2 million to $46.1 million, and global spending to more than $42.1 million, although both figures overstate the real increases, since the recent financial reforms now include all financial categories in these figures. The foundation's own core spending will grow by 15% to $30.2 million in 2012–13.

Due to the new financial structure of the movement, $11.4 million of the volunteer-run Funds Dissemination Committee's (FDC) awards and grants – mainly to go to Wikimedia chapters – are part of the WMF's annual plan for the first time. The foundation plans to request $4.5 million of the FDC's $11.4 million allocation to finance non-core activities, which will include the Wikimedia grant program, a GAC allocation doubled to $600k, global education, and education programs in the Arabic-speaking world, Brazil, and India. The movement's overall revenue is projected to grow by 35%, from $34.2M to $46.1M, while continuing to use the less aggressive annual fundraiser methods deployed in 2011–12 with fewer days and fewer "Jimmy Days". Jimmy Wales's image was displayed in the annual fundraiser banners on 12 of the 46 days in 2011, compared with 36 of 50 days in 2010.

On the downside, the plan acknowledges that the foundation has been unable to significantly increase the diversity of its communities – including female participation, which remains at a strikingly low 9% – or to turn the tide on the slight decline of project participation, down in March 2012 to 85,000 regular users (more than five edits a month) from 89,000 a year earlier. This contrasts with last year's goal to increase participation to 95,000 regular users by June 2012. The new Visual Editor was expected to be ready for deployment by June 2012, a target that has now been put back a year to mid-2013. On the other hand, the readership growth goals – to reach a billion people by 2015 – are on track due to increasing mobile page views of 2,008M in April 2012, up a remarkable 187% from 726M a year earlier. The combined Wikipedias, scheduled to reach 50 million articles by 2015, had 22.3 million entries in March 2012, up from 18.8 million over the past year.

According to the plan, the foundation will "redouble" its work to reverse the decreasing participation trend. The document also recognises and describes other key risks, including that:

  • movement tensions will detract from programmatic work (such as the image filter controversy);
  • there could be negative shifts in the international legal context (such as SOPA/PIPA and recent legislative moves in several countries); and
  • revenue targets are not met (for example, if the perception of a large budgetary increase due to the improved visibility of non-WMF budgets suppresses people's willingness to donate).

To address these challenges and the related content-goals, the foundation will increase its support for efforts in strategic key areas such as the Arabic-speaking world, Brazil, and India; the WMF will promote new models of community self-organization (Signpost coverage). Boosting technical capacity will secure the launch of the Visual Editor and new multimedia tools, and will improve mobile access to Wikimedia sites. The foundation's engineering department will be the main focus for staff recruitment: up to 30 engineering jobs will boost numbers by nearly 50%, in an overall staffing increase of 55 for the foundation, bringing numbers to 174.

Chapter growth in spending (blue) compared with that of the WMF (green), 2009–13
Total spending allocations for the movement, according to the 2012–13 annual plan. Left: projections 2011–12; right, plan for 2012–13. Chapters (yellow); then clockwise FDC/GAC; WMF management and governance, WMF legal, finance, and admin; WMF fundraising; WMF engineering; WMF other programs; and WMF HR, finance and admin

Daily Telegraph's cheap shot at Wikipedia

The UK Telegraph has just published a story apparently sparked by the site-ban of the chair of the WMUK board by the English Wikipedia's ArbCom last week. Written by technology correspondent Christopher Williams under the title "Chairman of Wikipedia charity banned after pornography row", the article attempts to link Fæ's "punishment" with what it calls "a deep rift among Wikipedia contributors over the mass of explicit material in the online encyclopedia", and with the UK government's proposed new controls "to protect children online ... potentially limiting access to Wikipedia".

However, Williams provides no evidence for connecting the complex issues underlying Fæ's ban with the community's protracted discussion of controversial content; nor does his article – complete with a large photograph of Fæ – back up the implication that Wikipedia's policies and practices concerning such content might be caught up by the government's proposed rules. He confuses the English Wikipedia's rules for controversial content with those of Commons, writing somewhat boldly that "Wikimedia Commons makes massive volumes of pornography freely available to any Wikipedia visitor."

In response to the announcement of ArbCom's sanctions on Fæ, the board of Wikimedia UK had released a statement on 26 July:

The Board is united in the view that this decision does not affect [Fæ's] role as a Trustee of the charity. His work at Wikimedia UK has always been enthusiastic and diligent. In particular, his knowledge of charity governance, and his ability to bring about consensus at WMUK's board meetings, have been particularly valuable. The Board points out that the editing issues were fully public before, and during, the recent elections to the board, and were openly and publicly discussed. Our membership placed their trust in him by electing him as a Trustee. He was then elected unanimously as Chair of the Board. He continues to have the full support of the Board.

Jon Davies, chief executive of WMUK, responded to Williams' piece at the chapter's blog-site: "The Daily Telegraph has chosen its headline to create maximum impact. The reality is far, far more complex." The blog reprinted the board's statement of support, with a link to the publicly available minutes of the board meeting at which it was endorsed.

Wikimania scholarship reform

On July 25, the WMF launched a discussion of how the award of Wikimania scholarships should be reformed. The volunteer committee that reviews scholarship applications for Wikimania has experienced capacity problems, and its structure will be reviewed.

Among the more than 150 scholarships awarded in 2012 – partly with the support of chapters and other entities – the committee approved 130 from applicants in 57 countries. The cost of the scholarship scheme amounts to several hundred thousand dollars. The committee examined some 1,100 confidential applications, with supporting staff aiming to balance factors such as geography, WMF project, and whether applicants had been awarded scholarships for previous Wikimanias. Cost estimates for foundation Wikimania 2012 scholarships are graphed here, based on estimated flights to and from Washington DC from representative airports and assuming a 300 euro award for partial scholars.

Editors are welcome to participate in the discussion on Meta, which is determining how to improve transparency, efficiency and coordination, and alignment with the movement's strategic priorities and the role of qualification standards. The current design of the process is in the handbook.

Punjabi Wikipedia completes 10 years, organizes first workshop at Ludhiana, Punjab, India

Wikipedia organized the first ever Punjabi Wikipedia workshop in Punjab at Ludhiana City on 28 July 2012. Ludhiana, an industrial city of Punjab, saw a decent turnout of 20 people for this open-to-all workshop. The workshop started with a basic presentation aimed at spreading awareness about Punjabi Wikipedia, educating users on editing techniques and contributing articles, and encouraging users to propagate their native language and share their knowledge with the world.

Many women editors for the First Punjabi Workshop

What was amazing, though, was the large number of women attendees. Until now, Punjabi Wikipedia, which has just completed ten years, had only two editors and very few articles. After the workshop, we saw an addition of fifteen new editors, of whom thirteen are women. We also got four new administrators: Tow, Tari Buttar, Guglani and Surinder Wadhawan. Two of the new sysops, Guglani and Surinder Wadhawan, were present at the workshop, addressed the students’ queries and motivated them.

The workshop also received coverage in Punjabi media praising this effort from Wikipedia, including the Ajit, Punjab Tribune and Hindustan Times. Let's hope that this workshop will kick-start a series of many more workshops across the state, and thus bring many more editors and many more Punjabi articles.

In brief

  • Nominations open for FDC: Nominations for the seven volunteer FDC positions and the related position of ombudsperson are open at Meta until August 15. At the time of writing, there are nine candidates for FDC positions and one for ombudsperson. Membership, which will be decided by the foundation's Board of Trustees, does not require affiliation with a Wikimedia entity. Editors are welcome to ask questions of candidates at a Q&A page.
  • Computational biology article competition: The International Society for Computational Biology has announced an article competition on the English Wikipedia, aiming at improving articles on computational biology, bioinformatics, and computational systems biology. Project coordination takes place at WikiProject Computational Biology and more than 60 users have already signed up for the contest.
  • Mediation Committee reform: The mediation committee's procedures are under review. A discussion with poll is hosted on the committee's talk page.
  • Spanish Wikinews protests against ACTA: The Spanish Wikinews project protests against the Mexican government's move on July 12 to sign the ACTA-agreement. A banner informing readers has been displayed since July 25. The Spanish Wikipedia is considering the case and possible responses.
  • Queering Wikipedia: On 22 July, a small group of mostly new editors traveled to the Tom of Finland Foundation for the second annual "Queering Wikipedia" edit-a-thon in Los Angeles.
  • Changes to the WMF's Indian Catalyst: On July 30, the foundation announced that its India program team will be moved to the Indian research-oriented NGO CIS, which has supported Wikimedia's mission in India in the past and will receive a WMF grant. Sunil Abraham, the executive director of CIS, told the Signpost that an approved budget and project proposal will be published in early August.
  • New administrators: The Signpost welcomes our new administrators, SarahStierch and Berean Hunter.


2012-07-30

Summer sports series: WikiProject Horse Racing

The article on thoroughbreds has reached Featured Article status
A close finish in the 1884 Epsom Derby
An early edition of the General Stud Book, the original breed registry of the United Kingdom
The horse Luke McLuke circa 1915
The Kentucky Derby is described as "The Most Exciting Two Minutes in Sports"
A racehorse in Tokyo
Jumping hurdles in a steeplechase

We continue our Summer Sports Series this week with WikiProject Horse Racing. Started in November 2005, the project has grown to include nearly 8,000 articles maintained by 34 active members. There are 10 Featured Articles and 19 Good Articles included in the project's scope. In addition to preparing articles for GA and FA status, the project attempts to create requested articles and locate requested images. We interviewed Redrose64, Montanabw, Tigerboy1966, Ealdgyth, and Cuddy Wifter.

What motivated you to join WikiProject Horse Racing? How do articles about horse racing differ from articles about other sports?

Redrose64: I never formally joined. I got involved because I primarily work on railway articles, and there are a number of locomotives which were named after racehorses. For example, most of these were named after winners of the British Classic Races.
Montanabw: I joined due to my strong interest in horses and active participation in WikiProject Equine. Many of the articles are tagged for both projects.
Tigerboy1966: As a consequence of the Global Economic Downturn, I found myself with an excess of leisure time (i.e. I got fired) and needed a cheap, interesting hobby. What attracted me to the project was that European horses seemed to be under-represented compared to North American ones and I wanted to even things up a little.
Ealdgyth: I've done research on Quarter Horse history and how it intersects with early American Thoroughbred history. It's always been an interest, so joining the racing project was sort of a no-brainer.
Cuddy Wifter: Upon discovering Wikipedia some six years ago, my initial contributions were articles on my local suburban areas in Melbourne. After finding my feet in the Wiki environment, I branched out to the area of Thoroughbred racing, which had been my main interest (albeit from a gambling perspective) for the past 40 years. I felt that I had insights and knowledge of the subject which would be of benefit to other people.

The project is home to 10 Featured Articles and 19 Good Articles. Have you contributed to any of these? What are some challenges you've encountered when improving horse racing articles?

Montanabw: I was among the core group of people who brought Thoroughbred to Featured Article status. I also worked on Horse and Shackleford (horse) to bring them to Good Article status. I think there are two main challenges with improving horse-related articles. The first is dealing with assorted POV-pushing, which can come either from the wide diversity of strongly held opinion that exists within the horse community or from external sources, such as the animal rights community. The second challenge is to write so that people unfamiliar with horses and horse racing can understand the topic, while still using the many distinctive terms of art that convey the proper nuance and are widely understood within the community of horse and horse racing enthusiasts. I liken writing about horses to writing about nautical topics: both have extensive specialized language that is necessary for a proper discussion of a topic.
Tigerboy1966: I was the "main" on seven of the GAs and chipped in on two of the others. One of the problems, which Montanabw hints at above, is that while some horses have fans, others have worshippers, which makes NPOV difficult to achieve. The only horses who should be called "legendary" are the likes of Pegasus, Sleipnir etc. There are also some areas where on-line sources are very thin: for European racing it's easier to find material on the 1860s than the 1960s.
Ealdgyth: I've also worked to get Horse up to GA (and hopefully FA sooner or later) as well as Thoroughbred. I've also been the main editor on seven of the other FAs for the racing project and five other GAs. The main challenge is that writing an encyclopedia article on a racehorse is not at all close to how most racing writers would write. Racing books are usually written much more in a sports journalism style, which is not well suited to encyclopedias. It's often difficult to keep the tone of the article correct and help well-meaning "helpers" who want to write them like The Daily Racing Form instead of an encyclopedia article.
Cuddy Wifter: The breeding of racehorses is a multimillion industry and we should be on guard for possible manipulation by contributors with a conflict of interest in promoting certain stallions or breeding lines. I am unaware of any peer-reviewed scientific studies on the theory of breeding of Thoroughbreds, and am very sceptical of any pedigree section of an individual horse which purports to infer detailed inherited characteristics – as an example see Shackleford. For the past 200 years the main theory of breeding has been that you breed the best with the best and HOPE for the best.

How much overlap exists between WikiProject Horse Racing and WikiProject Equine? Do the two projects collaborate or share resources? Are there any other projects that share common interests with WikiProject Horse Racing?

Redrose64: There is much overlap; so much so in fact that there is very little (gambling, for example) that WikiProject Horse Racing could cover which would not also be covered by Equine. The converse is not true: Equine covers many areas that are not Horse Racing.
Montanabw: While there is a great deal of overlap on the core articles and some collaboration on topics of mutual interest, WP Horse Racing has two or three times as many articles tagged for the project as WPEQ does, primarily due to the large number of biographies of both individually named race horses and their humans: jockeys, trainers, owners. There are also a significant number of articles on race tracks and certain famous races and famous farms. In those areas, there is relatively little overlap.
Cuddy Wifter: The recent name change to the project from Thoroughbred racing to the more fully encompassing Horse racing has seen a good number of articles on Harness racing and Quarter horse racing transferred from the WPEQ to this project.
For its entire history Horse racing has been the main sport associated with gambling, but in recent years we have seen a proliferation in betting on other sports. Perhaps there may be interest in a WikiProject Sports Betting.

Are some types of horse racing better covered by Wikipedia than others? Is horse racing in some countries under-represented? What can be done to fill holes in Wikipedia's coverage of horse racing?

Montanabw: I think that the focus is proportionate to the overall population of the USA, UK and Australia; each area has contributed excellent editors to the project. There could be improvement in the coverage of racing in Continental Europe and other non-English-speaking places, and in the material on racing in places such as India, Japan and Hong Kong, but I think finding good editors with a background in these nations is a problem across all areas of en.wiki.
Tigerboy1966: Agree with the above. Also, we badly need some input on Harness Racing, and on jump racing beyond the British Isles.
Montanabw: Agree with Tigerboy; also more coverage of racing with horse breeds other than Thoroughbreds, notably Arabians, American Quarter Horses, and some of the more unusual breeds, such as the Finnhorse and Coldblood trotters.
Ealdgyth: Well, the Quarter Horses are probably better covered than any other breed but the Thoroughbreds - I know I've at least got start articles on the horses in the AQHA Hall of Fame. But there needs to be more work on trotting horses and on non-European racing... unfortunately, it's difficult to find information at times.

How does the project determine notability for horses, jockeys, and owners? What are some good resources editors can turn to when sourcing an article or determining notability?

Montanabw: We use the general Wikipedia guidelines, similar to what guides writers about sports teams or individual human athletes. For horses, we apply notability criteria similar to those for humans: Did they win notable events, did they make other significant contributions (in the case of horses, this would be genetically) to the improvement of the sport, etc.
Ealdgyth: Generally, a winner of a major graded stakes race or a member of a racing hall of fame is going to be notable enough just based on GNG, at least with horses. The human side is generally covered by the GNG... I haven't really had much need for specialized guidelines to deal with racing - it's a well covered sport in most countries and thus sources exist. It's often just tracking them down that is the issue.
Cuddy Wifter: The factors determining notability for all articles on Horse racing should be set out in detail, on a separate page, so that new contributors can quickly check that any article they start fits the criteria.

What are the project's most urgent needs? How can a new contributor help today?

Montanabw: Cleanup, referencing, finding sources, particularly for the human BLP articles. I think a new contributor would do particularly well to skim the cleanup tags and help with the articles on the many still-living trainers and jockeys, where WP:BLP applies.
Tigerboy1966: We have over 2,500 stubs, and a lot of start class articles towards the stubby end of the scale. In other words, we have lots of articles that would be improved by the addition of any relevant, appropriate content.
Ealdgyth: Cleanup on the various articles that have "tone" issues is a good place to start. Especially with the newer horses, a lot of the articles tend to be dominated by "fans" and could use third party views. (My own personal contributions are usually on older horses, which is where my personal library is focused).

Anything else you'd like to add?

Montanabw: This is a project with over 7,000 tagged articles. There is always room for more contributors!
Tigerboy1966: The coverage of Horse Racing on Wikipedia has improved immensely in the last couple of years: we have a lot more B and C class articles than we used to. We are heading in the right direction, but we are a small project with a lot to do.
Cuddy Wifter: I note that it is six years since this project was started. Having just perused the archives to remind myself of its history, I think it would be of benefit to old and new contributors if a list of agreed policies and guidelines on Horse racing was extracted and set up.


Next week, we'll conclude the Summer Sports Series with a lesson in self defense. Until then, duke it out in the archive.


2012-07-30

One of a kind

This edition covers content promoted between 22 and 28 July 2012.
Only one living specimen of Ecnomiohyla rabborum, also known as the Rabbs' fringe-limbed treefrog, is known to exist. A new featured picture.
The German battleship Bismarck
The mushroom Marasmius rotula
Yossi Benayoun, one of the Israel international footballers, subject of a new featured list. Benayoun has the highest number of appearances and goals for the team among active players.
Featured picture: the Myrtle Warbler
Featured picture: American singer and actor Frank Sinatra

Eight featured articles were promoted this week:

  • German battleship Bismarck (nom) by Parsecboy. Bismarck, the first of its class, was laid down in 1936 and launched two and a half years later. Completed in 1940, Bismarck conducted only one offensive operation in her eight-month career, destroying HMS Hood. This led to a two-day pursuit which resulted in Bismarck's sinking; the cause of the sinking remains disputed, although the wreck has been examined several times.
  • Marasmius rotula (nom) by Sasata. Marasmius rotula, first described in 1772, is a type of fungus that is widespread throughout the Northern Hemisphere. The mushrooms are characterized by their whitish, thin, and membranous caps and generally grow in groups or clusters on decaying wood. Spore production depends on moisture and can last up to three weeks.
  • Derek Jeter (nom) by Muboshgu. Jeter (b. 1974) is an American baseball player who has won numerous awards for his hitting ability, baserunning, and leadership. He began playing for the New York Yankees in 1995, having been drafted three years earlier. He won Rookie of the Year in 1996 and became the team's starting shortstop the same year. Jeter has set several team and league records and is one of the most heavily marketed athletes of his generation.
  • "Episode 2" (Twin Peaks) (nom) by Grapple X. "Episode 2", the third episode of the first season of the American television series Twin Peaks, was directed by David Lynch. It follows the investigation into the murder of schoolgirl Laura Palmer and introduces a supernatural element to the series. First broadcast on April 19, 1990, the episode is considered ground-breaking by critics.
  • HMS Agincourt (1913) (nom) by The ed17 and Sturmvogel 66. HMS Agincourt, a dreadnought battleship built in the United Kingdom for Brazil, was controversial during its construction as it was first sold to the Ottoman Empire and then seized by the British; this seizure was a factor in the Ottomans siding with Germany in World War I. Agincourt spent much of the war on patrols and exercises. After a period in reserve, she was sold for scrap in 1922.
  • David Evans (RAAF officer) (nom) by Ian Rose. Evans (b. 1925) is an Australian airman. He joined the RAAF during World War II but first saw combat during the Vietnam War. He continued to rise through the ranks before being selected as Chief of the Air Staff in 1982, a post he held until his retirement three years later. Since then he has unsuccessfully run for public office and continues to serve as a defence advisor.
  • Cosima Wagner (nom) by Brianboulton. Wagner (1837–1930) was the second wife of Richard Wagner and his muse. Born to a Hungarian composer, Cosima Wagner's first marriage was to Hans von Bülow. Unhappy with her loveless marriage, she became involved with Richard Wagner and, after his death, continued to run his Bayreuth Festival. As she became identified with anti-Semitism and extreme racialist theories, her work remains controversial.
  • Nickel (United States coin) (nom) by Wehwalt. The nickel, a five-cent piece currently composed of 75% copper and 25% nickel, has been struck in the United States since 1866, when gold and silver became scarce after the Civil War. The coin has seen ten designs, including four released in a period of two years. Nickels currently cost eleven cents to produce; the US Mint is looking for a way to lower production costs.

Five featured lists were promoted this week:

  • ICC Women's Cricketer of the Year (nom) by Harrias. The International Cricket Council (ICC) Women's Cricketer of the Year is an annual award first given in 2006. The award is based upon the players' performances in the voting period and has seen twelve nominees since its inauguration. No player has won more than once.
  • Boden Professor of Sanskrit (nom) by Bencherlite. The position of Boden Professor of Sanskrit at the University of Oxford in Britain was established in 1832 as a way to expedite the conversion of Indians to Christianity. It has since been held by eight persons. Initially elected, the position is now filled by the university.
  • List of National Natural Landmarks in Michigan (nom) by Dana boomer. The US state of Michigan is home to 12 of the almost 600 United States National Natural Landmarks, which include areas of geological and biological importance. The program is managed by the National Park Service, and the first landmark was designated in 1967.
  • List of Queens Park Rangers F.C. players (nom) by Miyagawa. Since its establishment in 1888, Queens Park Rangers Football Club has seen 1,100 total players, 163 of them appearing in at least 100 games. Tony Ingham made 555 appearances for the club, the most in its history, while George Goddard was its top scorer.
  • List of Israel international footballers (nom) by HonorTheKing and Cliftonian. The Israel national football team, which first played in 1934, has seen more than 450 players, of whom 97 have made more than twenty appearances. Arik Benado made 94 appearances for the team, the most in its history, while Mordechai Spiegler was its top scorer.

Eight featured pictures were promoted this week:

The Tomb of I'timād-ud-Daulah in Agra, India, a new featured picture



2012-07-30

Talking performance with CT Woo and Green Semantic MediaWiki with Nischay Nahata

Talking performance with CT Woo

CT Woo relaxing on the second day of the 2011 Berlin Hackathon

In the light of recent questions over the long-term reliability of Wikimedia wikis, the Signpost caught up with CT Woo, the Wikimedia Foundation's director of technical operations.

Hey CT. Many users have reported timeouts and other performance problems over the last few months. Does the Foundation view these as separate incidents or as representative of a larger trend?
There are several reasons. For example, we are in the midst of changing file systems from NFS to an object storage system (OpenStack Swift). Since it is a very new product, we did discover a performance issue occurring during some image deletions. We have investigated, tracked it down and I am happy to report it is no longer an issue. Also recently, we hit a Linux kernel bug where systems started rebooting themselves after about 211 days of uptime. As a result, we had to patch all the affected servers. In addition, a number of development teams (especially Platform and Localisation) have changed their build-test-deploy process over the last few months and are now rolling out more frequent (albeit smaller) deployments. I would like to add that 2011/2012 has been a relatively good year for our site uptime metrics, better than 2010/2011. For readers of Wikipedia, the uptime was 99.97%. For editors, the uptime was 99.86%.
Does the Foundation feel that it has the resources at its disposal to make these kind of problems a thing of the past?
Resources are always a constraint. Whenever we encounter or discover a critical issue, we will all circle in to fix the problem. We usually gather the domain experts when we hit a hard problem and they could be from the Foundation or from the community. For example, the Varnish Software folks are helping us now to fix some issues when using Varnish for multimedia streaming purposes.
Is there not a tension between the operations team on the one hand and development teams on the other that could cause more issues in the future?
On the contrary, the teams work together very well. Yes, we do have differences of opinion occasionally but they are all healthy discussions. Most of the time, the operations team aren't the ones who perform the deployment but they are on standby. However, should we find performance issues with the deployment, and depending on the severity, we do revert the changes, using performance profiling to help identify bottlenecks.
CT, thank you.
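
To put the uptime percentages quoted in the interview into more tangible terms, the short sketch below converts an uptime percentage into the hours of unavailability it implies. The interview does not say how uptime was measured or over exactly which period, so the 365-day reporting year used here is an assumption, and the function name `downtime_hours` is purely illustrative.

```python
# Assumption: the figures cover a non-leap, 365-day reporting year.
HOURS_PER_YEAR = 365 * 24


def downtime_hours(uptime_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of unavailability implied by an uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100.0 - uptime_percent) / 100.0 * period_hours


if __name__ == "__main__":
    # Percentages as quoted in the interview: readers 99.97%, editors 99.86%.
    for audience, uptime in (("readers", 99.97), ("editors", 99.86)):
        hours = downtime_hours(uptime)
        print(f"{audience}: {uptime}% uptime is about {hours:.1f} hours of downtime per year")
```

Under that assumption, the reader figure works out to roughly two and a half hours of downtime over the year and the editor figure to roughly twelve hours.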

Google Summer of Code: Green Semantic MediaWiki

The logo of Semantic MediaWiki, a collection of extensions for MediaWiki and the target of Nischay Nahata's Google Summer of Code work

In the second of our series looking at this year's eight ongoing Google Summer of Code projects, the Signpost caught up with developer Nischay Nahata. Nischay is working on performance improvements to Semantic MediaWiki (SMW), a collection of extensions not in use on any Wikimedia Projects, but nevertheless boasting a significant list of adopters. SMW is also regarded as an influential player when it comes to deciding the course of MediaWiki's potential adoption of so-called "structured data" forms, which have recently come to prominence with the establishment of the Wikidata project. While SMW and Wikidata are distinct projects, there is an active exchange of ideas (and developers) between them. Nischay explained to the Signpost what he has been trying to accomplish, and what its broader impact might be:

Nischay regularly updates a blog following his latest progress.

In brief


Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for several weeks.

  • Gerrit discussions continue: As reported in previous editions of the "Technology report", the start of code review tool Gerrit's own review period has sparked a series of discussions about its utility. In one, the possibility of making Gerrit compatible with popular Git repository management site GitHub was addressed (wikitech-l mailing list); in another, the possibility of changing the Gerrit visuals to something without "puke green/yellow colour schemes" (also wikitech-l). One thought that has come to prominence focusses on the soon-to-be-released Gerrit 2.5, which allows reusers such as Wikimedia to add their own plugins, an advance which will no doubt please those supporting an "improved Gerrit" outcome to the review. Improvements to Git statistical tracking were also in the news this week.
  • Lua to hit first WMF wikis in August: According to a recent update by the Director of Platform Engineering, Lua scripts could be in operation on Wikimedia wikis as soon as next month. Deployment will start on a test wiki before moving to MediaWiki.org, when the possibilities afforded by MediaWiki's first serious attempt at providing a template programming language will begin to come under serious scrutiny. Talks regarding Lua (see previous Signpost coverage for context) were well received at both Berlin and Washington D.C.; any deployments are likely to attract significant developer attention.
  • Meet the Analytics teams: In a post on the Wikimedia blog, the WMF Analytics team introduced their work, focussed on an update to the Wikimedia Report Card and Kraken, a new "data services platform" aimed at providing a huge array of statistics generated from dozens of datasets. The blog post also stressed the Foundation's commitment to privacy under the heading "counting not tracking". WMF wikis have traditionally been praised for high privacy ratings, albeit at the potential expense of data collection (for example, see previous Signpost coverage).
  • Geolocation, geolocation, geolocation: The possibility of upgrading MediaWiki's geolocation abilities was raised this week (wikitech-l mailing list). Geolocation powers geonotices, messages delivered via the watchlist to editors from a specific area, usually advertising meetups and other real world events. The privacy implications of utilising data other than just publicly available IP addresses will no doubt also need to be considered.
  • One bot approved: 1 BRFA was recently approved for use on the English Wikipedia:
    • Legobot's 13th BRFA, creating a list of incorrectly moved pages for WP:AFC.
At the time of writing, 16 BRFAs are active. As usual, community input is encouraged.


2012-07-30

No pending or open arbitration cases

For the second time this year and the fourth in the history of the Arbitration Committee, there are no requests for arbitration or open cases.

No pending cases

A closure last week left no open cases before the Arbitration Committee. This has happened on only three previous occasions: in 2009, in 2010 and in May of this year. At the time of writing, the Committee has no requests for arbitration before it.

Arbitration cases do not form all of the Committee's workload, however, as there are four requests for clarification and amendment and one motion being discussed.

Other requests and motions

Arbitrator Kirill Lokshin proposed a motion requiring the alteration of any instances of an editor's previous username in arbitration decisions to reflect their name change(s). Any instances appearing within the:

  • enforcement log may be updated by any uninvolved administrator on request;
  • text of a finding or remedy may be updated by the clerks on request; and
  • evidence submissions of a case or other preliminary documents may be updated by the clerks with the committee's prior approval.

The Devil's Advocate has initiated an amendment request for the controversial Race and intelligence case. The request calls for the amendment of review remedies 1.1, 6.1 and 7.1.

Amendment 1 concerns remedies 6.1 and 7.1. It calls for SightWatcher's and TrevelyanL85A2's indefinite omni-namespace edit and discussion ban from Race and intelligence topics, which includes participation in discussions concerning topic-editor conduct, to be changed to a standard topic ban from Race and intelligence-related edits (broadly construed) with a clearly defined route for appeal of the sanction.

Amendment 2 concerns remedy 1.1. It calls for Mathsci's admonishment for engaging in battleground conduct to be modified to include an explicit warning that further battleground conduct (towards editors) related to the topic will be "cause for discretionary sanctions."

