According to one possibly over-simplistic measure, the core Wikimedia community, and in particular the core community on the English Wikipedia, has recently stopped declining and might even have started to grow again.
Editors making more than 100 edits per month on the English Wikipedia:

Month | 2014 | 2015 | change |
---|---|---|---|
January | 3,232 | 3,312 | 80 |
February | 2,957 | 3,051 | 94 |
March | 3,131 | 3,309 | 178 |
April | 2,979 | 3,156 | 177 |
May | 3,051 | 3,223 | 172 |
June | 2,981 | 3,245 | 264 |
July | 3,024 | 3,399 | 375 |

Editors making more than 100 edits per month across all Wikipedias:

Month | 2014 | 2015 | change |
---|---|---|---|
January | 10,331 | 10,625 | 294 |
February | 9,508 | 9,779 | 287 |
March | 9,936 | 10,446 | 510 |
April | 9,533 | 9,986 | 453 |
May | 9,689 | 10,075 | 386 |
June | 9,276 | 9,891 | 615 |
July | 9,420 | 10,280 | 860 |
For some years now, the English Wikipedia and the Wikimedia movement generally have been losing active editors faster than they have been recruiting them. But one interesting indicator has now started to climb, suggesting that the core community may actually be growing again, even though a range of other indicators, from the appointment of new admins on the English Wikipedia to the number of new accounts created and the number of editors making more than five edits per month, are still flat or in decline.
The number of editors saving more than 100 edits each month is a long-standing metric published for Wikipedia and other WMF projects. For seven consecutive months, from January to July 2015, that indicator has shown year-on-year growth both for the English Wikipedia community and for the Wikimedia projects as a whole, though the situation is more complex on some other sites, such as the German Wikipedia.
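For readers who want to reproduce the metric from raw data, here is a minimal sketch; it assumes a pandas DataFrame of edit events with `user` and `timestamp` columns, which is an illustrative structure rather than the actual Wikistats pipeline.

```python
import pandas as pd

def very_active_editors(edits: pd.DataFrame, threshold: int = 100) -> pd.Series:
    """Count editors with more than `threshold` saved edits in each calendar month.

    `edits` is assumed to have a 'user' column and a datetime 'timestamp' column;
    this is an illustrative reconstruction, not the Wikistats implementation.
    """
    edits = edits.copy()
    edits["month"] = edits["timestamp"].dt.to_period("M")
    per_editor = edits.groupby(["month", "user"]).size()
    return (per_editor > threshold).groupby("month").sum()

# Example: year-over-year change for July, mirroring the tables above.
# counts = very_active_editors(edits_df)
# print(counts[pd.Period("2015-07")] - counts[pd.Period("2014-07")])
```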
We know there are seasonal events that affect the community, and the months themselves vary in length: February 2015 was shorter than January or March. Even so, more editors contributed more than 100 edits in February 2015 than in February 2014; similarly, January 2015 had more such editors than January 2014, a trend that has now run for seven months. Last month, 12,349 editors made more than 100 edits across all Wikimedia projects, 10,280 across all versions of Wikipedia, and 3,399 on the English Wikipedia, compared with 11,257, 9,420, and 3,024 respectively in July 2014.
The matter has been discussed on the research mailing list, Wiki-research-l, during the past two weeks.
As with any time series, there is always the risk that this is just an anomaly, but Wikimedia Foundation data analyst Erik Zachte has now said of the phenomenon: "The growth seems real to me." Zachte has also pointed to the late-2014 speed-up of editing on the Wikimedia sites as a potential contributor to the increase: implementing HHVM made saving edits faster, which should logically have more impact on wiki gnomes making lots of small edits than on editors who save only a few times per hour.
Another theory suggested on the research list and elsewhere attributes the increase to improvements to VisualEditor, though with barely ten percent of the most active editors on the English Wikipedia using it, it is unlikely to be a major, let alone the sole, reason for the apparent increase.
The different leadership style of new Foundation executive director Lila Tretikov may be bearing fruit, in terms of better relations between the Foundation and the most active editors.
There is also some concern that editors saving over 100 edits per month is a simplistic metric. For example, it counts users of highly automated tools such as AutoWikiBrowser, STiki, or Huggle, who may reach that edit count in less than an hour per month, but omits an editor who spends an evening every week writing or rewriting one or two articles and might save only one edit every half hour.
Should the trend continue, and assuming that someone doesn't find a software bug that has caused the anomaly, future lines of analysis could include examining how much of the increase is due to fewer editors leaving, more inactive editors returning, more new editors joining, and a greater number of casual editors increasing their editing frequency to more than 100 edits per month.
August figures are expected in about a month. It will be very interesting to see whether the trend continues.
The Russian Wikipedia has been the target of official government ire and censorship previously. The latest incident originated in the small village of Chyorny Yar, where on June 26 a prosecutor obtained a court order demanding the deletion of that Wikipedia's article on charas, which the English Wikipedia defines as "a hashish form of cannabis ... made from the resin of the cannabis plant." In Russia, telecommunications and official censorship are overseen by Roskomnadzor, whose duties include censoring pages regarding the use and production of illegal drugs. Roskomnadzor determined that the page on charas should be removed by August 21 or the Russian Wikipedia would be blocked in that country. According to Sputnik, a government-owned news service, a Roskomnadzor official told the newspaper Izvestia:
“Roskomnadzor understands the importance of Wikipedia for society. But it goes like this: today it 'academically' writes about drugs, tomorrow 'academically' about forms of suicide, and the day after tomorrow publishes any kind of banned content, but with 'academic' sources.”
Both sides complained about a lack of communication. Executive Director of Wikimedia Russia, Stanislav Kozlovskiy, told the Washington Post that in the past there was "dialogue" with government regulators concerning problems, but not in this case: "We tried to call them but were told that the press officer is on vacation and no one else is authorized to talk to us. They preferred to communicate via statements on the Internet instead.” According to Sputnik, Roskomnadzor head Vadim Ampelonskiy told Izvestia they had also attempted contact. "We were unpleasantly surprised when ... Kozlovskiy, instead of implementing the law, began a large-scale media campaign." The media campaign resulted in a large spike in traffic to the charas article (see figure right).
The entire encyclopedia would have to be blocked because of the recent implementation by the Wikimedia Foundation of the HTTPS protocol on all Wikimedia projects (see previous Signpost coverage). Kozlovskiy told the Post that Russian internet providers do not have the "expensive equipment" needed to block individual pages on sites using HTTPS. Parker Higgins of the Electronic Frontier Foundation told The Verge that "One of the arguments that advocates have made in favor of HTTPS is that it changes the calculus around censoring individual pages." He said that HTTPS requires that governments engaging in censorship make an "all or nothing" decision about whether to block an entire site, or to not engage in censorship at all.
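To make the "all or nothing" point concrete, here is a small illustrative sketch (not anything from the coverage itself): under HTTPS as deployed in 2015, a network observer can typically learn the destination hostname from DNS lookups and the TLS SNI field, but the path identifying an individual article stays inside the encrypted session.

```python
from urllib.parse import urlsplit

# Illustrative URL for the Russian Wikipedia article in question.
url = "https://ru.wikipedia.org/wiki/%D0%A7%D0%B0%D1%80%D0%B0%D1%81"
parts = urlsplit(url)

# Visible to an on-path censor: the server's IP address and, in practice,
# the hostname (leaked via DNS and the TLS SNI extension).
print("observable:", parts.hostname)

# Encrypted inside the TLS session: the request path identifying the page,
# so blocking this one article means blocking the whole host.
print("hidden:    ", parts.path)
```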
The Verge quoted two Russian journalists about the possible reasons behind the block. Nikolay Kononov, editor-in-chief of SecretMag.ru, said "I think they're trying to show they can ban whatever they want, whenever they want. It's a show of intimidation, like two boxers circling each other in a ring." Investigative journalist Andrei Soldatov suggested that it may be part of an attempt to force the encyclopedia to abandon HTTPS, which he noted is impenetrable by SORM, the Russian internet surveillance system. If so, Wikimedians are unbowed. Kozlovskiy told the Post that “We are not going to stop using the https protocol to make it easier for Roskomnadzor to censor Wikipedia.”
Global Voices Online reported that Russian Wikipedians debated how to respond, with suggestions ranging from "complete compliance ... to complete defiance". The article on charas was not deleted, but it was moved to charas (drug substance), and the original article title became a disambiguation page with links to a number of other articles, including an Asian river and a grape. Because the court order specified a specific URL, Global Voices Online speculated that Russian Wikipedia editors might have "outsmart[ed]" Roskomnadzor.
The Russian Wikipedia was blocked on August 24, but the block was in place only very briefly, and some internet providers had not yet instituted it. Sputnik reported that Ampelonskiy told Izvestia that "Wikipedia was saved by FSKN," the Federal Drug Control Service of Russia. He said that the FSKN had certified that the article was no longer in violation of the law. He said
“We highly value the efforts the Wikipedia community made on Saturday and Sunday to change the text. The first version of the 'Charas' article did not even have one corroborative source, so it was not even in accordance with the rules of Wikipedia itself ... The text was completely reworked by the editors, and really became academic and based on science.”
While this threat may be over, at least temporarily, Sputnik ominously reports that "Roskomnadzor is waiting for Wikipedia to change the content of three articles; on 'self-immolation,' 'suicide,' and 'ways of committing suicide,' which were declared against the law by Rospotrebnadzor, the federal watchdog for consumer protection."
Wikimania is the annual international conference for Wikimedia contributors. About a thousand people convene for the three-day main conference, in which five conference tracks run concurrently for eight hours. Conference tracks cover such topics as presenting individuals' projects, reviewing community organizing plans, promoting access to information sources, developing tutorial infrastructure, legal issues, software demonstrations, regional outreach, metrics reporting, and reviewing research. Before the main conference there is a two-day preconference, termed a hackathon, in which people meet in small groups for meetings, workshops, training, and more personal discussion. I went to the conference in DC in 2012, Hong Kong in 2013, London in 2014, and Mexico City in 2015.
The Mexico City conference was supposed to be held at the Vasconcelos Library but was instead held at a Hilton hotel. Wikipedians love libraries, and in the bidding process that chose Mexico City as the host, a major factor persuading the community was the organizing team's enthusiasm for the library. Two months before the conference, the venue was changed. I had not noticed the announcement of that change and was surprised to learn of it quite close to the event. The reasons cited for the change were the inability to secure hotel accommodation close enough for attendees and uncertainty about the library's Wi-Fi capacity.
These things may be so, and perhaps the library was always an inappropriate choice of venue; but I regret that so many volunteers spent about a year planning an event at this library only for the venue to change suddenly. How much volunteer work was expended on the original plan? Why was that venue not identified as inappropriate sooner? Given that volunteers are supposed to organize things like venue location, was volunteer labor somehow insufficient for the task, and could the paid staff who handled the emergency move of the event have been more diligent in the original assessment and saved that volunteer time?
The mythology around the Wikimedia movement is that volunteers do everything. In reality, paid staff do a lot and serve in the most essential roles. The mythology partly developed because from 2001 to 2008, the Foundation and the community had almost no money, and no external organizations were funding Wikimedia contributors. Since about 2008 the situation has changed a lot, but there are few evaluations of the changes, and still fewer publications about the changes. From the WMF's perspective, their funding has gone from nothing in 2001 to more than US$65 million this year. I mention this in my “Value of a Wikipedian” post.
Another change is that more organizations are willing to hire their own Wikipedians. I was the first person hired to do Wikipedia work full-time, indefinitely. It was a crazy concept at the time, and many would still say it is a strange idea; but nowadays a lot of organizations are doing it. Since moving to New York I've come to realize that a lot of editing of television- and movie-related articles is done by paid editors, and that this is especially taboo. Still, there is a lot of demand on Wikipedia for good information about popular television shows, and people seem to appreciate Wikipedia's coverage of them. For many shows there are enough fans to appreciate reading the content on Wikipedia, even if paid staff put it there. In a lot of ways, paid contributions are creeping into Wikipedia without any history of community discussion to address the implications.
I say this to give some context to something that in any other nonprofit movement would not be an issue. Wikimania is imagined to be a community-run event, but leaving a conference entirely to volunteers is too burdensome for the volunteers and too risky for the movement. There is a community memory that in 2010, in Poland, the volunteers managing Wikimania became overwhelmed. As the story goes, the Wikimedia Foundation stepped in, had staff take over some essential roles during the conference, and hired local event coordinators to make it go well. In 2011 the conference in Israel went well because the Israeli chapter is known for good business sense, having an office with good fundraising and management practices, and otherwise being a volunteer organization with effective staff support. In 2012 the Wikimania coordinators in DC paid US$30,000 to hire an event consultant; the WMF funded that because event consultants were readily available for hire in the US, and because the consultant managed finance, legal contracts, and event coordination while giving the volunteers final sign-off on everything and remaining at arm's length from them.
In 2013 the volunteers in Hong Kong came in for a lot of criticism for not reporting the finances of the conference; see for example the Signpost report “Hong Kong’s Wikimania 2013—failure to produce financial statement raises questions of probity“. I know that Hong Kong did not hire an event planner in the way that one was hired for DC, and in my opinion, if they had, and if their event planner had managed their accounting, there would have been no community objection to their reporting of the event. Based on my incomplete information, had the Hong Kong team not depended on volunteers to do the accounting, which can be tedious and time-consuming for volunteers to undertake for such a large event, and instead asked for funding for a consultant to produce the report and accounting, they would have secured both the money and high praise for their management of the event.
In other respects, I think it was the best-managed Wikimania I've attended. There were volunteers everywhere greeting everyone at every stage of the process, and collectively they seemed to me like a trained army at the edge of every activity, continually directing me into the experience they had designed and keeping things on a tight schedule. The London conference was great too, but then the London Wikimedia chapter is the second-best funded after Germany's and has about 10 staff. It also ran the conference in an expensive venue that required its own paid staff to coordinate the event, in contrast to, for example, the DC and Hong Kong events, held in universities and heavily dependent on volunteers to complement a few staff services, and the full-service Hilton in Mexico.
In 2014 I helped organize WikiConference USA in New York with other volunteers. Organizing the conference programming was a fun activity for volunteers; doing the event management was tedious. As volunteers, we liked advertising the event in some channels, reviewing program submissions, soliciting and reviewing scholarship applications, and recruiting volunteers to be on hand on the day of the event. The duties we did not enjoy, and which we would have preferred to turn over to paid staff, included negotiating with the venue and caterers; managing the written agreements about finance and safety; coordinating a travel team to dispense money to scholarship recipients; the accounting; the metrics part of the grant reporting to the Foundation; comprehensive communication in the manner of communications professionals rather than in the style of grassroots volunteers; and responding to harassment (a stalker during that event managed to spoil the mood of the attendees). We managed the conference for about US$30,000 because the venue was a school, which donated what would elsewhere have cost some $60,000. About $10,000 of the $30,000 went to food and incidentals, and the other $20,000 to travel scholarships. There were about 10 of us on the organizing team, and I suppose we each met in person for about 30 hours to plan the event, plus perhaps as much time again working alone online. This was for a three-day conference of about 300–500 people. Wikimania is no doubt on the same or a larger scale.
Is it worth having volunteers spend their time in this way? The money is less of an object these days. Volunteer time is scarce, and anyone who would consider volunteering to convene a Wikimedia conference is likely to also be a person whose time could be spent where expertise is scarce, like actually presenting Wikimedia culture instead of only creating a space for others to do this. Professional event coordinators are at least two to three times more efficient in organizing events than a volunteer team would be, and will anticipate bureaucratic reporting standards intuitively when volunteers might not anticipate the need at all.
Until now, Wikimania conferences have been awarded through an Olympic-style bidding process in which groups of volunteers in different cities around the world bid for the right to host the conference. The winning team gets something like $300,000 to host the conference, with perhaps another $100,000 available on request for special needs. The restriction is that volunteers are discouraged from hiring paid staff to present the conference, and the event is expected to be as volunteer-run as possible. I wonder if the Foundation might consider the history of difficulties and rethink the idea that volunteers should present conferences.
I think it would be more reasonable for the WMF to hire event staff to manage almost all parts of the event, if only to free the volunteers' time for more personal engagement. A local Wikipedia team should still coordinate some hospitality functions, like staffing the registration desk and having volunteers around to answer questions about the neighborhood, and should take part in selecting the keynote speakers, scheduling programming, and recruiting Wikipedians to participate. Historically, an online volunteer committee has selected the program submissions to be featured and chosen the scholarship recipients. I want those roles to continue; but as for event coordination, paid staff ought to be used.
I worry about two side issues.
One is that the Hilton is an expensive American hotel with uncompromising business practices. It charges about $300 a night for rooms, and that was roughly the rate paid for five nights by the ~100 scholarship recipients and some 100 WMF staff who attended the conference. $300 × 200 people × five nights is $300,000, which is the typical conference scale and probably about the total price including venue space, catering, and the negotiated rate. It bothers me that this money went to an American company and not to a local business. It also bothers me that this rate is so far removed from the local economy. A recent economic report says 46% of people in Mexico made less than $157 per month, so one night in this hotel costs the equivalent of about two months' wages. In Mexico City, 76% of people make $157 or less. How did the local Wikipedia contributors feel about hosting a conference in a venue so far removed from local culture and norms? How would the international guests have felt about staying in a local hotel instead of an American one?
The other issue is that almost all of the conference presentations showcased the work of paid staff, even though many people think of the Wikimedia movement as a volunteer initiative. There were five days of conference. The first two were hackathon days, in which WMF staff controlled everything on the schedule; this was the first year that had happened. There were lots of reserved rooms sitting empty, people could meet during those first two days, and scholarship recipients were present, but posting to the schedule was prohibited. In the other three days of the conference, I counted 150 talks. Of these, 48 were presentations by WMF staff. The Foundation did not participate in the Spanish-language talks, of which there were 26, so 39% of the English-language talks were paid presentations by WMF staff. Another 50 of the English-language talks were by people who were paid to present by some organization other than the WMF (including chapter staff and paid Wikipedians like me), which leaves (150 − 48 − 26 − 50 = ) 26 English-language talks, or about 16%, that were presented by volunteers in the three days available to the community.
I'm grateful to the volunteers who helped put this conference on; but I'd have preferred that the Wikimedia volunteer community fill most of the speaking slots, perhaps 66% of them. I want to emphasize volunteers, because the community and the Foundation put so much emphasis on volunteer contributions. I think there's a perception that the community speaks for itself, but somehow this year the community was mostly just the audience. At the very least, I'd like to see future Wikimanias advertise which talks are presented by volunteers, which by WMF staff, and which by others.
Lane Rasberry is Wikipedian-in-residence at Consumer Reports. This article originally appeared on the author's blog and is republished here with his permission.
Most fundraising in the Wikimedia movement is handled directly by the Wikimedia Foundation (Wikimedia Germany also raises significant funding, much of which is forwarded to the Foundation). Though declining readership numbers have raised concerns about the future, the Foundation's fundraising has continued its success: this financial year's $58.5 million target was reached just halfway through the year. Part of these funds, mostly garnered through annual fundraisers, pays for the operation of the servers and of the Foundation itself; part returns to the movement through one of the Wikimedia Foundation's four grantmaking operations.
The grantmaking system in place today came about as a result of a broader discussion about movement roles that took place in 2012–13. There are four kinds of Wikimedia grants: Travel and Participation Grants, which fund individuals representing the movement at primarily non-Wikipedian events; Individual Engagement Grants, which fund individual or small-group research projects related to the Wikimedia movement; Project and Event Grants, for projects and events conducted by individuals and groups; and Annual Plan Grants, which provide annual-cycle funding to "eligible" affiliates such as the largest chapters.
Community grant-making is a complex and inherently political process. The Wikimedia community is a large and divisive place—one in which organic and systematic growth vie with each other. A variety of funding schemes have been tried, to target a variety of needs emerging at a variety of times and garnering a variety of results. Each process has its own adherents, its own community, and its own review body, resulting in a large number of complicated but important details difficult to penetrate for all but the most experienced onlookers.
So it is significant news that this week the Foundation's fundraising team put forward an IdeaLab proposal aiming for a complete refresh of the system as it exists today (the IdeaLab is the WMF's central fundraising incubator for providing community review ahead of grant submissions). The proposal lists three weaknesses in the current system:
“People with ideas don’t know how to get the support they need. It is difficult for people with ideas to know where to get money and support for their ideas. Once they get started, a clear path with support for growing successful programs or technology is often missing.

Processes are too complicated and rigid. Each program has different processes for getting money and support, and there are both gaps and overlap between these programs. We need to make a lot of exceptions to ensure everyone gets what they need. Most requests that need an exception get pushed to Project and Event Grants where systems aren't designed to handle them.

Committees are overwhelmed with current capacity. Committees reviewing the widest range of grants aren't able to give all requests a quality review. The most robust committee processes are time-intensive and won't be able to scale as the number of requests grow.”
The proposal prescribes replacing the current fourfold system with three multi-tiered platforms. First, there would be project grants for both individuals and smaller organizations; these would consist of seed funds for experimental purposes and growth funds to sustain growing projects. Second, there would be event grants, falling into three subcategories: travel support for event attendance; micro funds for small community events and logistical support (the case study is ordering pizza and stickers for a local meetup); and large event support for large conferences, up to and including, it seems, the annual international Wikimania itself. Third, annual plan grants for affiliates would continue, but would now fall into two categories: a rigorous system for larger bids, and a simpler process for smaller bids (provisionally capped at US$100,000 and one FTE staff member employed under the grant).
How can the community participate in the dialogue? A significant reworking of fundraising is an immensely complicated process to engage in, so much so that the IdeaLab proposal comes with not only its own calendar but an entire page on how to direct feedback. An FAQ has been provided, which attempts to answer common questions. The consultation is scheduled to last until 7 September, with the resulting changes expected to start coming into effect from 31 October, when the APG process split would be piloted, and to continue through 2016. For further discussion see the talk page. For more information on how grants are managed and disbursed, start here.
For more Signpost coverage on grantmaking see our grantmaking series.
Four featured articles were promoted this week.
Four featured lists were promoted this week.
Twenty-four featured pictures were promoted this week.
Back in December last year, one of the remedies in the Interactions at GGTF case was to have Eric Corbett topic-banned from the Gender gap task force (GGTF). This has resulted in his being blocked multiple times for violating the topic ban. A discussion following one of the blocks placed on him, however, has resulted in a decision to make amendments and clarifications to the text of both the Discretionary sanctions and the Arbitration enforcement pages.
The arbitration enforcement case began with a comment Eric Corbett made on his own talk page. The comment was discussed at WP:AE, and the thread was closed by Black Kite with no action taken. GorillaWarfare, however, blocked Eric Corbett for a month. That action was then taken up at the administrators' noticeboard, where the discussion was closed by Reaper Eternal with Eric Corbett unblocked, the consensus being that GorillaWarfare's block was "a bit out of process". The case was opened the next day, June 29.
After nearly two months of evidence-gathering and deliberation, the case was closed on August 24. With the closure, two findings were agreed upon. The first was that Eric Corbett's comment was the cause of the dispute. The second, and more important, was that GorillaWarfare's actions "fell foul of the rules set out in Wikipedia:Arbitration Committee/Discretionary sanctions#Appeals and modifications and in Wikipedia:Administrators#Reversing another administrator's action, namely the expectation that administrative actions should not be reversed without [...] a brief discussion with the administrator whose action is challenged." The case also found that Reaper Eternal violated Wikipedia:Arbitration Committee/Discretionary sanctions#Appeals and modifications, which requires, "for an appeal to be successful, a request on the part of the sanctioned editor and the clear and substantial consensus of [...] uninvolved editors at AN."
Because of these findings, the remedy was to direct the drafters of the case to amend and clarify both WP:ACDS and WP:AE. What will that mean for the future of ArbCom? While nothing is certain for now, it is at least expected that the discretionary sanctions page will soon look quite different from its current state, though it is possible that more cases like this one will arise. We will have to wait and see what impact this case has.
A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
OpenSym, the eleventh edition of the annual conference formerly known as WikiSym, took place on August 19 and 20 at the Golden Gate Club in the Presidio of San Francisco, USA, followed by a one-day doctoral symposium. While the name change (enacted last year) reflects the event's broadened scope towards open collaboration in general, a substantial part of the proceedings (23 papers and posters) still consisted of research featuring Wikipedia (eight) and other wikis (three, two of them on other Wikimedia projects: Wikidata and Wikibooks), listed in more detail below. Wiki research was not represented in the four keynotes, however, even if some of their topics did offer inspiration to those interested in Wikimedia research. For example, in the Q&A after the keynote by Peter Norvig (Director of Research at Google) about machine learning, Norvig was asked where such AI techniques could help human Wikipedia editors with high "force multiplication". He offered various ideas for applying a "natural language processing pipeline" to Wikipedia content, such as automatically suggesting "see also" topics, flagging potentially duplicate article topics, or propagating "derivative" article updates (e.g. when an actor's article is updated with an award won, the list of winners of that award should be updated too). The open space part of the schedule saw very limited use, although it did facilitate a discussion that might lead to a revival of a WikiTrust-like service in the not too distant future (similar to the existing WikiWho project).
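As a toy illustration of the first of Norvig's suggestions, "see also" candidates can be approximated with plain TF-IDF similarity over article text. The sketch below uses scikit-learn; the article titles and bodies are placeholders, not anything discussed at the conference.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder corpus: {article title: plain-text body}.
articles = {
    "Article A": "text of article A ...",
    "Article B": "text of article B ...",
    "Article C": "text of article C ...",
}

titles = list(articles)
tfidf = TfidfVectorizer(stop_words="english").fit_transform(articles.values())
similarity = cosine_similarity(tfidf)

def see_also_candidates(title: str, k: int = 5) -> list[str]:
    """Rank the other articles by TF-IDF cosine similarity as rough 'see also' candidates."""
    i = titles.index(title)
    order = sorted(range(len(titles)), key=lambda j: similarity[i, j], reverse=True)
    return [titles[j] for j in order if j != i][:k]

# print(see_also_candidates("Article A"))
```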
As in previous years, the Wikimedia Foundation was the largest sponsor of the conference, with the event organizers' open grant application supported by testimonials by several Wikimedians and academic researchers about the usefulness of the conference over the past decade. This time, the acceptance rate was 43%. The next edition of the conference will take place in Berlin in August 2016.
An overview of the Wikipedia/Wikimedia-related papers and posters follows, including one longer review.
Review by Morten Warncke-Wang
"Tool-Mediated Coordination of Virtual Teams in Complex Systems"[10] is the title of a paper at OpenSym 2015. The paper is a theory-driven examination of edits done by tools and tool-assisted contributors to WikiProjects in the English Wikipedia. In addition to studying the extent of these types of edits, the paper also discusses how they fit into larger ecosystems through the lens of commons-based peer production[supp 1] and coordination theory.[supp 2]
Identifying automated and tool-assisted edits in Wikipedia is not trivial, and the paper carefully describes the mixed-method approach required to successfully discover these types of edits. For instance, some automated edits are easy to detect because they're done by accounts that are members of the "bot" group, while tool-assisted edits might require manual inspection and labeling. The methodology used in the paper should be useful for future research that aims to look at similar topics.
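For the simplest of those cases, membership in the "bot" user group can be checked with the standard MediaWiki API. The sketch below is a minimal illustration (the username is a placeholder); tool-assisted edits by humans would still need the manual inspection and labeling the paper describes.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def is_flagged_bot(username: str) -> bool:
    """Return True if the account is a member of the 'bot' user group on the English Wikipedia."""
    params = {
        "action": "query",
        "list": "users",
        "ususers": username,
        "usprop": "groups",
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=10).json()
    users = data.get("query", {}).get("users", [])
    return bool(users) and "bot" in users[0].get("groups", [])

# print(is_flagged_bot("ExampleBot"))  # placeholder account name
```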
Review by Morten Warncke-Wang and Tilman Bayer
A paper from the WETICE 2015 conference titled "Analysing Wiki Quality using Probabilistic Model Checking"[11] studies the quality of enterprise wikis running on the MediaWiki platform through a modified PageRank algorithm and probabilistic model checking. First, the paper defines a set of five properties describing quality through the links between pages. Two examples are "temples", articles which are disconnected from other articles (akin to orphan pages in Wikipedia), and "God" pages, articles which can be immediately reached from other pages. A stratified sample of eight wikis was selected from the WikiTeam dump, and the measures were extracted using the PRISM model checker. Quality varied greatly across these eight wikis; for instance, some have a low proportion of unreachable pages, which is interpreted as a sign of quality.
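As a rough illustration of link-structure indicators of this kind (not the paper's PRISM models), the sketch below computes orphan-like "temple" pages from a page-to-outlinks mapping with networkx. The "God page" test, pages linked from every other page, is one reading of the paper's informal description and should be treated as an assumption.

```python
import networkx as nx

def link_structure_report(links: dict[str, set[str]]) -> dict[str, list[str]]:
    """Crude link-structure indicators over a page->outlinks mapping (illustrative only)."""
    g = nx.DiGraph()
    g.add_nodes_from(links)
    for page, targets in links.items():
        g.add_edges_from((page, t) for t in targets if t in links)

    n = g.number_of_nodes()
    # "Temples": pages with no incoming or outgoing links to other articles.
    temples = [p for p in g if g.in_degree(p) == 0 and g.out_degree(p) == 0]
    # One possible reading of "God pages": linked from every other page (assumption, see above).
    god_pages = [p for p in g if g.in_degree(p) == n - 1]
    return {"temples": temples, "god_pages": god_pages}
```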
The methodology used to measure wiki quality is interesting in that it is an automated method describing the link structure of a wiki, which could be turned into a support tool. However, the paper could have been greatly improved by discussing information quality concepts and connecting the work more thoroughly to the literature, in particular research on content quality in Wikipedia. Using authority to measure information quality is not novel: in the Wikipedia-related literature we find it in Stvilia's 2005 work on predicting Wikipedia article quality,[supp 3] where authority is reflected in the "proportion of admin edits" feature, and in a 2009 paper by Dalip et al.,[supp 4] where PageRank is part of a set of network features found to have little impact on predicting quality. These two examples aim to predict content quality, whereas the reviewed paper more directly measures the quality of the link structure, but it is still a missed opportunity for a discussion of what constitutes information quality. That discussion of information quality, and of how high quality can be achieved in wiki systems, is further hindered by the paper not properly defining "enterprise wiki", leaving the reader wondering whether there is much of an information quality difference between these and Wikimedia wikis at all.
The paper builds on an earlier one that the authors presented at last year's instance of the WETICE conference, where they outlined "A Novel Methodology Based on Formal Methods for Analysis and Verification of Wikis"[12] based on Calculus of communicating systems (CCS). In that paper, they also applied their method to Wikipedia, examining the three categories "Fungi found in fairy rings", "Computer science conferences" and "Naval battles involving Great Britain" as an experiment. Even though these only form small subsets of Wikipedia, computing time reached up to 30 minutes.
A paper accepted for publication at the 2015 Conference on Information and Knowledge Management (CIKM 2015), by scientists from the L3S Research Center in Hannover, Germany, suggests news articles for Wikipedia articles to incorporate.[13] The paper builds on prior work that examines approaches for automatically generating new Wikipedia articles from other knowledge bases, accelerating contributions to existing articles, and determining the salience of new entities for a given text corpus. The paper overlooks some other relevant work about breaking news on Wikipedia,[supp 5] news citation practices,[supp 6] and detecting news events with plausibility checks against social media streams.[supp 7]
Methodologically, the work identifies and recommends news articles based on four features (salience, authority, novelty, and placement), while also recognizing that the relevance of news items to Wikipedia articles changes over time. The authors evaluate their approach using a corpus of 350,000 news articles linked from 73,000 entity pages. The model uses the existing news, article, and section information as ground truth and evaluates its performance by comparing its recommendations against the relations observed in Wikipedia. This research demonstrates that there is still substantial potential for using historical news archives to recommend updates that make existing Wikipedia content more up-to-date. However, the authors did not release a tool to make these recommendations in practice, so there is nothing for the community to use yet. While Wikipedia covers many high-profile events, it nevertheless has a self-focus bias towards events and entities that are culturally proximate.[supp 8] This paper shows there is substantial promise in making sure all of Wikipedia's articles are updated to reflect the most recent knowledge.
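The four features lend themselves to a simple weighted scoring pass. The sketch below is only a schematic of that idea, with placeholder weights rather than the authors' learned, time-aware model.

```python
from dataclasses import dataclass

@dataclass
class NewsCandidate:
    title: str
    salience: float   # how central the entity is to the news article
    authority: float  # trustworthiness of the news source
    novelty: float    # how much the item adds beyond the current article text
    placement: float  # fit with a specific section of the Wikipedia article

# Placeholder weights; the paper learns how feature relevance shifts over time.
WEIGHTS = {"salience": 0.4, "authority": 0.2, "novelty": 0.3, "placement": 0.1}

def score(c: NewsCandidate) -> float:
    return (WEIGHTS["salience"] * c.salience
            + WEIGHTS["authority"] * c.authority
            + WEIGHTS["novelty"] * c.novelty
            + WEIGHTS["placement"] * c.placement)

def recommend(candidates: list[NewsCandidate], k: int = 5) -> list[NewsCandidate]:
    """Return the top-k news articles to suggest for an entity's Wikipedia article."""
    return sorted(candidates, key=score, reverse=True)[:k]
```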
Review by Andrew Gray
This paper, developed from one presented at the 9th International Conference on Web and Social Media, examined the citations used in Wikipedia and concluded that articles from open access journals were 47% more likely to be cited than articles from comparable closed-access journals.[14] In addition, it confirmed that a journal's impact factor correlates with the likelihood of citation. The methodology is interesting and extensive, calculating the most probable 'neighbors' for a journal in terms of subject, and seeing if it was more or less likely to be cited than these topical neighbors. The expansion of the study to look at fifty different Wikipedias, and covering a wide range of source topics, is welcome, and opens up a number of very promising avenues for future research - why, for example, is so little scholarly research on dentistry cited on Wikipedia, compared to that for medicine? Why do some otherwise substantially-developed Wikipedias like Polish, Italian, or French cite relatively few scholarly papers?
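The neighbor comparison at the core of the method can be sketched as a per-article citation rate ratio. The inputs below are placeholder dictionaries, not the paper's dataset or its subject-matching procedure.

```python
from statistics import mean

def relative_citation_rate(journal: str,
                           neighbors: list[str],
                           wiki_citations: dict[str, int],
                           articles_published: dict[str, int]) -> float:
    """Compare a journal's per-article Wikipedia citation rate to its topical neighbors.

    A value above 1.0 means the journal is cited more often than comparable journals;
    the input dictionaries here are placeholders, not the paper's data.
    """
    def rate(j: str) -> float:
        return wiki_citations.get(j, 0) / max(articles_published.get(j, 1), 1)

    neighbor_rate = mean(rate(j) for j in neighbors) if neighbors else 0.0
    return rate(journal) / neighbor_rate if neighbor_rate else float("inf")
```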
Unfortunately, the main conclusion of the paper is quite limited. While the authors convincingly demonstrate that articles in their set of open access journals are cited more frequently, this does not necessarily generalise to whether open access articles in general are more likely to be cited, which would be a substantially more interesting result. It has previously been shown that, as of 2014, around half of all scientific literature published in recent years is open access in some form; that is, a reader can find a copy freely available somewhere on the internet.[supp 9] Of these, only around 15% of papers were published in the "fully" open access journals covered by the study. This means that almost half of the "closed access" citations will have been functionally open access, and since Wikipedia editors generally identify articles to cite at the article level, rather than the journal level, it is very difficult to draw any conclusions on the basis of access status. The authors do acknowledge this limitation ("Furthermore, free copies of high impact articles from closed access journals may often be easily found online") but perhaps had not quite realised the scale of 'alternative' open access routes.
In addition, a plausible alternative explanation is not considered in the study: fully open access journals tend to be younger. Two-thirds of those listed in Scopus have begun publication since 2005, against only around a third of closed-access titles, which are more likely to have a substantial corpus of old papers. It is reasonable to assume that Wikipedia would tend towards discussing and citing more recent research (the extensively-discussed issue of "recentism"). If so, we would expect to see a significant bias in favour of these journals for reasons other than their access status.
“VEWS: A Wikipedia Vandal Early Warning System” is a system developed by researchers at the University of Maryland that predicts which Wikipedia users are likely to be vandals before they are flagged for acts of vandalism.[15] In a paper presented at KDD 2015 this August, the authors analyze differences in the editing behavior of vandals and benign users. The features that distinguish the two groups are derived from metadata about consecutive edits by a user and capture the time between consecutive edits (very fast vs. fast vs. slow), commonalities among the categories of consecutively edited pages, the hyperlink distance between pages, etc. These features are extended to use the entire edit history of the user. Since the features depend only on the metadata of an editor’s edits, VEWS can be applied to any language Wikipedia.
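A minimal sketch of that kind of consecutive-edit feature is below; the speed thresholds and field names are placeholders, not the definitions used in the paper or its released dataset.

```python
from datetime import datetime

def edit_pair_features(prev: dict, curr: dict) -> dict:
    """Features from two consecutive edits by the same user (illustrative reconstruction).

    Each edit is assumed to be a dict with a 'timestamp' (datetime) and
    'categories' (set of category names of the edited page).
    """
    gap = (curr["timestamp"] - prev["timestamp"]).total_seconds()
    if gap < 60:            # placeholder threshold for "very fast"
        speed = "very_fast"
    elif gap < 15 * 60:     # placeholder threshold for "fast"
        speed = "fast"
    else:
        speed = "slow"
    shared = len(prev["categories"] & curr["categories"])
    return {"time_gap_s": gap, "speed": speed, "shared_categories": shared}

# Example with placeholder edits:
e1 = {"timestamp": datetime(2015, 8, 1, 12, 0, 0), "categories": {"Drugs"}}
e2 = {"timestamp": datetime(2015, 8, 1, 12, 0, 30), "categories": {"Drugs", "India"}}
print(edit_pair_features(e1, e2))  # -> speed "very_fast", one shared category
```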
For their experiments, the researchers used a dataset of about 31,000 users (a 50-50 split of vandals and benign users), which they have since released on their website. All experiments were done on the English Wikipedia. The paper reports an accuracy of 87.82% with 10-fold cross-validation, compared to a 50% baseline. Even with only a user's first edit, the accuracy of identifying a vandal is 77.4%. As seen in the figure, predictive accuracy increases with the number of edits used for classification.
Current systems such as ClueBot NG and STiki are very efficient at detecting vandalism edits in English (but not in other languages), but detecting vandals is not their primary task. Straightforward adaptations of ClueBot NG and STiki to identify vandals yield modest performance; for instance, VEWS detects a vandal on average 2.39 edits before ClueBot NG does. Interestingly, incorporating the features from ClueBot NG and STiki into VEWS slightly improves the overall accuracy, as depicted in the figure. Overall, the combination of VEWS and ClueBot NG is a fully automated vandal early warning system for the English-language Wikipedia, while VEWS by itself provides strong performance for identifying vandals in any language.
Review by Guillaume Paumier
"DBpedia Commons: Structured Multimedia Metadata from the Wikimedia Commons" is the title of a paper accepted for the upcoming 14th International Semantic Web Conference (ISWC 2015), to be held in Bethlehem, Pennsylvania, on October 11–15, 2015.[16] In the paper, the authors describe their use of DBpedia tools to extract file and content metadata from Wikimedia Commons and make it available in RDF format.
The authors used a dump of Wikimedia Commons's textual content from January 2015 as the basis of their work. They took into account "Page metadata" (title, contributors) and "Content metadata" (page content including information, license and other templates, as well as categories). They chose not to include content from the Image table ("File metadata", e.g. file dimensions, EXIF metadata, MIME type) to limit their software development efforts.
The authors expanded the existing DBpedia Information Extraction Framework (DIEF) to support special aspects of Wikimedia Commons. Four new extractors were implemented, to identify a file's MIME type, images in a gallery, image annotations, and geolocation. The properties they extracted, using existing infobox extractors and the new ones, were mapped to properties from the DBpedia ontology.
The authors boast a total of 1.4 billion triples inferred as a result of their efforts, nearly 100,000 of which come from infobox mappings. The resulting datasets are now included in the DBpedia collection, and available through a dedicated interface for individual files (example) and SPARQL queries.
It seems like a missed opportunity to have ignored properties from the Image table. This choice caused the authors to re-implement MIME type identification by parsing file extensions themselves. Other information, like the date of creation of the file, or shutter speed for digital photographs, is also missing as a consequence of this choice. The resulting dataset is therefore not as rich as it could have been; since File metadata is stored in structured format in the MediaWiki database, it would arguably have been easier to extract than the free-form Content metadata the authors included.
It is also slightly disappointing that the authors didn't mention the CommonsMetadata API, an existing MediaWiki interface that extracts Content metadata like licenses, authors and descriptions. It would have been valuable to compare the results they extracted with the DBpedia framework with those returned by the API.
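For reference, the CommonsMetadata output mentioned above can be retrieved through the standard imageinfo API. A minimal sketch follows; the file name is a placeholder.

```python
import requests

API = "https://commons.wikimedia.org/w/api.php"

def commons_metadata(file_title: str) -> dict:
    """Fetch license/author/description metadata for one Commons file via CommonsMetadata."""
    params = {
        "action": "query",
        "titles": f"File:{file_title}",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "format": "json",
    }
    pages = requests.get(API, params=params, timeout=10).json()["query"]["pages"]
    page = next(iter(pages.values()))
    return page.get("imageinfo", [{}])[0].get("extmetadata", {})

# meta = commons_metadata("Example.jpg")  # placeholder file name
# print(meta.get("LicenseShortName", {}).get("value"))
```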
Nonetheless, the work described in the paper is interesting in that it focuses on a lesser-known wiki than Wikipedia, and explores the structuring of metadata from a wiki whose content is already heavily soft-structured with templates. The resulting datasets and interfaces may provide valuable insights to inform the planning, modeling and development of native structured data on Commons using Wikibase, the technology that powers Wikidata.
A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.