The Wikimedia Foundation has released its latest report card for the movement's hundreds of sites. The WMF has published statistics since 2009, but only recently have they been expanded in scope and depth to provide a rich source of data for investigating the movement and the world it serves. Erik Zachte, who is from the Netherlands, is the driver of the WMF's statistical output—assisted, he told the Signpost, by "a bunch of colleagues". He has been a Wikipedian since 2002 and the Foundation's data analyst since 2008. Erik writes in his understated way that the report card and accompanying traffic statistics comprise "enough tables, bar charts and plots to keep you busy for a while".
The news is good in terms of the Wikipedias' popularity: monthly page views for the 285 sites rose by a healthy 25% from March 2012 to March 2013, including a 74% rise in views from mobile devices. The Wikipedias are viewed nearly 22 billion times a month—more than 8000 hits a second—or an average of 36 hits a year for every single human, all the more extraordinary for the fact that only about one in four of us uses the internet.
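The per-second and per-person figures above can be roughly reproduced with a short calculation. This is a minimal sketch, not the WMF's method: the monthly total is the article's rounded "nearly 22 billion", while the world population (about 7.1 billion) and the 30-day month are assumptions introduced here for illustration.

```python
# Back-of-the-envelope check of the page-view figures quoted above.
MONTHLY_VIEWS = 22e9                    # "nearly 22 billion" views a month (from the article)
WORLD_POPULATION = 7.1e9                # ASSUMPTION: rough 2013 world population
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # ASSUMPTION: a 30-day month


def views_per_second(monthly_views, seconds_per_month=SECONDS_PER_MONTH):
    """Average page views per second over one month."""
    return monthly_views / seconds_per_month


def views_per_person_per_year(monthly_views, population=WORLD_POPULATION):
    """Average yearly page views per human being."""
    return monthly_views * 12 / population


if __name__ == "__main__":
    print(round(views_per_second(MONTHLY_VIEWS)))
    print(round(views_per_person_per_year(MONTHLY_VIEWS), 1))
```

With these rounded inputs the results land at roughly 8,500 views a second and about 37 views per person per year, in line with the article's "more than 8000" and "36"; the small gap is what one would expect from rounding "nearly 22 billion" upward.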
This week, the Signpost gives a thumbnail sketch of some of the statistics concerning page views among the Wikipedias, with a focus on the relationship between the world's major languages—particularly the global role of the English Wikipedia. What we found raises far more questions than it answers, and indicates the extent of the opportunities for using the statistics to analyse both internal and external phenomena.
The English Wikipedia (en.WP) receives 47% of the page views (down from 53% in 2009), and remains dominant among WMF sites. The next most popular WPs are the Spanish and Japanese (at just over 7%), the Russian (nearly 6%), the German (5.4%), and the French (4.2%).
Surprisingly, the average rate at which internet users view en.WP pages is higher in many countries than in the six major countries with a native English-speaking majority (the US, the UK, Canada, Australia, Ireland, and New Zealand—all red in Fig. 1). Among those six, en.WP is by far the most popular in Canada, with 16 views per month, and would be higher still if adjusted for the fact that more than one in five Canadians is a native speaker of French. The UK and Ireland came in next, with 13 views per month, followed by the US, Australia, and NZ on 11 per month.
The average number of en.WP views among internet users in the global north is also 11 per month (roughly three-quarters of all views). Europe and North America each generate the same average; Oceania (Australia, NZ, and surrounding Pacific nations) generates 10; and the global south views en.WP six times a month (a quarter of all views).
The tangled consequences of European colonisation are evident in profound differences in WP usage among the two dozen modern nation states that have significant ties to Arabic (Fig. 2). At 79%, the Arabic WP page-view rate is highest in the small state of Comoros off the Tanzanian coast, against 11% for en.WP and 2.4% for the French WP. This turns out to be on the extreme end of Arab usage, with a steady fall to less than a quarter in some countries, in favour of the colonial languages: overall, the Arabic WP is still the minority choice, against the English WP and, in places that were French colonies, the French WP.
These inconsistencies suggest that WP choice is complex and multifactorial: the Signpost has been told that nothing is certain, but factors could include a combination of (i) the proportion of internet users who read English (or French); (ii) the perceived quality and/or scope of the Arabic WP versus that of the English (or French) WPs; and (iii) political, educational, or social pressure to use or avoid a certain WP. Each of these factors, if they did play a part, would probably be the result of a number of component factors. While countries that share other languages—such as in the Spanish-speaking world—also show internal differences in their rate of en.WP views, they are not nearly as pronounced as in the Arab world.
Aside from the six major English-speaking countries, the WP viewing patterns of almost every country focus almost entirely on two WPs (in a few cases three); English is usually the second most popular, with tiny percentages going to other WPs. Over the past four years, the Arab world has seen particularly sharp movements away from the colonial languages towards the Arabic WP. Egypt, for example, has reversed from a 62/30 English/Arabic split to 40/53; this has been repeated almost exactly in Saudi Arabia, and to a lesser extent in some other Arab nations. Where French is a major choice, it too has tended to recede along with en.WP. To what extent is this related to the Arab Spring, and a sense of increasing pride and independence in Arab culture and language? And to what extent is it a product of any greater scope and depth on the Arabic WP?
Since 2009, this significant move away from en.WP to the WPs of local languages has been repeated around the world, although not usually as dramatically as in Arabic-speaking countries. There are many distinctive and unexplained patterns. A common scenario is a vacillation between the English and local-language WPs, quarter by quarter, with an unexplained shift to and from English in 2010. Taiwan (Fig. 3) shows the swing from Chinese to English in 2010, and another such swing more recently, in a mirror image characteristic of many countries. (Figs. 3–5 have two y-axes, which are scaled differently, and not from zero, to illustrate this mirrored relationship and to save space.)
Brazil shows a similar relation between English and Portuguese, although there has been a slight move towards en.WP over the past six months. Every Portuguese-speaking country had a precipitous drop in the use of the Portuguese WP in 2010, including Angola, Mozambique, Namibia, East Timor, and Portugal itself. The Signpost has yet to ascertain whether this, and indeed the peak in en.WP traffic around the same time, were artefacts of the data-gathering system.
Against the grain, the three German-speaking countries—Germany, Austria, and Switzerland—have all seen a move away from German and towards English. It has been suggested that this may be connected with a resistance by editors to the coverage of popular culture on the German WP. In Switzerland (Fig. 5), where French is also a major language, the popularity of German is more recently eroding in favour of English, and to a lesser extent French. Luxembourg has seen German usage fall significantly in favour of French and English. However, in neighbouring Belgium, both official languages—Dutch and French—have been gaining the edge on English.
Yet more is inexplicable. There has been movement from English to French in Senegal, Côte d'Ivoire, Niger, Guadeloupe, and Haiti; but from French to English in Réunion, Madagascar, and Rwanda, with gyrations between French and English in Zambia, the Democratic Republic of the Congo, and the Republic of the Congo, among other African countries. Panama is one of the few Spanish-speaking countries to be moving towards en.WP.
Interestingly, some major expatriate groups do not appear to align strongly with the WP of their native tongue: only 0.6% of American page views went to the Spanish WP, yet more than 12% of the US population speaks Spanish at home. Similarly, only 2% of views from Finland are to the Swedish WP, although nearly 6% of Finns are native Swedish-speakers and the language has equal status with Finnish as an official language. The WP preferences of minority language groups appear to be a complex issue. By comparison, large native Russian-speaker groups in countries such as the Baltic states that were assimilated into the Soviet Union for most of the 20th century appear to be using both the Russian and the local-language WPs in greater proportions at the expense of English.
Further information: Wikipedia Report Card: summaries for 50 most visited languages.
Instead of interviewing a WikiProject, this week's Report is dedicated to answering our readers' questions about WikiProjects. The following Frequently Asked Questions came from feedback at the WikiProject Report's talk page, the WikiProject Council's talk page, and from previous lists of FAQs. Included in today's Report are questions and answers that may prove useful to Wikipedia's newest editors as well as seasoned veterans.
Next week's interview will be earth-shattering. Until then, shake it up in the archive.
The Signpost interviewed prolific featured content creator and former Signpost "featured content" report writer Crisco 1492 about ? and Indonesian Cinema. ? was the "Today's featured article" for 1 April 2013. 1 April is popularly known as April Fools' Day in many countries.
What inspired you to start the article ? and turn it into a featured article?
I first heard about ? from one of my students, who had had to watch it for her religion course at university. It sounded (and, ultimately, was) interesting, so I hunted down as much information as I could and turned it into a good article before it had been out for a year. When the DVD was (finally) released in early 2012, I decided to expand the article as best I could with the new sources; I also went into a more detailed search for reviews and other published reports. Ultimately the FAC passed in August after some helpful reviews.
The title was a bonus; it lent itself to a variety of possible April Fools jokes for 2012's DYK section (nomination), such as "Did you know ... that 150,000 people in ten days saw ??" When Prioryman suggested using the article for April Fools, I agreed wholeheartedly. I still call it the shortest DYK hook ever, and I guess we can add shortest TFA as well.
Is there anything that you find especially interesting about Indonesian films in general?
Well, my major is in Indonesian literature, but I like to think of that field as part of Indonesian popular culture as a whole. As such, I've done some writing (both on and off Wikipedia) on films and music as well. I envision them as being in a sort of dialectical relationship, where earlier works inspire later works (in the same or different media), while witnessing these later works may also change how we see earlier works. Marah Roesli's Sitti Nurbaya, for instance, inspired a film, several stage plays, two TV series, and at least one song; seeing the characters and their actions visualised will naturally affect how we read and interpret the book.
Indonesian popular culture, including films, is quite different from the American popular culture I grew up with back in Windsor, so I guess I was first interested in it because it's exotic. Many of the films are based on Indonesian folk tales, legends, and novels, or feature Indonesian culture and history which has generally not reached Hollywood. The General Assault of 1 March (note that it's a redlink, as of the time of writing) inspired three films in Indonesia, but has received no attention in foreign cinema. Even the Jakarta-based films, which tend to have greater Western influence, still show an Indonesian character which reflects the socio-political concerns of society. Some themes which we can see include unchecked development, human trafficking, the shadow of communism, and the relationship between Islam and society.
If someone who had never seen an Indonesian film wanted to watch a small number of them to get a feeling for Indonesian cinema, which films would you recommend?
If someone were to look for an introduction to Indonesian cinema, the experience would depend heavily on their personal tastes. If one loves physical comedy, the works by Warkop are a good place to start, but if one prefers low-brow comedy I'd recommend Quickie Express. Some films which should be fairly readily available can be found here. For action films, the easiest to find is certainly The Raid: Redemption, which has already had a US release. There are also biopics like Soegija and Habibie & Ainun, as well as horror films like Mystics in Bali to choose from. My personal favourite so far is Ibunda.
A few which I think are fairly important, which any student of Indonesian film should watch:
This Signpost "Featured content" report covers content promoted between 24 March and 30 March 2013.
The first round of individual engagement grants (IEGs) has been awarded to seven applicants.
The IEG program was introduced in January 2013 to empower individuals or small teams of volunteers to tackle long-term on-wiki problems; it covers tasks largely outside the scope of other WMF programs like entity-focused FDC or GAC procedures. The Foundation reaches its final funding decisions based on community input and a volunteer committee's recommendations. The first round of proposals was reviewed in a community consultation period and assessed by the volunteer committee.
This round's grants covered a wide range of topics, including building awareness in China, art schools' contributions to Wikimedia sites, Wikisource–Wikimedia integration, developing a way to browse Wikipedia's structured data, a visual diff system, an educational game (the Wikipedia Adventure), and a system that allows editors to browse multiple research archives at once (The Wikipedia Library). The largest amount disbursed was US$15k (€11.6k) for the structured data viewing, followed closely by $13k (€10k) for the Wikisource project; in total, roughly $55.6k (€43k) was awarded.
The Signpost asked Ocaasi, the editor behind the Wikipedia Adventure and Wikipedia Library proposals, about his thoughts on the IEG process and his idea to open paywalled online research archives for Wikipedians.
On the subject of the IEG project itself, Ocaasi called attention to the interface and the design of the page (created by WMF staffer Heather Walls), both of which are highly user-friendly. In his estimation, "the pages don't feel like they're made with markup."
In addition, the individuals involved in the grant selection process itself, including Siko Bouterse (the Foundation's head of IEG grants) and the volunteer committee (which Ocaasi joined but recused himself from because of his own proposal), were able to help him craft, shape, and refine his proposal. In particular, these individuals were key to ensuring that his proposal could be run completely independently of the Foundation; this is one of the major differences between IEG and the discontinued Wikimedia Fellowship program.
Ocaasi told the Signpost that after the committee recommended funding his proposal, he was faced with a period of intricate questioning that challenged and/or focused on the weakest parts of his proposal. This "frank" discussion was something that he credited with keeping his expectations pragmatic and his budget conservative. On his project to open paywalled archives to Wikipedians, named the Wikipedia Library, Ocaasi said that the idea came from the news archives of HighBeam Research:
“My spark of inspiration came while researching an article on an alternative medicine figure. In a compulsive quest to exhaust the resources [available on them], I realized that I didn't have access to some news archives that HighBeam Reference did have. So I signed up for a free trial and added around 15 new references. I realized I'd regret not having the same ability in the future, so I had a wild idea to call up HighBeam and ask them for a free account as a Wikipedia editor. And, then the wheels started churning, and I thought, If I'm going to get one for me, why not ask for some more? HighBeam's response blew me away: "How about 1000?" That right there got me hooked: the ability of Wikipedia to open doors with other organizations in a way that can come around to benefit Wikipedians en masse.”
These accounts are typically extremely expensive for the partner institutions; giving them away, especially with a medium-term goal of 2,000 accounts and a long-term goal of 10,000, is likely to represent a significant loss in revenue. What do they gain or ask from these agreements? Ocaasi illuminated the reasoning behind such moves, saying that it has been "both altruistic and mutually beneficial". The altruism aspect is clear, as giving away free accounts to the dominant internet reference site furthers the information available to the world.
The mutually beneficial aspect is not so obvious. The site allows Wikipedians to discover and add information they may not have otherwise found. The donating institutions, on the other hand, "gain increased visibility of their site in our community through the account sign-up process, some positive publicity in blog mentions and the social media, and their site may be linked in article references." However, Ocaasi told us that in the latter case, full bibliographic information needs to be used so that editors and readers are given a chance to find a free copy, should they not have access to the archive.
Where do the GLAM-Wiki movement and regular Wikipedians fit in with the "Wikipedia Library" plan? While Ocaasi told us he believes the Library and GLAM-Wiki are "natural partners", he said it is very different from a GLAM project in the traditional sense, since it is not about "having institutions freely license content or learn how to edit articles about their collections ... we're looking for material donations to proprietary databases and resources." As for Wikipedians, they could play a central role in forming the planned central website for Wikipedians to log in and access multiple archives at once: "In a later phase of the project it would likely be necessary to have a staff person with library information management expertise and/or an expert in security authorization (OpenID, SAML) to contribute. A drastically effective shortcut would be piggy-backing on an existing University Library's system so that we could gain access through that portal and not have to individually configure every donor ourselves." He asks that people contact him if they have a connection to someone like this at a university, research institution, or major public library.
Beyond that, he says what Wikipedians can do most to help is simple: sign up and use these resources. This will show potential partner institutions that there is demand for such a project, a project whose final goal is far-reaching: "I want the most active Wikipedia editors to have free and full access to as many or even more resources than the finest research libraries and universities in the world."
The English Wikipedia's April Fool's Day main page was the subject of controversy this week, as editors opposed the addition of non-serious content.
As Wikipedia:April Fools notes, "every year [on 1 April], some editors decide to pull a few pranks on Wikipedia. It is traditional to have a mischievous Main Page on this day." "Mischievous" has ranged from blatant hoaxes (like the infamous announcement that the Encyclopædia Britannica was going to take over the Wikimedia Foundation and its projects), to the layouts seen today (typically strangely worded yet true statements), and the bizarre (like the gigantic question mark for the day's featured article).
Most of the main page's sections participate, but "In the news" (ITN) has been alone in rejecting it, only carrying one item in 2011 and none in 2012 or 2013. Opposition to including foolish-day-centric content at ITN included the time sensitivity of the regular, serious news. Opposition to having a foolish main page, however, coalesces around the usual serious nature of the Wikipedia site as a whole, as contrasted with what was seen as the immaturity of April Fool's Day jokes. HiLo48 took a rather combative tone, decrying the changed nature of Wikipedia on the day: "The point is that we are not at all important. That's why everything here is sourced to someone or something else. Everything, that is, except the April Fool's garbage created by self-appointed fiction writers (otherwise known as editors). By creating April Fool's jokes you are declaring yourself to be important, and you're not." Those against such jokes made statements such as calling on Wikipedians to "not damage the WMF trademark, remembering that many native English-speakers could give a dump about April Fool's Day, and most non-natives don't know about it."
There was also a proposal to abandon any April Fool's jokes for 2014, with a chance to assess whether that practice should be continued after 1 April, but it was quickly opposed. Given a chance to expound on his views, the proposer declined to comment, saying that he had gone through enough vitriol in the aftermath of his proposal.
The Signpost talked with two editors who participated in April Fool's Day discussions. Crisco 1492, the author of the day's featured article (and the subject of a related interview in this week's "featured content"), told us:
“From a purely practical standpoint, Wikipedia's editors are capital for the project. Like any business or organisation, without capital Wikipedia cannot grow and propagate. ... without capital (writers), the project will die. ... we should recognise that most people edit Wikipedia as a hobby, a way to pass the time, while still having a life outside of the encyclopedia. Like any hobby, the ultimately goal is to find a sense of pleasure, to find some fulfillment. ... Having an April Fool's page gives writers (and readers) something to look forward to, and lets them let steam free without resulting to personal attacks or vandalism. I read a comment by someone which really resonated with me: it's better to have a day of foolishness for fun than the three or four months of foolishness veiled as serious work.”
He went on to express support for the "misleading, yet accurate" stories Wikipedia has run since 2007, and gave a four-point summary of his personal opinion:
On the issue of what makes Wikipedia different from other major online presences, like Google, who conduct elaborate April Fool's Day jokes, Allen3 told us:
“Wikipedia is a loose collection of volunteers while the other organizations generally have strict corporate control structures. At a newspaper, an editor can direct a team of individuals to work on a single coordinated effort. The newspaper's management can at the same time ensure that no other part of the organization disrupts this effort or attempts to engage in alternative April Fool's efforts. This level of cooperation and coordination is not possible on Wikipedia. If one person does not agree with a course of action there is little stopping them for branching out and starting a competing effort.”
How much of the traditional humor, which is based around perceived dirty words (like this year's "Did you know... that Polish girls are getting wet and spanked today, but will have their revenge tomorrow?"), could be improved to satisfy the complaints of some editors? Allen3's answer was complex: to get 'good' humor, one must provide incentives for it, like "preferential times and placements". Such humor, though, can be difficult to find; often "creating a quick article with a dirty word in the title" is far easier than crafting a "truth is stranger than fiction" article.
The Signpost invites readers' views on the talk page.
This case, brought by Lecen, involves several articles about former Argentinian president Juan Manuel de Rosas (1793–1877). An editor is accused of systematically skewing the articles, as well as Spanish language sources, in order to portray a brutal dictator as a democratic leader, in keeping with the political motives of Argentinian "nationalists" or "revisionists".
The arbitration committee, not being expert in Argentine history or fluent in the Spanish language, asks for any "uninvolved editors with subject-matter expertise" to participate in the evidence and workshop phases of the case, to help determine "whether the allegations of use of highly disreputable and unreliable sources, quotation of Spanish-language sources incompletely or out of context, and the like appear to have merit."
The evidence stage is scheduled to close 12 April 2013, and a proposed decision is scheduled for 26 April 2013, though these dates may be extended because of the recent floods in Buenos Aires, which have adversely affected an editor involved in the case.
This case, brought by Mark Arsten, was opened over a dispute about transgenderism topics that began off-wiki. The evidence phase was scheduled to close March 7, 2013, with a proposed decision due to be posted by March 29.
This case was brought to the Committee by KillerChihuahua, who alleges that the discussion over this American political group has degenerated into incivility. Evidence for the case was due by March 20, 2013, and a proposed decision was scheduled for April 3, 2013.
As previewed in last week's "Technology Report", users of ten Wikipedias including Italian and Russian – in total accounting for some 10% of all visits to Wikimedia sites – this week got access to phase 2 of Wikidata following its first rollout to production wikis (Wikimedia Deutschland blog).
The primary focus of this second phase is the introduction of a new {{#property}} parser function. The function retrieves a named property of a given Wikidata item (at time of writing, that item must be the one linked to the current page). Thus, using {{#property:p169}} will retrieve the "CEO" property attached to the current page, if any. The team behind Wikidata reports that they are close to deploying the code necessary to allow editors to use the alternative syntax {{#property:chief executive officer}}, as well as allowing them to retrieve properties of arbitrary items (the population of Paris on the article for the Eiffel Tower, for example).
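To make the lookup behaviour described above concrete, here is a toy Python model of it. This is only an illustrative sketch and not MediaWiki or Wikibase code: the function names, the item identifier, the page title, and the stored value are all hypothetical, and returning an empty string when nothing is found is an assumption. The only detail taken from the report is that p169 corresponds to the "CEO" (chief executive officer) property; the by-label form is modelled here even though, at the time of writing, it had not yet been deployed.

```python
# Toy model of the property lookup described above; NOT MediaWiki/Wikibase code.
# Every item ID, page title, and value below is a hypothetical placeholder.

# Human-readable label -> property ID (p169 is "CEO", per the report).
PROPERTY_LABELS = {"chief executive officer": "p169"}

# Hypothetical Wikidata-style items: item ID -> {property ID: value}.
ITEMS = {"q_example_company": {"p169": "Example Person"}}

# Hypothetical link from a wiki page to its Wikidata item.
PAGE_TO_ITEM = {"Example Company": "q_example_company"}


def resolve_property(name_or_id):
    """Map a label such as 'chief executive officer' to its ID; pass IDs through."""
    key = name_or_id.strip().lower()
    return PROPERTY_LABELS.get(key, key)


def render_property(page_title, name_or_id):
    """Return the value for the item linked to page_title, or '' if there is none."""
    item_id = PAGE_TO_ITEM.get(page_title)
    if item_id is None:
        return ""  # ASSUMPTION: an unlinked page renders nothing
    return ITEMS.get(item_id, {}).get(resolve_property(name_or_id), "")


if __name__ == "__main__":
    print(render_property("Example Company", "p169"))
    print(render_property("Example Company", "chief executive officer"))
```

Both calls resolve to the same stored value, which mirrors the idea that the ID form and the label form are two spellings of one lookup. Retrieving properties of arbitrary items would amount to passing an item other than the one linked to the current page.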
Although the 27 March rollout initially appeared to be wholly a success, WMF Site Architect Asher Feldman quickly raised serious concerns about its impact on site performance. In particular, in a post to the WMF Operations mailing list, he judged two serious "jobqueue related" site outages on 28 March to be the fault, in part, of the ramping up of Wikidata. In both cases, Wikidata's change propagation mechanism had added large numbers of jobs to the jobqueue, a part of Wikipedia site maintenance widely acknowledged to be creaking around the edges. Under the strain, the under-performing jobqueue caused all WMF slave databases to lag, Feldman noted, ultimately causing the downtime for editors.
"The good thing is," Feldman added, "the jobqueue was identified as a scaling bottleneck a while ago, and will be [upgraded] very soon." In the meantime, the Wikidata team report they are also working to limit the pressure Wikidata places on the jobqueue. They hope to avoid performance questions delaying the further rollout of Wikidata phase 2 to other client wikis (including the English Wikipedia) over the next month.
In related news, WMF Editor Engagement specialist Steven Walling voiced concerns about the Wikidata implementation currently being rolled out and, in particular, about the difficulty new users will have in working out where property values can be changed (answer: the item page on wikidata.org). It was suggested that the problem might be solved in the short term with the addition of overt "[edit]" links and in the longer term via integration with the VisualEditor.
Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for several weeks.