Wikipedia:Wikipedia Signpost/2019-11-29/Special report

Special report

How many people edit in your favorite language? Where are they from?

Let's say you are interested in how many active editors from France are editing the English-language Wikipedia; or conversely, you'd like to know how many editors from the UK are editing the French-language Wikipedia. All the information needed to calculate these numbers is recorded, at least temporarily, by the Wikimedia Foundation, but unless you worked for the WMF and had access to the Geoeditors Monthly database, you could never find those numbers. The WMF did not wish to disclose this data out of concern that the numbers were precise enough for governments or others to derive information that might lead to the identification of individual editors.

This month the Wikimedia Foundation made a new dataset public: Geoeditors/Public, or more informally, Active Editors by country. It allows the public to see, more or less, how many active editors (5–99 edits in a month) and very active editors (100+ edits) from about 180 individual countries contribute to active Wikipedia versions, each month from January 2019 onward. For example, if you want to know how many people editing from the UK made more than 99 edits to the French version of Wikipedia in September, you can look it up in this dataset. The answer is somewhere between 11 and 20.

Because of privacy concerns, exact numbers are not given. Data from 30 countries, including China, Kazakhstan, Russia, Saudi Arabia and Venezuela, are excluded. For each category (editors from country x who edited Wikipedia version y), the number of editors is reported only in “buckets” of ten: 1–10, 11–20, 21–30, 31–40, etc. Technical information is available here. The data are available here.
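To make the two groupings concrete, here is a minimal Python sketch of the thresholds and buckets described above. It is only an illustration, not the WMF's actual pipeline: the function names `activity_level` and `editor_bucket` are hypothetical, and the sketch assumes the buckets continue in steps of ten beyond 31–40.

```python
from typing import Optional, Tuple


def activity_level(edits_in_month: int) -> Optional[str]:
    """Classify an editor by monthly edit count (hypothetical helper).

    Thresholds follow the article: 5-99 edits is "active",
    100 or more is "very active". Fewer than 5 edits is not counted.
    """
    if edits_in_month >= 100:
        return "very active"
    if edits_in_month >= 5:
        return "active"
    return None


def editor_bucket(editor_count: int, width: int = 10) -> Tuple[int, int]:
    """Return the (lower, upper) bounds of the bucket holding an exact count.

    Assumes buckets of ten starting at 1: 1-10, 11-20, 21-30, and so on.
    """
    if editor_count < 1:
        raise ValueError("editor_count must be at least 1")
    lower = ((editor_count - 1) // width) * width + 1
    return lower, lower + width - 1


if __name__ == "__main__":
    # Any exact count from 11 to 20 is published as the same bucket.
    print(editor_bucket(11), editor_bucket(15), editor_bucket(20))
    print(activity_level(4), activity_level(99), activity_level(100))
```

The point of the bucketing is that the published range is the same for every exact count inside it, so a reader of the dataset only learns that the true figure lies between the lower and upper bound.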

But enough of the preliminaries! Which of the questions I’ve been dying to ask can the dataset answer? The following analysis is only the briefest overview of data from one month, September, and was done quickly. It’s not in any sense academic research, but hopefully it will help people understand what type of data the dataset contains and what type of questions it can be used to address.

My main questions – of personal interest – are:

  • What countries contribute most to the English-language Wikipedia (enwiki)? Are they the richer, or the more populous English-speaking countries? Or perhaps those countries where English is widely spoken as a second language?
  • Do these relations differ across different Wikipedia language versions? Answering the above questions for the Spanish-language Wikipedia (eswiki) allows a simple comparison.
  • And finally, how do contributions from individual countries compare across different language versions? Edits from the US and UK are examined here.