Wikipedia:Wikipedia Signpost/2020-04-26/By the numbers

By the numbers

Open data and COVID-19: Wikipedia as an informational resource during the pandemic

Changwook Jung, Sun Geng, and Meeyoung Cha are from the Institute for Basic Science, South Korea, and KAIST. Inho Hong is from the Center for Humans & Machines, Max Planck Institute for Human Development, Germany.
Diego Saez-Trumper is a researcher employed by the WMF. This paper represents work beyond his regular duties. This article was originally published on "Medium". The text on "Medium", but not the graphs, is licensed CC0.

From the very start of COVID-19, when it was known only as an outbreak of atypical pneumonia in China, people around the world have been finding and sharing information about the virus on Wikipedia, a frequently used online resource for medical information. While the content and quality of the information on Wikipedia are shaped by volunteer editors (over 34K contributing to COVID-19 related pages) and by policies about verifiability, the activity of these volunteers and readers also produces a considerable amount of data. For example, we can explore how many Wikipedia articles have been created about COVID-19 related topics. Which sources are cited in those articles? How many people have reviewed such articles? Which are the most visited pages?

This post offers an overview of the COVID-19 related data generated in Wikipedia, highlighting the diversity of content that people read: from general information about the pandemic and regional responses, to the people who have been involved in the pandemic and misinformation about the virus. You can see some of this data in a new interactive resource from the Wikimedia Foundation, which will be updated regularly. All the data used in this article is public and can be accessed, scrutinized, and reused by third parties through the MediaWiki API and other online resources offered by the Wikimedia Foundation. Sample source code is available in this Jupyter notebook.
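As an illustration of how this public data can be reached, here is a minimal sketch that builds a request for the Wikimedia Pageviews REST API and sums the daily views in its response. It is not taken from the authors' notebook. The helper names (`pageviews_url`, `total_views`, `fetch_pageviews`), the example article title, the date range, and the User-Agent string are all assumptions made for this example.

```python
import json
import urllib.parse
import urllib.request

# Public per-article endpoint of the Wikimedia Pageviews REST API.
PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="en.wikipedia",
                  access="all-access", agent="user", granularity="daily"):
    """Build the request URL; start and end are YYYYMMDD strings."""
    # Wikipedia titles use underscores for spaces; percent-encode the rest.
    title = urllib.parse.quote(article.replace(" ", "_"), safe="")
    return "/".join([PAGEVIEWS_BASE, project, access, agent,
                     title, granularity, start, end])


def total_views(payload):
    """Sum the 'views' field over every entry in the response's 'items' list."""
    return sum(item["views"] for item in payload.get("items", []))


def fetch_pageviews(article, start, end, **kwargs):
    """Download and decode the JSON response (needs network access)."""
    request = urllib.request.Request(
        pageviews_url(article, start, end, **kwargs),
        # Hypothetical identifier; replace with your own contact details.
        headers={"User-Agent": "signpost-example/0.1 (hypothetical contact)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Assumed example title and week; any article and date range would do.
    data = fetch_pageviews("COVID-19 pandemic", "20200401", "20200407")
    print(total_views(data))
```

Changing the `project` argument (for example to `ko.wikipedia`) would query another language edition, which is one way to compare readership across regions.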