Wikipedia:Wikipedia Signpost/Single/2020-04-26

The Signpost
Single-page Edition
26 April 2020


2020-04-26

Unbiased information from Ukraine's government?

Ministry of Foreign Affairs of Ukraine starts project

The logo of the Ukrainian Ministry of Foreign Affairs

The Ministry of Foreign Affairs of Ukraine launched an "Ambitious Campaign to Enrich Wikipedia with Unbiased Information on Ukraine and the World" on April 22, in cooperation with Wikimedia Ukraine. The initiative is aimed at reducing disinformation about Ukraine and spreading "unbiased facts about the country in various languages on Wikipedia". The campaign is set to begin in May with an online editing marathon, the 'Month of Ukrainian diplomacy', and will ask Wikipedia editors and Ukrainian diplomats to 'correct' and fill in information gaps about the nation. The ministry plans to publish data that can be incorporated into Wikipedia.

After Раммон posted the original MFA announcement to Jimbo Wales' talk page, users had mixed responses. Guy commented "Good news, well-informed Ukrainians to counter the GRU disinformation campaign", but Carrite considered the campaign tantamount to an "assault on NPOV".

The original announcement included surprising wording such as "mega campaign to saturate Wikipedia with unbiased information" in the title and "Ukrainian diplomats will also help to write Wikipedia's articles". The original also mentioned "Russian aggression against Ukraine".

The next day, following inquiries from The Signpost, the announcement, including its title, was reworded. The revised text reads "Ukrainian diplomats will also organize campaigns for writing Wikipedia's articles in languages of different countries to give the international community a better understanding about Ukraine," replacing "help to write" with "organize". The sentence containing "Russian aggression" was left unchanged.

Anton Protsiuk, Project Manager for Wikimedia Ukraine, replied to The Signpost that the chapter "will not directly work with diplomats to edit Wikipedia. But we cannot prohibit Ukrainian diplomats from editing Wikipedia on their own, so we want to explain to them how Wikipedia works, what are the relevant policies, including copyright and NPOV. In other words, we will do everything so Ukrainian diplomats do not push a Ukrainian point-of-view in Wikipedia and understand the nature and rules of Wikipedia."

The chapter will conduct online webinars for the diplomats so that they will understand Wikipedia's rules. The diplomats will then publish information useful to Wikipedia editors on the MFA website, which is now freely licensed.

This is not the Ukrainian government's first such campaign. Notably, KyivNotKiev is a campaign begun on October 2, 2018 to get English-language media to spell the name of its capital "Kyiv", which is transliterated from the Ukrainian language, rather than "Kiev", which is transliterated from the Russian language. Part of a larger "CorrectUA" campaign, it has seen some success, with several large news organizations changing their usage, including the BBC, the Associated Press, The Wall Street Journal, The Economist, and The New York Times.

Russian reaction to the MFA's initiative, as shown by two reports from the wire service RIA Novosti, has been critical. The first report notes that "relations between Moscow and Kiev deteriorated after the coup in Ukraine, the reunification of the Crimea with Russia and the beginning of the confrontation in the Donbass." The second report speculates that the project will be used to whitewash the history of Ukrainian nationalists who fought against the Soviet Union during World War II. RIA Novosti labeled these nationalists as "active accomplices of the German Nazis."

Access to information during the pandemic

Since the beginning of the 2019–20 coronavirus pandemic, various information-sharing organizations have responded by making some (or all) of their content freely available. JSTOR announced that it was dramatically increasing the content available to 'participating institutions', expanding the free-article limit for registered users from six to 100 articles a month, and making over 6,000 articles related to the disease free through June 30, 2020. Project MUSE made various resources free to the public, including over 25,000 books and 300 journals.

HathiTrust announced a similar service on April 22, giving member libraries whose services have been disrupted the ability to access materials from HathiTrust that match physical copies the libraries hold. The Internet Archive announced it would modify its controlled digital lending to lift check-out limits for 1.4 million non-public domain books in its Open Library, becoming a "National Emergency Library" through June 30 or later.

Various authors and writer advocacy groups slammed the Internet Archive, calling its decision "an excuse for piracy". The Authors Guild stated it was "shocked that the IA would use the COVID-19 epidemic as an excuse to push copyright law further out to the edges, and in doing so, harm authors, many of whom are already struggling".[1]

A very big long-term project

There are more than 3,000,000 stub articles on the English Wikipedia. That's right – more than half of enWiki's articles are stubs. The 50,000 Destubbing Challenge is taking the bull by the horns. Encyclopædius helped start the project, and 32 contributors are signed up on the main project page as of April 26. They've destubbed 1,105 articles as of April 21, having started in March with The Great Britain and Ireland Destubathon. The goal is to destub 5,000 articles per year for ten years and ideally move on to a 1 million Destubbing Challenge.

Wikipedia Weekly returns

WikiSeder on Wikipedia Weekly

The Wikipedia Weekly podcast was dormant for several years, but has been revived in the current pandemic as the livestreamed Wikipedia Weekly Network. One of the first episodes was an online, non-religious WikiSeder, adapting the Jewish tradition remembering the plagues and liberation to a celebration of wiki wisdom in the age of the quarantini. This is best enjoyed if you play along and toast the "Four Cups to Free Knowledge" together at home—that will surely prepare you for the song at the end. WWN is now podcasting several times each week, and hopes to release summaries of highlighted episodes in The Signpost as a monthly column.

Brief notes

The main page hooks included President Clinton's hankering for bacon butties, Volkswagen's ketchup-lubricated part, Captain Kirk's computer encryption skills, and sexy Pepsi cans.


2020-04-26

Coronavirus, again and again

Following a February 9 article by Omer Benjakob, a flood of news articles in March praised Wikipedia's coverage of all things related to coronavirus. This month the flood slowed down, but is showing signs of resuming.

More coronavirus news

  • Why Wikipedia Is Immune to Coronavirus in Haaretz, by Omer Benjakob, follows up his February 9 story. With lock-downs around the world and almost everybody with internet access actively browsing, the internet has been stressed with an 'infodemic' of misinformation. Lacking the resources of YouTube, Google, Twitter and Facebook, Wikipedia is nonetheless "having its moment," with 115 million pageviews of coronavirus-related articles on the English-language encyclopedia this year through April 7. The role "of being the public’s main source of medical and health information" has been thrust upon Wikipedia. The article emphasizes the role of WikiProject Medicine and its tough standards, and how the project has been "immunized" by dealing with previous public-health scares like the 2003 SARS and the 2015 Zika outbreaks.
  • Why Wikipedia is winning against the coronavirus 'infodemic': The Telegraph interviews the "chief steward of the greatest collection of knowledge in the entire history of human civilisation", Katherine Maher, aka the ED and CEO of the Wikimedia Foundation. Maher makes the point that "the committed, meticulous and sometimes eccentric community of volunteer editors" are the actual bosses of the encyclopedia, not her. Using examples from the current pandemic, she explains how Wikipedia works and how the new traffic records stress the site. "It's a good thing that Wikipedia works in practice, because it would never work in theory," she says. "It works because ... people want it to work?" That may be the best explanation we'll ever get.
She concedes that there is evidence of state-sponsored campaigns on Wikipedia, for example on the Chinese Wikipedia, and that the WMF is watching a few possible cases. A bigger fear, though, is that large areas of the encyclopedia could be captured by ideologically-driven communities.

Rosie

Wikipedia is a world built by and for men. Rosie Stephenson-Goodknight is changing that in The Lily (Washington Post). You might think you know about Rosiestep, but you will learn much more by reading this article. She was born in Gary, Indiana. While growing up in California, she wanted to be an anthropologist, but bowed to her father's wishes and majored in business administration, then became a healthcare administrator. She first edited Wikipedia in 2007, creating an article on the defunct publisher Book League of America. She's created articles on the Kallawaya, Perry River, Donna, and her grandmother. You likely know about her work at Women in Red, reducing Wikipedia's gender gap, and her writing of the article Maria Elise Turner Lauder, which was recognized as the English-language Wikipedia's six millionth article, but the beauty of this Lily article is in the details.

Jew-Tagging

Jew-Tagging @Wikipedia by Edward Kosner in Commentary. Kosner, who describes himself as "a proud if non-observant Jew", thought it was intrusive that the Wikipedia article about him described him as being "born to a Jewish family." Neither he nor his son could remove the offending text. But when he responded to a Wikipedia solicitation for a donation, commenting that he'd "be much more inclined to contribute had Wikipedia made it possible to deal with my problem", he received an answer from Coffee, perhaps coincidentally. The story gets complicated from here. There are different reasons why an article subject might want to be, or not want to be, identified by their religion or ethnic group. There are different reasons why an editor might want to identify an article subject by their religion or ethnicity. Several editors said on the Jimbo Wales talkpage that they were offended by the implication that a refusal to donate could result in the changing of article content.

In brief

  • A small-town newspaper gives good advice on determining news reliability: The Kokomo Perspective suggests using the SIFT method. The acronym is straightforward: "Stop. Investigate the source. Find better coverage. Trace back to origins." Under "Investigate the source" they note that "nearly every English language publication or media website has a Wikipedia site, which will summarize it." For most reliable sources, and some unreliable sources, this is correct.
  • NoFap struggles against Wikipedia, accuses editors of bias in Reclaim the Net: NoFap, a self-help website and community forum aimed at curbing pornography viewing, has taken offense to the way it is described on Wikipedia. NoFap feels that "activist" editors and porn industry personnel have distorted the relevant article to give an inaccurate representation of its purpose.




2020-04-26

Redesigning Wikipedia, bit by bit

Redesigning the left sidebar

Wikipedia’s design has changed very little in the last ten years.[1][2][3] For example, the current Vector skin was introduced in 2010 (although some changes are currently being planned—more on that below), the Main Page has had basically the same layout since 2006, and that sidebar on the left-hand side of the page with all the links is almost as old. A recent WMF-funded report concluded in part that the sidebar was one of the most confusing parts of Wikipedia's design for casual readers:

Readers were unable to understand the purpose of the Menu on the left hand side of the site, noting in particular that they did not understand the items in the menu (e.g. Related changes, Special Pages). They felt that it was not relevant for them.

Editors have tried to improve the sidebar over the last decade. A 2013 request for comments (RfC) on the sidebar noted that:

  1. Even compared to other pieces of site-wide navigation, the sidebar is an extremely important navigation tool. With the vast majority of readers and editors using a skin (Vector or Monobook) with the sidebar placed on the left, it is in a natural position of importance considering English speakers tend to scan left to right.
  2. The sidebar is currently cluttered. On the Main Page, English Wikipedia readers see 22 links [in 2020, it's 21], not including language links [or "In other projects" links]. Basic usability principles tell us that more choices increases the amount of time users have to spend understanding navigation (see Hick's law), and that simplicity and clarity are worthwhile goals. The most recent design of the homepage of Google.com, famous for its simplicity, has half the number of links, for comparison. While removing some semi-redundant links (like Contents or Featured contents) would be preferable, if we're going to have this many links it means prioritization is key, leading to the next point...
  3. The sidebar has poor prioritization. Users read top to bottom, and it is not unfair to say that the vertical order of the links should reflect some basic priority. However, currently, this prioritization is sloppily done. Even if we assume all the current links are important and should stay, the order needs work.
  4. The names for some links are overly verbose or unclear. Brevity is the soul of wit, and of good Web usability. We should not use two or three words where one will do.

The 2013 RfC proposed a new design for the sidebar which featured several collapsible sections. Ultimately, there was no consensus to make the change. After an incubation period at the Village Pump Idea Lab, a new RfC was formally introduced (and advertised) this month. This year's RfC page has a different format: rather than one proposal to completely change the sidebar, many smaller proposals have been made to add or remove links. There are too many proposals to discuss each one, but here's the table of contents:

   1 Background
   2 Reorderings
       2.1 Reorder the links in the left sidebar to create a new "contribute" section
       2.2 Move Wikidata to "In other projects"
       2.3 Move "In other projects" under "Print/export"
       2.4 Separate "Page tools" and "User tools"
       2.5 Move "Print/export" above "Tools"
   3 Renamings
       3.1 Donate to Wikipedia → Donate
       3.2 Wikipedia store → Merchandise
       3.3 About Wikipedia → About
       3.4 Contact page → Contact
       3.5 Main page → Main Page
       3.6 Logs → Logged actions
       3.7 Languages → In other languages
       3.8 User rights management → Manage user rights
       3.9 Tools section → ???
       3.10 Print/export → Export
       3.11 Mute preferences → Mute this user
       3.12 Printable version → Print
       3.13 Download as PDF → Save as PDF
   4 Additions
       4.1 An introduction to contributing page
       4.2 An FAQ page
       4.3 A dashboard
       4.4 Logs
       4.5 Deleted contributions
       4.6 Search page
   5 Removals
       5.1 Featured content
       5.2 Upload file
       5.3 Permanent link
       5.4 Wikipedia store
       5.5 Print/export (both "Download as PDF" and "Printable version")
       5.6 Random Article
       5.7 Recent changes
   6 Autocollapsing
       6.1 "Tools" section
   7 Changing tooltips
       7.1 Comprehensive overview
       7.2 Featured content tooltip
       7.3 Current events tooltip
       7.4 Random article tooltip
       7.5 Donate to Wikipedia tooltip
       7.6 Wikipedia store tooltip
       7.7 About Wikipedia tooltip
       7.8 Community portal tooltip
       7.9 Recent changes tooltip
       7.10 Contact page tooltip
       7.11 Upload file tooltip
       7.12 Special pages tooltip
       7.13 Permanent link tooltip
       7.14 Page information tooltip
       7.15 Wikidata item tooltip
       7.16 Logs tooltip
       7.17 View user groups tooltip
       7.18 Mute preferences tooltip
       7.19 Download as PDF tooltip
   8 A Customizable Sidebar (or an Advanced Mode)
   9 A note on power users, usability, and systemic bias
   10 Technical underpinnings
       10.1 Sidebar settings page
       10.2 Tools
   11 Comparison to other projects and languages

The Wikimedia Foundation's proposed Vector update in action

While the English Wikipedia community works to make the sidebar better for readers and editors, the Wikimedia Foundation (WMF) has been proceeding with its own plans to update the Vector skin. Current plans concern two proposed changes that would be incorporated into the standard Vector skin: moving the language selector to the top right of the page, and collapsing the sidebar by default. The WMF is currently accepting feedback on the changes. P

April Fools' Day: have we gone too far with the pranks?

This year's April Fools' Day festivities were some of the more chaotic ones in recent memory. Articles for Deletion had a record 93 nominations (not counting some deleted coronavirus-related ones), which was about double the number of serious nominations that day. Editing the page meant to document all the pranks was near-impossible due to edit conflicts caused by 1) yet another edit war over the title of the "other pranks"/"general tomfoolery" section, and 2) outright vandalism of the page, though the latter became less of a problem after the page was semi-protected. By the end of the day, it was a foregone conclusion that there would be an RfC to bring about some changes to how Wikipedia celebrates April 1. (Similar RfCs took place in 2013 and 2016.) Notable proposals on the RfC page include:

  • Requiring the customary joke "did you know..." entries to be labelled as jokes
  • Clarifying that vandalism of the year's prank log page is prohibited
  • Banning joke deletion nominations

P

In brief

  • On Wikipedia talk:Non-free content, eliminating or restricting vector graphics has been proposed as a necessary step to comply with the fair-use requirement of minimal infringement of the owner's rights, which in Wikipedia's practice means low-resolution graphics. This is technically impossible with vector graphics, as they are inherently resolution-independent. B
  • On Wikipedia talk:Manual of Style/Accessibility: Should all tables be required to have captions? This change was proposed with the goal of making Wikipedia more accessible to blind readers who use screen readers, though some users see making this a bright-line rule to be overkill. P
  • On current events: a discussion at WikiProject COVID-19 (the subject of last issue's WikiProject report) is contemplating removal of the current-event template {{Current}} from the top of some articles. B
  • On the policy village pump: Should a "Main Page Editor" usergroup be created with the editprotected and protect (at least until editcascadeprotected gets implemented) rights? This would allow non-admins to edit the main page in order to fix errors. P
  • Also on the policy village pump: Proposed updates to Wikipedia's policy on bureaucrat activity, to remove the provision that allows bureaucrats to keep their tools if they express willingness to participate in bureaucrat activities but don't actually use the tools. P


2020-04-26

Featured content returns

Johnny Cash reached number one in 1976 with the song "One Piece at a Time", about an assembly line worker who constructs his own car using stolen parts from a variety of different models. After the song became successful, an auto shop built the car described in the lyrics.
This Signpost "Featured content" report covers material promoted from April 1 through April 23. Text may be adapted from the respective articles and lists; see their page histories for attribution.

Featured Content is back, and here to stay! The editors of The Signpost regret that the past year and the first few months of this year were not covered. Please review the archives of Goings-on or various other logs to see that content.

Seven featured articles were promoted this month.

Stars of Ghostbusters II, (l–r, top row) Bill Murray, Dan Aykroyd, Sigourney Weaver, (bottom row) Ernie Hudson, Harold Ramis, and Rick Moranis
Cyclone Chapala at peak intensity on 30 October
George Washington (John Trumbull, 1780), with William Lee, Washington's enslaved personal servant
David Gower and Ian Botham, pictured batting against Australia at Trent Bridge in 1981, were England's leading run-scorers in Test cricket between 1975 and 1989.
Cardiff City manager Neil Warnock (holding trophy right) and Sean Morrison (left) lift the 2017–18 EFL Championship runner-up trophy
Restored train car used to transport Slovak Jews.
Bluebells in Goulding's Wood, part of Park Wood and Goulding's Wood LNR.
Eddie Guardado, a pitcher for the 1993 Xpress.

Eighteen featured lists were promoted this month:




2020-04-26

Two difficult cases

Two very difficult cases were heard in April, with one ongoing. Jytdog, a productive and controversial editor, was indefinitely banned on April 13. The Medicine case opened on April 7. Two frequent Signpost contributors are involved. No Signpost staffer is available to write this article who considers themselves to be unbiased about these cases. This writer has strong, and mixed, feelings on both cases, and will keep the descriptions short.

Jytdog

Jytdog was indefinitely banned on April 13 by a vote of 11 arbitrators to 1. He may appeal the ban in 12 months. The case was originally started two years ago and closed soon after when Jytdog resigned as an editor, stating that he would never return. After he expressed a desire to return as an editor in March, the case was reopened.

Much of the case revolved around Jytdog's efforts to fight paid or conflict-of-interest editing. A key aspect of the case involved his uninvited telephone contact with another editor. Strong evidence was presented that Jytdog repeatedly badgered other editors.

Medicine

A long-term dispute at WikiProject Medicine that came to a head over drug pricing information in articles was taken to ArbCom, and the case opened on April 7.

On the evidence page, RexxS states "the vast majority of parties to this case are respected, long-term editors who have made considerable contributions to the field of medicine on Wikipedia over many years. It should be taken as a given that every single party's foremost aim is to improve Wikipedia, although there exists a wide range of opinion on how that is best achieved."

The parties include Doc James, a long-time contributor to The Signpost and a member of the WMF Board of Trustees. Bluerasberry, another long-time contributor to The Signpost, joins several other editors in favoring the inclusion of drug prices in medical articles. SandyGeorgia and several other medical editors are concerned about multiple long-term trends affecting the Medicine Project.

On the evidence page, editors are roughly evenly split in whether the evidence they have presented favors one side of the dispute or the other. In the Workshop phase, which ends May 5, many of the proposals appear to favor letting the editors solve the content dispute on their own. A proposed decision is expected by May 12.





2020-04-26

Disease the Rhythm of the Night

This traffic report is adapted from the Top 25 Report, prepared with commentary by Igordebraga, Rebestalic (March 22 to April 18), Soulbust (March 22 to April 4), A lad insane (April 5 to 11), and Berrely (April 12 to 18).


Well, I told you once and I told you twice, the virus spread because people didn't take advice. The variety of topics is not enough to please me, but writing this Report at least should be easy.

(data provided by the provisional Top 1000)

You could put a mask upon my face (March 22 to 28, 2020)

Most Popular Wikipedia Articles of the Week (March 22 to 28, 2020)
Rank Article Views Notes/about
1 2019–20 coronavirus pandemic 7,865,205
The little spiky sphere that you see on the left is a computer illustration of something that is measured in millionths of a centimetre. That's not a big deal. The illustrated virus is tiny. But that little spiky sphere has sickened nearly nine hundred thousand people and killed more than forty thousand. It has sent stocks reeling, swept shop-shelves empty through panic-buying and locked down whole countries and their economies. The severe acute respiratory syndrome coronavirus 2 causes coronavirus disease 2019 (COVID-19), which is responsible for the 2019–20 coronavirus pandemic and for all the illness, death and other consequences.
2 2020 coronavirus pandemic in India 3,327,313
India is becoming increasingly affected by the global coronavirus pandemic, numbering 1,998 cases at the time of writing. The country, which is home to over a sixth of the world population, is currently under a lockdown that Prime Minister Narendra Modi declared on 24 March. Sadly, 41 people have died from the infection; 144 have recovered.
3 Coronavirus 2,692,771
For a while, the Latin word for crown could mean a star's surrounding fire, a Mexican beer, a shower brand, the group that sang "The Rhythm of the Night"... but lately if you say "Corona", we know it's the virus type which has an instance spreading itself while locking us at home.
4 Spanish flu 2,403,781
The sheer scale of the current pandemic has earned it comparisons to the influenza-fuelled Spanish Flu, which lasted from January 1918 to December 1920. The number of victims of the pandemic, as estimated in 2006, was 500 million—a quarter of the population of the entire world at the time, and a little higher than the combined populations of the US and Russia today.
5 2020 coronavirus pandemic in the United States 2,178,259
For some variety, let's borrow from Reddit, while learning that The Offspring might be to blame if Chile sees a sudden rise in COVID-19 cases:

Like the latest fashion.
It's a spreading disease.
The kids are coughing all the way to the classroom
Boomers infected with a breath and a wheeze.
Republicans say that it’s all overblown.
The Dems say it’s a crisis and you’ve gotta stay home.
If one guy cough up and the others just breathe.
They're gonna pass it on, pass it on, pass it on, pass it on...
Hey! Man you're talking close to me?
Take him out
YOU GOTTA KEEP’EM SEPARATED!
Hey! Man, you want to infect me?
Cover your mouth!
YOU GOTTA KEEP’EM SEPARATED!
Hey HEY hey, we paid no mind!
We're under thirty we'll be infected in no time!
We can't come out and play!

6 2020 coronavirus pandemic in Italy 2,173,809
7 Coronavirus disease 2019 2,153,131
8 2019–20 coronavirus pandemic by country and territory 2,091,311
9 Madam C. J. Walker 1,971,771
Madam C. J. Walker was the first female self-made millionaire in the United States. Born in the Deep South that was once home to many slaves, Walker eventually launched the Madam C. J. Walker Manufacturing Company and propelled herself into the world of beauty and cosmetics. She died in former Union territory in Irvington, New York. Walker has recently been the subject of the Netflix miniseries Self Made, in which she is played by one of my favourite actors, Octavia Spencer. That's her on the left.
10 Orthohantavirus 1,762,835
As one guy dies on a bus, everyone goes crazy.

Oh the street is now an empty place (March 29 to April 4, 2020)

Most Popular Wikipedia Articles of the Week (March 29 to April 4, 2020)
Rank Article Views Notes/about
1 2019–20 coronavirus pandemic 5,593,988
It's terrifying how something like the severe acute respiratory syndrome coronavirus 2 can damage the very biological functions of a person. Want it more terrifying? Okay, let's make it Terrifyingly Terrifying to a Terrifying Extent of Terrifyingly Terrific Proportions. To do that, make that same coronavirus damage the very biological functions of more than one million people, immobilise whole countries, send top-grade economies spinning, make two nurses in Italy commit suicide from stress, make people repeatedly clear whole shelves of supermarkets from panic buying, make the manufacturer of Corona beer lose 165 million US dollars, and kill the equivalent of all – that's all – the players in the entire NFL, NBA, MLB, NHL, La Liga and Bundesliga...

...fifteen times.

2 Joe Exotic 4,076,724[a]
'I am broke as shit, I have a judgement against me from some bitch down here in Florida, and this is all paid for by the Committee of Joe Exotic Speaks for America.'

Joseph Allen Maldonado-Passage, better known as Joe Exotic, is a former zoo operator and criminal. Exotic, who refuses to wear a suit, was convicted on 19 criminal charges in 2019, split across things such as animal abuse and murder for hire (the target being Carole Baskin, owner of a big cat rescue facility and the above-mentioned "bitch" to whom he was forced to pay $1 million in damages). Aside from abusing animals and hiring hitmen, events in the life of the (self-proclaimed) best tiger breeder in the US include being raped at the age of five (okay, that's really sad regardless of criminal status), running in the 2016 United States presidential election, running in the 2018 Oklahoma gubernatorial election and featuring in Netflix's documentary Tiger King: Murder, Mayhem and Madness, which takes the third spot in this list.

3 Tiger King: Murder, Mayhem and Madness 2,460,831
4 Carole Baskin 2,341,891
5 2019–20 coronavirus pandemic by country and territory 2,306,585
Currently, India has a bit above 5,000 cases of COVID-19, the most affected state being Maharashtra, home of Mumbai (or, if you like, Bombay) among other cities. Meanwhile, the land of the Star-Spangled Banner has just passed a whopping four hundred thousand cases, with daily case increases consistently reaching above 20,000. New York is the hardest-hit state—and about that, see Andrew Cuomo (#11 on this list) for more details.
6 2020 coronavirus pandemic in India 2,149,064
7 2020 coronavirus pandemic in the United States 1,906,786
8 Spanish flu 1,890,887
Unfortunately, we don't know that much about the real stats of the Spanish Flu pandemic, as results were censored by countries such as Germany, the UK (at the time including Ireland), France and the United States, which were engaged in the war effort. What we do know, though, is that cases could have rocketed to half a billion, with anywhere from 17 to 50 million deaths. Note that some of the devastation of the Spanish Flu was due to resulting bacterial infections, caused by things such as overcrowding and poor hygiene.
9 Coronavirus disease 2019 1,454,002
Okay, some clarification:

Coronavirus disease 2019 is the thing that's also called COVID-19. It's the illness that's taking world media by storm. The illness itself is caused by the immune system's response to the severe acute respiratory syndrome coronavirus 2, also known as SARS-CoV-2. The 2019–20 coronavirus pandemic is the ongoing outbreak of COVID-19, which is caused by SARS-CoV-2. SARS-CoV-2 is one of many viruses that fall under the coronavirus category – a coronavirus is simply a virus that has little spikes called peplomers on it. There are many coronaviruses, not just a few.

10 Coronavirus 1,338,199
  1. ^ combined page views total for "Joe Exotic" (2,739,317) and "Joseph Maldonado-Passage" (1,337,407), the latter of which is now a redirect due to a page move during the week (on March 29).

Quarantine reached month two, hope it's not here to stay (April 5 to 11, 2020)

Most Popular Wikipedia Articles of the Week (April 5 to 11, 2020)
Rank Article Class Views Image Notes/about
1 2019–20 coronavirus pandemic 4,126,190 This week, the current pandemic's cases exceeded the total number of recorded cases in the 2009 swine flu pandemic, which is not in this week's list (after strong showings for quite a few of the preceding weeks). Indeed, worldwide cases are about to blast through two million. For me, the scale of this thing is becoming too much to comprehend. How in the world do you visualise two million of anything?
2 Joe Exotic 2,264,454
In all honesty, all this buzz about Joe Exotic (caused by the documentary Tiger King: Murder, Mayhem and Madness) is making Joe Exotic not all that exotic.
3 2019–20 coronavirus pandemic by country and territory 2,199,766
According to the Worldometers coronavirus tracker, COVID-19 has smeared itself across about 210 countries and territories. To clarify, 'territories' is listed alongside 'countries' to cover places such as Taiwan that aren't fully recognised as independent countries (sorry for the politics).
4 Boris Johnson 2,002,392
COVID-19 (as you may have noticed) is well-represented on this list. Many articles it brings into public view (such as #s 1, 3, 14, 15, 18, you get the idea) are rather obviously related to this virus. Some, however, require a cursory following of the news to connect, although one must have been living with one of the uncontacted tribes of the Amazon to not be hearing any news these days. Boris Johnson, the Prime Minister of the United Kingdom (and I'll fully admit I'm not entirely sure what his duties are), was diagnosed with COVID-19 several days ago and landed in the intensive care unit for a spell – fortunately for him, though, he has now been released from the hospital.
5 WrestleMania 36 1,882,062
2020 saw WWE's 36th annual WrestleMania, which was held with no live audience (due to #1). This particular Granddaddy of Them All consisted of two parts: the first night's main event was won by The Undertaker and the second's by Drew McIntyre, who beat Brock Lesnar.
6 Carole Baskin 1,765,033
Another person featured on Tiger King, namely the woman our #2 tried to get murdered.
7 Money Heist 1,507,486
Who doesn't want money?!? And this Spanish show, which became an international sensation once picked up by Netflix, deals with people who decide to take it straight from the source! At least in the first season; I don't know what is happening in the fourth, which has now hit the streaming service.
8 2020 coronavirus pandemic in the United States 1,468,892
Total COVID-19 cases in the Land of the Star-Spangled Banner are almost at an almighty 600,000. However, the daily increase data offered at Johns Hopkins University's very detailed, map-based tracker shows that the best is coming for the US – a steep curve has been turned, and daily increases are now starting to fall. Best of luck to all Americans and don't forget to stay safe.
9 2020 coronavirus pandemic in India 1,438,743
Best of luck to all Indians as well. It's an absolute miracle that India, with its vast population, has managed to keep infection numbers so low compared to its demographics. It has been estimated that without the ongoing lockdown, cases in Bhārat Gaṇarājya could have surged by 35,000.
10 John Prine 1,363,120 In any normal time, this American country-folk legend would have topped this list upon his death last week at the age of 73. Unfortunately, these are not normal times, as the rest of this list indicates. Prine made last week's list (albeit lower, at #22) upon the news that he had entered the intensive care unit on March 26 with symptoms of COVID-19, and sadly succumbed to the virus on April 7, just months after being selected for a Lifetime Achievement Award. Perhaps I'll have to send him an angel from Montgomery.

This is the Reebok or the Nike? (April 12 to 18, 2020)

Most Popular Wikipedia Articles of the Week (April 12 to 18, 2020)
Rank Article Class Views Image Notes/about
1 2019–20 coronavirus pandemic 3,255,130 Believe it or not, it's been about 19½ weeks (137 days) since the first reported cases of COVID-19 emerged in Wuhan. As of 23 April, there are approx. 2.7 million cases! The number was already frightening and has now become even more so. To give you an idea, here is a website showing dots. And that only goes up to 1 million! Sadly, we still haven't been able to flatten the curve, but in a lot of countries, the curve is slowly starting to decline.
2 Joe Exotic 1,285,515
Tiger King, burning bright, on the flatscreens of the night, as viewers of the docuseries are still numerous enough to boost views on its primary subject, this weirdo who along with operating a private big cat zoo also indulged in rapping, politics and attempted murder.
3 Charles Ingram 1,260,050
Ingram was already well known for cheating on Who Wants to Be a Millionaire? to win the top prize using a clever series of coughs. He is currently attracting a lot of attention following the release of the ITV show Quiz. The show is based on the award-winning play of the same name, following Ingram's earlier life.
4 2020 coronavirus pandemic in the United States 1,200,164
The US currently has the most coronavirus cases in the world, so the article's high ranking is hardly surprising. One of the worst-hit areas is the city of Detroit, with over 8,000 cases.
5 Deaths in 2020 1,118,629
I don't wanna be buried
In a Pet Sematary
I don't want to live my life again!
6 2020 coronavirus pandemic in India 1,108,561 Unfortunately for India, the number of Indians who have been infected with SARS-CoV-2 is starting to balloon towards the 20K mark – now, that may not seem that much compared to the US's almost 900 thousand, but that's still a lot. Thank goodness for the lockdown that's currently in force – it has been predicted that had the current lockdown never happened, the number of Indian COVID-19 cases could have been in excess of 30 thousand.
7 2019–20 coronavirus pandemic by country and territory 1,077,954 Almost all of the world's countries and territories have experienced some cases of COVID-19 within their borders – it isn't called a pandemic for nothing. In total, over two million people have been infected by SARS-CoV-2.
8 Money Heist 1,071,411 Money Heist, known in its native Spanish as La Casa de Papel (the House of Paper – i.e. paper money, printed in places like the Royal Spanish Mint to the left), is a Spanish crime drama concerning a Professor and his collaborators. It is the most-watched non-English language series to date and also quite an awarded one, having won a mammoth sixteen awards at time of writing.
9 Brian Dennehy 1,057,388 Brian Dennehy was an American actor, appearing on stage, on air and in the movies. You might know him as the father of Romeo in Romeo + Juliet, or as a mainstay at the Goodman Theatre in Chicago. In his lifetime of 81 years, he won two Tony Awards (an award for stage acting in the US), an Olivier Award (likewise but in British contexts) and a Golden Globe (film acting). Dennehy died of cardiac arrest brought on by sepsis.
10 Carole Baskin 999,748
As you can see, Big Cat Rescue owner Carole Baskin amassed a cool 999 thousand (rounded down of course) views this week. This is because our #2 tried to have her killed (the culmination of a feud that included Joe Exotic accusing Baskin of murdering her disappeared husband, and Baskin winning a trademark infringement lawsuit because Exotic decided to copy the branding of her sanctuary on the website of his tiger zoo).

Exclusions

  • These lists exclude the Wikipedia main page, non-article pages (such as redlinks), and anomalous entries (such as DDoS attacks or likely automated views). Since mobile view data became available to the Report in October 2014, we have excluded articles that have almost no mobile views (5–6% or less) or almost all mobile views (94–95% or more) because, based on our experience and research of the issue, they are very likely to be automated views. Please feel free to discuss any removal on the Top 25 Report talk page if you wish.
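
The mobile-share rule described above can be sketched roughly as follows. This is an illustration only, not the Report's actual tooling: the `PageStats` record, the function names, and the exact cut-offs of 6% and 94% are assumptions, picked from within the ranges quoted above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed cut-offs, picked from the "5–6%" and "94–95%" ranges quoted above.
LOW_MOBILE_SHARE = 0.06
HIGH_MOBILE_SHARE = 0.94


@dataclass
class PageStats:
    """Hypothetical weekly view counts for one article."""
    title: str
    total_views: int
    mobile_views: int


def looks_automated(page: PageStats) -> bool:
    """Flag pages whose mobile share is suspiciously low or high."""
    if page.total_views == 0:
        return False  # no views, so nothing to judge
    share = page.mobile_views / page.total_views
    return share <= LOW_MOBILE_SHARE or share >= HIGH_MOBILE_SHARE


def filter_report(pages: list[PageStats]) -> list[PageStats]:
    """Keep only pages that pass the mobile-share check, most viewed first."""
    kept = [p for p in pages if not looks_automated(p)]
    return sorted(kept, key=lambda p: p.total_views, reverse=True)
```

A page with 2,000 views of which only 40 came from mobile devices (a 2% share) would be dropped under these assumed thresholds, while one with a 40% mobile share would stay in the list.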




2020-04-26

Roy is doing fine and sending more photos

Everybody has their own story about the coronavirus pandemic. Everybody has somebody that they've been worried about. This story is about photographer Roy Klotz who has uploaded 5,651 photos to Commons over the last eight years. The photos cover sites on the National Register of Historic Places, other historic buildings, and places on his many world travels. Some of my favorite photos taken by Roy follow, in no particular order.

The octogenarian started his fifth around-the-world ocean cruise in early January from Florida. Until recently, the last photos he uploaded were from the Caribbean.

By late March, the ship was off Australia and ports were closing due to the coronavirus pandemic. They managed to dock in Perth, but catching a flight home was another matter. Borders within Australia were closing, flights were disrupted, and Roy had another adventure getting back to the United States. He's still not quite home, but is staying with a relative. He's put her to work driving him around to take a few photos.

Roy began taking photos before World War II using his mother's Kodak bellows camera when he was 8 years old. Since then he's photographed with Zeiss Contax, Argus, Nikon, and Canon cameras. He currently uses a Nikon D3. Over the years, he's photographed in 218 countries, all 50 U.S. states, and every Canadian province.

Welcome back, Roy!





2020-04-26

Trending topics across languages; auto-detecting bias


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.


Reviewed by Isaac Johnson

"What is Trending on Wikipedia? Capturing Trends and Language Biases Across Wikipedia Editions" by Volodymyr Miz, Joëlle Hanna, Nicolas Aspert, Benjamin Ricaud, and Pierre Vandergheynst of EPFL, published at WikiWorkshop as part of The Web Conference 2020, examines what topics trend on Wikipedia (i.e. attract high numbers of pageviews) and how these trending topics vary by language.[1] Specifically, the authors study aggregate pageview data from September - December of 2018 for English, French, and Russian Wikipedia. In the paper, trending topics are defined as clusters of articles that are linked together and all receive a spike in pageviews over a given period of time. Eight high-level topics are identified that encapsulate most of the trending articles (football, sports other than football, politics, movies, music, conflicts, religion, science, and video games). Articles are mapped to these high-level topics through a classifier trained over article extracts in which the labeled data comes from a set of articles that were labeled via heuristics such as the phrase "(album)" being in the article title indicating music.

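The heuristic labeling step mentioned above can be pictured with a minimal sketch. It is not the authors' code: only the "(album)" to music rule is taken from the description above, while the other two rules and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Title markers mapped to topics. Only "(album)" -> music comes from the
# review above; the other two rules are hypothetical additions.
TITLE_RULES = {
    "(album)": "music",
    "(film)": "movies",             # hypothetical rule
    "(video game)": "video games",  # hypothetical rule
}


def heuristic_label(title: str) -> Optional[str]:
    """Return a topic if the title contains a known marker, else None."""
    lowered = title.lower()
    for marker, topic in TITLE_RULES.items():
        if marker in lowered:
            return topic
    return None


def build_training_set(titles: List[str]) -> List[Tuple[str, str]]:
    """Keep only the titles a heuristic can label, as (title, topic) pairs."""
    labeled = []
    for title in titles:
        topic = heuristic_label(title)
        if topic is not None:
            labeled.append((title, topic))
    return labeled
```

Pairs produced this way would then serve as training data for a text classifier over article extracts, which can label the many articles whose titles carry no such marker.
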
The authors find a number of topics that span language communities in popularity, as well as topics that are much more locally popular (e.g., specific to the United States or France or Russia). Singular events (e.g., a hurricane that has a specific Wikipedia article) often lead to tens of related pages (e.g., about past hurricanes or scientific descriptions) receiving correlated spikes. This is a trend that has been especially apparent with the current pandemic, as pages adjacent to the main pandemic article, such as social distancing, past pandemics, or regions around the world, have also received high spikes in traffic. They discuss how these trending topics relate to the motivations of Wikipedia readers, geography, culture, and artifacts such as featured articles or Google doodles.

It is always exciting to see work that explicitly compares language editions of Wikipedia. Highlighting these similarities and differences as well as developing methods to study Wikipedia across languages are valuable contributions. While it is interesting to explore differences in interest across languages, these types of analyses can also help recommend what types of articles are valuable to translate into a given language and will hopefully be further developed with some of these applications in mind. The authors identify that Wikidata shows promise in improving their approach to labeling articles with topics. It should be noted that Wikimedia has also recently developed approaches to identifying the topics associated with an article that have greater coverage (i.e. ~60 topics instead of 8) and are based on the WikiProject taxonomy. This has been expanded experimentally to Wikidata as well.



Briefly


Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Tilman Bayer

"Automatically Neutralizing Subjective Bias in Text"

From the abstract:[2]

"we introduce a novel testbed for natural language generation: automatically bringing inappropriately subjective text into a neutral point of view ("neutralizing" biased text). We also offer the first parallel corpus of biased language. The corpus contains 180,000 sentence pairs and originates from Wikipedia edits that removed various framings, presuppositions, and attitudes from biased sentences. Last, we propose two strong encoder-decoder [algorithm] baselines for the task [of 'neutralizing' biased text]."

Among the example changes the authors quote from their corpus:

Original | New (NPOV) version
A new downtown is being developed which will bring back... | A new downtown is being developed which [...] its promoters hope will bring back...
Jewish forces overcome Arab militants. | Jewish forces overcome Arab forces.
A lead programmer usually spends his career mired in obscurity. | Lead programmers often spend their careers mired in obscurity.

As example output for one of their algorithms, the authors present the change from

John McCain exposed as an unprincipled politician

to

John McCain described as an unprincipled politician


"Neural Based Statement Classification for Biased Language"

The authors construct an RNN (Recurrent Neural Network) able to detect biased statements from the English Wikipedia with 91.7% precision, and "release the largest corpus of statements annotated for biased language". From the paper:[3]

"We extract all statements from the entire revision history of the English Wikipedia, for those revisions that contain the POV tag in the comments. This leaves us with 1,226,959 revisions. We compare each revision with the previous revision of the same article and filter revisions where only a single statement has been modified.[...] The final resulting dataset leaves us with 280,538 pov-tagged statements. [...] we [then] ask workers to identify statements containing phrasing bias in the Figure Eight platform. Since labeling the full pov-tagged dataset would be too expensive, we take a random sample of 5000 statement from the dataset. [...] we present our approach for classifying biased language in Wikipedia statements [using] Recurrent Neural Networks (RNNs) with gated recurrent units (GRU)."

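The filtering step quoted above (keep POV-tagged revisions in which only a single statement changed) might look roughly like the sketch below. It is not the paper's pipeline: the naive sentence splitter, the substring check for "pov", and the `(comment, before, after)` tuple layout are all assumptions made for illustration.

```python
import difflib
import re


def split_statements(text):
    """Naively split text into statements at sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def single_changed_statement(before, after):
    """Return (old, new) if exactly one statement was rewritten, else None."""
    old, new = split_statements(before), split_statements(after)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = [op for op in matcher.get_opcodes() if op[0] != "equal"]
    if len(changes) != 1:
        return None
    tag, i1, i2, j1, j2 = changes[0]
    # Accept only a one-for-one replacement, not an insertion or deletion.
    if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
        return old[i1], new[j1]
    return None


def pov_candidates(revisions):
    """Yield changed statement pairs from revisions whose comment mentions POV.

    `revisions` is a hypothetical iterable of (comment, before, after) tuples.
    """
    for comment, before, after in revisions:
        if "pov" not in comment.lower():
            continue
        pair = single_changed_statement(before, after)
        if pair is not None:
            yield pair
```

A plain substring test for "pov" will also match unrelated words, so a real pipeline would need a stricter check on the edit comment.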

Dissertation about data quality in Wikidata

From the abstract:[4]

"This thesis makes a threefold contribution: (i.) it evaluates two previously uncovered aspects of the quality of Wikidata, i.e. provenance and its ontology; (ii.) it is the first to investigate the effects of algorithmic contributions, i.e. bots, on Wikidata quality; (iii.) it looks at emerging editor activity patterns in Wikidata and their effects on outcome quality. Our findings show that bots are important for the quality of the knowledge graph, albeit their work needs to be continuously controlled since they are potentially able to introduce different sorts of errors at a large scale. Regarding human editors, a more diverse user pool—in terms of tenure and focus of activity—seems to be associated to higher quality. Finally, two roles emerge from the editing patterns of Wikidata users, leaders and contributors. Leaders [...] are also more involved in the maintenance of the Wikidata schema, their activity being positively related to the growth of its taxonomy."

See also earlier coverage of a related paper coauthored by the same author: "First literature survey of Wikidata quality research"


Russian Wiktionary's quotes by publication year

Nineteenth-century writers important for Russian Wiktionary

From the abstract:[5]

"The quantitative evaluation of quotations in the Russian Wiktionary was performed using the developed Wiktionary parser. It was found that the number of quotations in the dictionary is growing fast (51.5 thousands in 2011, 62 thousands in 2012). [...] A histogram of distribution of quotations of literary works written in different years was built. It was made an attempt to explain the characteristics of the histogram by associating it with the years of the most popular and cited (in the Russian Wiktionary) writers of the nineteenth century. It was found that more than one-third of all the quotations (the example sentences) contained in the Russian Wiktionary are taken by the editors of a Wiktionary entry from the Russian National Corpus."

The top authors quoted are: 1. Chekhov 2. Tolstoy 3. Pushkin 4. Dostoyevsky 5. Turgenev


"Online Disinformation and the Role of Wikipedia"

From the abstract:[6]

"...we perform a literature review trying to answer three main questions: (i) What is disinformation? (ii) What are the most popular mechanisms to spread online disinformation? and (iii) Which are the mechanisms that are currently being used to fight against disinformation?. In all these three questions we take first a general approach, considering studies from different areas such as journalism and communications, sociology, philosophy, information and political sciences. And comparing those studies with the current situation on the Wikipedia ecosystem. We conclude that in order to keep Wikipedia as free as possible from disinformation, it is necessary to help patrollers to early detect disinformation and assess the credibility of external sources."


"Assessing the Factual Accuracy of Generated Text"

This paper by four Google Brain researchers describes automated methods for estimating the factual accuracy of automatic Wikipedia text summaries, using end-to-end fact extraction models trained on Wikipedia and Wikidata.[7]


"Revision Classification for Current Events in Dutch Wikipedia Using a Long Short-Term Memory Network"

From the abstract:[8]

"Wikipedia contains articles on many important news events, with page revisions providing near real-time coverage of the developments in the event. The set of revisions for a particular page is therefore useful to establish a timeline of the event itself and the availability of information about the event at a given moment. However, many revisions are not particularly relevant for such goals, for example spelling corrections or wikification edits. The current research aims [...] to identify which revisions are relevant for the description of an event. In a case study a set of revisions for a recent news event is manually annotated, and the annotations are used to train a Long Short Term Memory classifier for 11 revision categories. The classifier has a validation accuracy of around 0.69 which outperforms recent research on this task, although some overfitting is present in the case study data."


"DBpedia FlexiFusion: the Best of Wikipedia > Wikidata > Your Data"

From the abstract and acknowledgements:[9]

"The concrete innovation of the DBpedia FlexiFusion workflow, leveraging the novel DBpedia PreFusion dataset, which we present in this paper, is to massively cut down the engineering workload to apply any of the [existing DBPedia quality improvement] methods available in shorter time and also make it easier to produce customized knowledge graphs or DBpedias.[...] our main use case in this paper is the generation of richer, language-specific DBpedias for the 20+ DBpedia chapters, which we demonstrate on the Catalan DBpedia. In this paper, we define a set of quality metrics and evaluate them for Wikidata and DBpedia datasets of several language chapters. Moreover, we show that an implementation of FlexiFusion, performed on the proposed PreFusion dataset, increases data size, richness as well as quality in comparison to the source datasets." [...] The work is in preparation to the start of the WMF-funded GlobalFactSync project (https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE ).


"Improving Neural Question Generation using World Knowledge"

From the abstract and paper:[10]

"we propose a method for incorporating world knowledge (linked entities and fine-grained entity types) into a neural question generation model. This world knowledge helps to encode additional information related to the entities present in the passage required to generate human-like questions. [...] . In our experiments, we use Wikipedia as the knowledge base for which to link entities. This specific task (also known as Wikification (Cheng and Roth, 2013)) is the task of identifying concepts and entities in text and disambiguation them into the most specific corresponding Wikipedia pages."


Concurrent "epistemic regimes" feed disagreements among Wikipedia editors

From the (English version of the) abstract:[11]

"By analyzing the arguments in a corpus of discussion pages for articles on highly controversial subjects (genetically modified organisms, September 11, etc.), the authors show that [disagreements between Wikipedia editors] are partly fed by the existence on Wikipedia of concurrent 'epistemic regimes'. These epistemic regimes (encyclopedic, scientific, scientistic, wikipedist, critical, and doxic) correspond to divergent notions of validity and the accepted methods for producing valid information."


"ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia"

From the abstract:[12]

"... we describe ORES: an algorithmic scoring service that supports real-time scoring of wiki edits using multiple independent classifiers trained on different datasets. ORES decouples several activities that have typically all been performed by engineers: choosing or curating training data, building models to serve predictions, auditing predictions, and developing interfaces or automated agents that act on those predictions. This meta-algorithmic system was designed to open up socio-technical conversations about algorithmic systems in Wikipedia to a broader set of participants. In this paper, we discuss the theoretical mechanisms of social change ORES enables and detail case studies in participatory machine learning around ORES from the 4 years since its deployment."

References

  1. ^ Miz, Volodymyr; Hanna, Joëlle; Aspert, Nicolas; Ricaud, Benjamin; Vandergheynst, Pierre (17 February 2020). "What is Trending on Wikipedia? Capturing Trends and Language Biases Across Wikipedia Editions". WikiWorkshop (Web Conference 2020): 794–801. arXiv:2002.06885. doi:10.1145/3366424.3383567. ISBN 9781450370240.
  2. ^ Pryzant, Reid; Martinez, Richard Diehl; Dass, Nathan; Kurohashi, Sadao; Jurafsky, Dan; Yang, Diyi (2019-12-12). "Automatically Neutralizing Subjective Bias in Text". arXiv:1911.09709 [cs.CL]. To appear at the 34th AAAI Conference on Artificial Intelligence (AAAI 2020)
  3. ^ Hube, Christoph; Fetahu, Besnik (2019-01-30). "Neural Based Statement Classification for Biased Language". Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. WSDM '19. Melbourne VIC, Australia: Association for Computing Machinery. pp. 195–203. doi:10.1145/3289600.3291018. ISBN 9781450359405.
  4. ^ Piscopo, Alessandro (2019-11-27), Structuring the world's knowledge: Socio-technical processes and data quality in Wikidata, doi:10.6084/m9.figshare.10998791.v2 (dissertation)
  5. ^ Smirnov, A.; Levashova, T.; Karpov, A.; Kipyatkova, I.; Ronzhin, A.; Krizhanovsky, A.; Krizhanovsky, N. (2020-01-20). "Analysis of the quotation corpus of the Russian Wiktionary". arXiv:2002.00734 [cs.CL].
  6. ^ Saez-Trumper, Diego (2019-10-14). "Online Disinformation and the Role of Wikipedia". arXiv:1910.12596 [cs.CY].
  7. ^ Goodrich, Ben; Rao, Vinay; Liu, Peter J.; Saleh, Mohammad (2019-07-25). "Assessing The Factual Accuracy of Generated Text". Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD '19. Anchorage, AK, USA: Association for Computing Machinery. pp. 166–175. doi:10.1145/3292500.3330955. ISBN 9781450362016.
  8. ^ Nienke Eijsvogel, Marijn Schraagen: Revision Classification for Current Events in Dutch Wikipedia Using a Long Short-Term Memory Network (short paper). Proceedings of the 31st Benelux Conference on Artificial Intelligence (BNAIC 2019) and the 28th Belgian Dutch Conference on Machine Learning (Benelearn 2019). Brussels, Belgium, November 6-8, 2019.
  9. ^ Frey, Johannes; Hofer, Marvin; Obraczka, Daniel; Lehmann, Jens; Hellmann, Sebastian (2019). "DBpedia FlexiFusion the Best of Wikipedia > Wikidata > Your Data". In Chiara Ghidini; Olaf Hartig; Maria Maleshkova; Vojtěch Svátek; Isabel Cruz; Aidan Hogan; Jie Song; Maxime Lefrançois; Fabien Gandon (eds.). The Semantic Web – ISWC 2019. Lecture Notes in Computer Science. Cham: Springer International Publishing. pp. 96–112. doi:10.1007/978-3-030-30796-7_7. ISBN 9783030307967.
  10. ^ Gupta, Deepak; Suleman, Kaheer; Adada, Mahmoud; McNamara, Andrew; Harris, Justin (2019-09-09). "Improving Neural Question Generation using World Knowledge". arXiv:1909.03716 [cs.CL].
  11. ^ Carbou, Guillaume; Sahut, Gilles (2019-07-15). "Les désaccords éditoriaux dans Wikipédia comme tensions entre régimes épistémiques". Communication. Information Médias Théories Pratiques. 36/2. doi:10.4000/communication.10788. ISSN 1189-3788.
  12. ^ Halfaker, Aaron; Geiger, R. Stuart (2019-09-11). "ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia". arXiv:1909.05189 [cs.HC].





2020-04-26

Wikipedia:An article about yourself isn't necessarily a good thing

This Wikipedia essay was first written by Sebwite in 2009. Essays are not project guidelines or policies, and may or may not have broad support among the Wikipedia community. This essay was written by Sebwite and 273 other editors.
Thinking of writing an article about yourself? Beware, because the "mirror" will reflect both your virtues and aspects of your private life that you would prefer to keep to yourself.

Are you planning to write a Wikipedia article about yourself? Are you planning to pay for someone to write an article on your behalf? Before you proceed, please take some time to thoroughly understand the principles and policies of Wikipedia, especially one of its most important policies, the neutral point of view (NPOV) policy.

Wikipedia seeks neutrality. An article about you written by anyone must be editorially neutral. It will not take sides and will report both the good and the bad about you from verifiable and reliable sources. It will not promote you. It will just contain factual information about you from independent, reliable sources.

This is a mixed blessing.

Some accomplishment or event, good or bad, may give you notability enough to qualify for a Wikipedia article. Once you have become a celebrity, your personal life may be exposed. No one is perfect, so your faults may get reported, and overreported, and reported enough to end up on Wikipedia. Even if you have lived a life free of scandal, and your Wikipedia article is spotless, at some time in the future your first publicized mistake may well end up getting into that article. Suddenly your fame may turn into highly publicized infamy. Yes, Wikipedia is highly publicized! It is mirrored and copied all over the place. Some reporters use Wikipedia as a source for their articles, so information about the mistakes you have made which end up in the Wikipedia article about you and which are covered in independent reliable published sources may be repeated.

Background

Getting attention is not always a good thing

An article about yourself is nothing to be proud of. The neutral point of view (NPOV) policy ensures that both the good and the bad about you will be told and that whitewashing is not allowed, while the conflict of interest (COI) guideline limits your ability to edit out any negative material from an article about yourself. There are serious consequences for ignoring these, and the "Law of Unintended Consequences" works on Wikipedia.[1] If your faults are minor and relatively innocent, then you have little to fear, but "your" own article isn't something to covet, because it won't be your "own" at all. Once it's in Wikipedia, it is viewed by the world and cannot be recalled.

For example, Tiger Woods is one of the most accomplished golfers, yet his possible fall from grace is mentioned in his article.[1] Michael Phelps holds the record for the most gold medals in Olympic history, but his drunk driving arrests in 2004 and 2014 and a bong photo published in 2009 are all mentioned on Wikipedia, without violating Wikipedia's BLP guidelines. Rolf Harris was regarded as a national treasure for many decades, but his arrest and prosecution for child abuse were widely reported in the mainstream and broadsheet media, and it would be against Wikipedia's neutrality policy not to include them.

Elected officials (such as heads of state), entertainers with commercialized productions, authors of published materials, and professional athletes can reasonably expect various details of their personal lives to receive coverage.

While having a Wikipedia article may make you a celebrity of some sort, be ready to have your personal life exposed. If you are seen at the side of the road being issued a speeding ticket, and that gets reported, it may end up in an article about you. If your house is foreclosed and this gets reported, it may find its way onto Wikipedia. And if you get into an argument with another person in public, someone may report that in a reliable source, and it will be fair game for Wikipedia.

So is a Wikipedia article about yourself good for a resumé? Probably not. On a resumé, you want to tell all the best about yourself. Even if one day the article looks great, it may tell something very different the next. The nature of a wiki is such that information can change at any time. So beware – your boss, your boyfriend/girlfriend, or anyone else you are trying to impress may think one thing of you on one day, and you may lose all that respect by the next. All because of well-verified information in your Wikipedia article.

Hopefully, by the time a Wikipedia article about you would meet all inclusion guidelines, you already know what fame feels like, are already aware of what is being said about you, and know what to expect.

Miscellaneous things to be aware of

A memorial has been erected. It may be ignored by most passers-by, but it will surely be noticed by someone.

These are some miscellaneous things you should know in order to be prepared for there to be an article about you. They may be good or bad, depending on your expectations.

  • Since all information added to Wikipedia must be verifiable, this may make it difficult or impossible to defend yourself against sourced negative information added to an article about you. If the information is added within guidelines, you cannot tell "your side" of the story or otherwise provide a response other than what has already been reported in reliable sources.
  • Most terms, when entered into a Google search, will reveal 1–2 Wikipedia articles with an identical or closely matching title within the top 10, and often the first two hits. If there is an article about you, and your name is googled, especially if it is uncommon, chances are the article will come up as one of the first hits.
  • Wikipedia is cloned by many other sites and has many mirrors and forks, perhaps hundreds that follow a single article. If your name was never previously found via a Google search, and suddenly there is an article about you, it may lead to dozens of Google hits that are all some variation of either the current or a previous version of the article.
    A previous version of the article may contain possibly inaccurate derogatory statements which are not subject to editorial correction without a long, difficult and in some cases almost impossible process of locating, contacting, and persuading the person in control of the offending webpage to make the corrections you desire. Litigation might even be required, which would require advance expenditures and might not be successful.
  • There is a site called Deletionpedia that preserves copies of deleted Wikipedia articles. Even if you or someone else manages to get the article about you deleted, it may turn up there.
  • Random articles are often found by clicking the "Random article" link on the left. Though there is no way of knowing exactly how often this happens, once the number of random-article requests reaches the number of Wikipedia articles, there is a good chance (about 63.2%) that the article about you will have been displayed to someone. Unless you are very well-known, that person probably has no prior knowledge of you, and is unlikely to care much one way or the other who you are or what you have done. That person, however, may examine the notability of the article, or look out for other issues it may have.
  • Even if there is nothing bad to say about you, vandalism on Wikipedia is quite common, and most articles are vandalized by someone at one time or another. Vandalism consists of a variety of additions, removals, and other changes, often including hate speech (possibly against your ethnicity or the cause you stand for), profanity, inappropriate images, external links, or just random characters. Vandalism is generally reverted quickly, but unless oversighted, will remain permanently in the edit history. Most edit histories are likely ignored unless they bear some significance. (Older versions that remained around for some time are often robotically cloned to other sites.)
  • If you share a name with one or more other people, there is a good chance the article will be found, via either a disambiguation page or a hatnote, by someone who may otherwise have no interest in who you are but may read the article out of curiosity. For example, the article titled Michael Jackson is about the pop singer. It bears a hatnote that reads For other persons named Michael Jackson, see Michael Jackson (disambiguation). That page lists a few dozen other lesser-known people who have this common name. The article about James Joyce (the novelist) contains a hatnote that reads This article is about the 20th-century writer with a link that leads to articles on other people called James Joyce, for example a baseball umpire and a 19th-century politician. You would probably not expect that a reader looking for information about a specific person would read up on a different person in an unrelated field simply because that different person shares the specific person's name (e.g. you probably wouldn't expect someone trying to find out about the writer named James Joyce to read up on the congressman named James Joyce simply because of the shared name), but there are many curious people who just might do exactly that.
  • If you do not push to have an article about you on Wikipedia, but one was created without your involvement, you are probably notable for something to begin with. Any negative coverage about you has probably been known already to someone. But since Wikipedia joins multiple sources of information about you together, something that is not known to readers of one external source may suddenly be known just because Wikipedia readers are given instant access to multiple sources. In addition to being summarized within the article, footnoted sources are often external links, and readers of the article can then click on them and read in full something they would otherwise not know. So if Source A says only nice things about you, and Source B includes some things that are less-than-nice, someone who has relied on Source A (who thought nice things about you) and then decides also to read Wikipedia may suddenly learn about what is in Source B.
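
The 63.2% figure in the "Random article" point above can be checked with a short calculation. The sketch below assumes that every random pick is independent and uniform over all articles, which is a simplification of how the real feature behaves. The article count of 6,000,000 is only an illustrative round number, and `chance_displayed` is a name made up for this example.

```python
import math


def chance_displayed(num_articles: int, num_picks: int) -> float:
    """Chance that one given article shows up at least once.

    Assumes each pick is independent and uniform over all articles.
    Computes 1 - (1 - 1/N)**k, using log1p/expm1 for numerical accuracy.
    """
    return -math.expm1(num_picks * math.log1p(-1.0 / num_articles))


# Illustrative round number, not an exact article count.
N = 6_000_000

print(f"{chance_displayed(N, N):.3f}")      # as many picks as articles
print(f"{chance_displayed(N, 2 * N):.3f}")  # twice as many picks
```

With as many picks as articles, the result approaches 1 − 1/e, which is about 0.632; additional picks push the chance higher.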

How Wikipedia's COI guideline applies

No ownership of articles

You do not own "your" article and cannot control what is in it. Read the conflict of interest (COI) guideline. It applies to you and limits your ability to edit out any negative material from "your" article. There can be serious consequences if you ignore this guideline because the Law of Unintended Consequences always applies:

If you write in Wikipedia about yourself, your group, your company, or your pet idea, once the article is created, you have no right to control its content, and no right to delete it outside our normal channels. Content is not deleted just because somebody doesn't like it. Any editor may add material to or remove material from the article within the terms of our content policies. If there is anything publicly available on a topic that you would not want included in an article, it will probably find its way there eventually. More than one user has created an article only to find themselves presented in a poor light long-term by other editors. If you engage in an edit war in an attempt to obtain a version of your liking you may have your editing access removed, perhaps permanently.
In addition, if your article is found not to be worthy of inclusion in the first place, it will be deleted, as per our deletion policies. Therefore, don't create promotional or other articles lightly, especially on subjects you care about.

No autobiographies

The one fundamental rule for inclusion of any article here is notability as established in third party sources. Too many biographies exist here on people who aren't really all that notable, but by "squeezing blood" out of their pitifully few "turnips", they end up with a biography. They may think it's a good thing, but many end up regretting it. Even many famous people wish they didn't have a biography here. Some have threatened to sue Wikipedia to get theirs removed, but few succeed, as well they shouldn't.

Biographies should really only be written by third parties who discover a person's notability because they are truly notable, not because a friend or family member thinks they are more notable than they really are. We don't want autobiographies here, so if you started the article and thought you could promote yourself with a hagiographic article, you will quickly discover that you can't eat your cake and have it too. Why? Because your "autobiography" will quickly be turned into a biography. According to our policies, once the article has "gone public", it is too late for the author to do much of anything to keep it only positive, especially because it is thanks to NPOV that unsavory details get added, as they should. In fact, such attempts are considered disruptive and a violation of our NPOV policy. If the author now regrets that such details have been added and (mis)uses the "articles for deletion" (AfD) process to plead for the article to be deleted, their pleadings will actually violate the principles written above. They should have known about the COI warning above, but they chose to ignore it. Too bad. One could say that these are the just rewards of attempting to misuse Wikipedia for promotion or advocacy. We don't write hagiographies or sales brochures here, and we don't allow whitewashing.

This is a serious encyclopedia, not a free webhosting service where personal articles can be written and displayed on the world's biggest "billboard". To seek to misuse Wikipedia to write a hagiography, and then seek to misuse the AfD process to undo the unintended consequences of ignoring policies just won't do. One cannot rejoice when a policy-violating article somehow initially makes it through, but then regret when it gets revised into an article that abides by our NPOV policy. Such an AfD strikes right at the heart of our most sacred policy, NPOV. The proper response to all such AfDs is to keep the article with its negative content and make it even better.

What can be learned from all this? Seeking fame comes with a price, and part of that price may be an article here! "The higher they climb, the harder they fall." Self-promotion is often not a good thing, because "pride goes before a fall."

Articles about companies and organizations

Wal-Mart is renowned for selling plenty of goods at low prices, but the company is also opposed for a variety of reasons.

Articles about companies and organizations can face the same issues as an article about you or someone close to you. It may be exciting if the company you started and are trying to grow gets a Wikipedia article, but the purpose of the article is not to sell its goods or services, or to link to sites that do so (though a company's own site may be linked). Articles here are not sales brochures.

An article about a company or organization is not here to promote it; it is here to tell about it from a neutral point of view, using the information published about it in reliable sources. In many cases this will include criticism of the company.

Many articles on companies and organizations have their own criticism sections. Some even have their own sub-articles devoted to those criticisms (e.g. Criticism of Wal-Mart, Criticism of Amnesty International). Wikipedia's mission is not to damage a company's or organization's reputation. Negative thoughts about these entities will probably be well-known to some of the public, and opinions one way or the other will be well-formulated in many people long before such an article is written, but once that article is written, many more people will know.

Still, even a single sourced scandal involving a company that is in its infancy, or even one that has been around for a long time, can end up on Wikipedia. For example, the Peanut Corporation of America wound up with a Wikipedia article as a result of a food safety scandal. It was the scandal itself, not the Wikipedia article, that led to the company's demise, but the company had no mention on Wikipedia until the scandal broke in the news, and the article certainly didn't help the company.

Some comforting thoughts

Believe it or not, Wikipedia will usually treat you more fairly than the rest of the world. While the goal of Wikipedia is to document all significant knowledge, facts, events, people, history, opinions, etc., it has some rules that serve to ensure accuracy, and also to reduce the risk of libel and malicious gossip from becoming a part of your biography. You can be sure that The National Enquirer, private websites and blogs, radio and TV, and even some newspapers, will not treat you as fairly as Wikipedia does. Even though you will have a significant conflict of interest that limits your right to edit your biography, you will still have a right to use its talk page to make suggestions and to request the correction of inaccurate information.

Policies and guidelines that serve this purpose are:

So even if you have an article here that contains negative information, comfort yourself with the thought that things could be much worse, just like they probably are outside of Wikipedia. Out there it can be hard to defend yourself, while here there are policies that you can use to do so. To top it off, your article here will probably become the highest ranked by search engines rather quickly.

Notes

  1. ^ For example, author Robert Clark Young is now better known for his attempts to add fluff to his own Wikipedia article than he was ever known for his books. Many other famous people have been embarrassed when caught editing their own articles – see Conflict-of-interest editing on Wikipedia.


2020-04-26

Open data and COVID-19: Wikipedia as an informational resource during the pandemic

Changwook Jung, Sun Geng, and Meeyoung Cha are from the Institute for Basic Science, South Korea & KAIST. Inho Hong is from the Center for Humans & Machines, Max Planck Institute for Human Development, Germany.
Diego Saez-Trumper is a researcher employed by the WMF. This paper represents work beyond his regular duties. This article was originally published on "Medium". The text, but not the graphs, on "Medium" is licensed CC0.

From the very start of COVID-19, when it was known just as an outbreak of an atypical pneumonia in China, people around the world have been finding and sharing information about the virus on Wikipedia, a frequent online resource for medical information. While the content and quality of the information on Wikipedia is shaped by volunteer editors (over 34K contributing to COVID-19 related pages) and by policies about verifiability, the activity generated by these volunteers and readers also generates a considerable amount of data itself. For example, we can explore how many Wikipedia articles have been created about COVID-19 related topics. Which sources are cited in those articles? How many people had reviewed such articles? Which are the most visited pages?

This post offers an overview of the COVID-19 related data generated in Wikipedia, highlighting the diversity of content that people read: from general information about the pandemic and regional responses, to the people who have been involved in the pandemic and misinformation about the virus. You can see some of this data in a new interactive resource, which will be updated regularly, from the Wikimedia Foundation. All the data used in this article is public and can be scrutinized, accessed, and used by third parties, using the MediaWiki API and other online resources offered by the Wikimedia Foundation. Sample source code is available in this Jupyter notebook.

Seeking information during COVID-19: English Wikipedia

Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. The first case was observed in China in late 2019, quickly followed by an outbreak in nearby East Asian countries like South Korea. Within a few weeks, outbreaks could be seen throughout Europe and America, leading the World Health Organization (WHO) to declare it a pandemic. Countries like Iran, Italy, Spain, and the United States have each seen over 50,000 confirmed cases (Fig 1). As of April 14, fewer than 90 days since the lockdown in China, COVID-19 has infected over 1,880,000 people and killed more than 117,000 patients worldwide.

Figure 1. Case Statistics of COVID-19 in China, South Korea, Spain, and the US (right axis, log scale). These countries had outbreaks at different times. While the patient count was increasing at a slower rate in China and South Korea by early March, Spain and the US show a sharp rise. In gray: the number of page views on English Wikipedia COVID-19 related articles (left axis, linear scale). For the original version of Figure 1, click here.

What kinds of information did people most seek on COVID-19? How did their attention change over time, as the number of patients quickly rose globally? How quickly were pages tracking regional cases updated? These are critical questions that help us better cope with the current pandemic as well as any other that might come in the future. We analyzed the complex and diverse attention of the public during the COVID-19 pandemic from the browsing logs of English Wikipedia pages. This post features findings on English content; patterns from other languages such as Korean, Italian, and Spanish will be revealed in our next post.

Methodology

Central to the content we observe on Wikipedia is Wikidata, a structured database that links all Wiki projects. Most Wikipedia articles link to a Wikidata item, and these items are in turn linked to one another. For example, there is a link between the COVID-19 Wikidata item and the Pandemic one. Therefore, we can identify content related to COVID-19 by looking at these connections. This can easily be done by clicking on the "what links here" button that exists on every Wikipedia and Wikidata page. By looking at the Wikipedia pages that link to those items, we can obtain a list of articles related to COVID-19 in each language. Constraining the results to English Wikipedia yields 878 pages.
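
The "what links here" lookup described above can also be done programmatically. The following is a minimal sketch of one possible way, not necessarily the authors' own method: it builds query parameters for the MediaWiki Action API on Wikidata (`list=backlinks`, then `wbgetentities` with a site filter) and extracts English Wikipedia titles from a response. The function names are hypothetical, the item IDs in the demo are placeholders, and the sample response is made up to mirror the documented response shape. No network request is made here.

```python
from typing import Any, Dict, List

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def backlinks_params(item_id: str, limit: int = 500) -> Dict[str, str]:
    """Parameters asking which Wikidata pages link to the given item."""
    return {
        "action": "query",
        "list": "backlinks",
        "bltitle": item_id,
        "blnamespace": "0",      # main namespace, where items live
        "bllimit": str(limit),
        "format": "json",
    }


def sitelinks_params(item_ids: List[str], site: str = "enwiki") -> Dict[str, str]:
    """Parameters asking for the sitelinks of a batch of items on one wiki."""
    return {
        "action": "wbgetentities",
        "ids": "|".join(item_ids),  # the API caps the batch size per request
        "props": "sitelinks",
        "sitefilter": site,
        "format": "json",
    }


def enwiki_titles(response: Dict[str, Any], site: str = "enwiki") -> List[str]:
    """Collect article titles from a wbgetentities response."""
    titles = []
    for entity in response.get("entities", {}).values():
        link = entity.get("sitelinks", {}).get(site)
        if link:                  # items without an article on this wiki are skipped
            titles.append(link["title"])
    return sorted(titles)


# Made-up response in the shape wbgetentities returns; IDs are placeholders.
sample = {
    "entities": {
        "Q1": {"sitelinks": {"enwiki": {"site": "enwiki", "title": "Example article"}}},
        "Q2": {"sitelinks": {}},
    }
}
print(enwiki_titles(sample))
```

In practice the parameters would be sent to `WIKIDATA_API` with an HTTP client, following the `continue` values in each response until all backlinks have been collected.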

These Wiki pages covered a plethora of topics, and each could be assigned to one of four categories identified by qualitative coding.

  • Virus: Wiki pages that directly cover topics on the virus itself (such as Coronavirus disease 2019), developments on tests and vaccines (e.g., COVID-19 vaccine), and symptoms (e.g., Severe acute respiratory syndrome coronavirus 2) belong to this category. Out of 878 total pages, 11 Wiki pages were classified in this category.
  • Region: Tracking pages dedicated to specific regions were quickly created as outbreaks spread globally (e.g., 2020 coronavirus pandemic in New York (state)). Our data contain 310 such Wiki pages.
  • People: Celebrities and public figures who are related to COVID-19 as spokespersons, doctors, or infected patients were grouped as the people category. This category has the largest number of Wiki pages: 516 in total.
  • Others: All other pages that had linked to the COVID-19 Wikidata item were categorized as 'others.' These pages often contained information about a specific event (such as 2020 Hubei lockdowns), location (such as NHS Nightingale Hospitals), or socio-economic impact (such as 2020 stock market crash). A total of 41 Wiki pages belong to this category.

Content Dynamics

One challenge in tracking people’s attention is the dynamics in data describing the event. Figure 2 shows a live example of the daily page-views of two Wiki pages: Pandemic and 2019–20 coronavirus pandemic.

First, the Pandemic Wiki page was initially not a frequently visited one. On March 11, however, this page showed a sharp increase in the number of page-views when the WHO declared the disease a pandemic. Note that this page is not linked to the COVID-19 Wikidata item and hence is considered “not relevant” in our analysis. Nonetheless, many people might have arrived at this page either by searching for “pandemic” or by following the hyperlinks that lead to this page. There are other examples (particularly disease and outbreak-related pages like Influenza pandemic) that temporarily became popular due to COVID-19.

Second, the 2019–20 coronavirus pandemic Wiki page appears to have been created much earlier than the WHO’s announcement in March. This discrepancy may appear when titles of Wiki pages change over time. In this particular case, the original title had been 2019–20 coronavirus outbreak and was later changed to 2019–20 coronavirus pandemic. Such dynamic nature of Wiki content is representative of the time-evolving nature of events. When analyzing Wikipedia content, such dynamics should be understood.


Figure 2. English Wikipedia Views on Two Pandemic Pages. “Pandemic” as a general term had high attention only on March 11–12 when WHO declared the coronavirus outbreak a global pandemic. “2019–20 coronavirus pandemic” has been viewed steadily since January. The view counts had their steepest increase in the few days before the declaration. From late March, the view counts are decreasing with the slowing growth rate. *The 2019–20 coronavirus pandemic page was titled 2019–20 coronavirus outbreak before WHO’s announcement. For the original version of Figure 2, click here.

Most Viewed Pages

So which individual pages attracted the most attention? Using the public Wikimedia pageviews tools, we can compare the number of times that each of these pages was visited. Sorting content by the maximum number of daily views, we arrive at these five Wiki pages in Figure 3: 2019–20 coronavirus pandemic (on the top red line), 2020 coronavirus pandemic in the United States (yellow line), Severe acute respiratory syndrome coronavirus 2 (green line), Tom Hanks (blue line), and Timeline of the 2019–20 coronavirus pandemic (purple line).
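
The ranking by peak daily views can be sketched against the public Wikimedia Pageviews REST API. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the choice of `all-access` and `user` traffic is an assumption about which views to count, and the view numbers in the demo are invented and are not real counts. The code only builds the request URL and ranks already-downloaded responses; it makes no network request.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import quote

PAGEVIEWS_API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title: str, start: str, end: str,
                  project: str = "en.wikipedia") -> str:
    """URL for daily views of one article; dates are YYYYMMDD strings."""
    article = quote(title.replace(" ", "_"), safe="")
    return f"{PAGEVIEWS_API}/{project}/all-access/user/{article}/daily/{start}/{end}"


def peak_daily_views(response: Dict[str, Any]) -> int:
    """Largest single-day view count in one per-article response."""
    return max((item["views"] for item in response.get("items", [])), default=0)


def top_pages(responses: Dict[str, Dict[str, Any]], k: int = 5) -> List[Tuple[str, int]]:
    """Rank pages by their peak daily views, highest first."""
    ranked = sorted(
        ((title, peak_daily_views(resp)) for title, resp in responses.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


# Invented numbers, in the shape the API returns, for illustration only.
demo = {
    "Page A": {"items": [{"timestamp": "2020031100", "views": 10},
                         {"timestamp": "2020031200", "views": 70}]},
    "Page B": {"items": [{"timestamp": "2020031100", "views": 40},
                         {"timestamp": "2020031200", "views": 30}]},
}
print(pageviews_url("Tom Hanks", "20200101", "20200414"))
print(top_pages(demo, k=2))
```

Ranking by the peak rather than the total favors pages with a short burst of attention, which is why a page with one sharp spike can appear alongside pages that were viewed steadily for months.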

Figure 3. Top 5 content Wiki pages. “2019–20 coronavirus pandemic” was viewed far more than the other pages. “2020 coronavirus pandemic in the United States” is getting more views as the virus spreads in the US. “Tom Hanks” had a sharp peak on March 12 because of his infection, but it soon lost attention. Most pages in the People category show a similar view pattern. “Severe acute respiratory syndrome coronavirus 2” had the most views at the beginning of the spread. For the original version of Figure 3, click here.

Pages like 2019–20 coronavirus pandemic, Severe acute respiratory syndrome coronavirus 2, and Timeline of the 2019–20 coronavirus pandemic are among the most visited pages from mid-January and throughout the pandemic. The second page describes SARS-CoV-2, the virus that causes COVID-19. On the other hand, the regional tracker page 2020 coronavirus pandemic in the United States was created in January, but only became popular in March as local outbreaks began in the US. The accumulated view counts on regional tracking pages often mimic the outbreak pattern in that country. We also turn our attention to celebrity Tom Hanks, who was infected with the virus and has now recovered. Wiki pages of individuals are connected to the "COVID-19" Wikidata item, but sometimes this linkage is removed by Wikipedia editors once the event passes. This adds to the complexity of Wikipedia data. Many other pages on people involved in some way with the disease show a similar spike in page-views during the pandemic.

Content by Topical Category

Next, we check how attention is divided across the four topical categories (virus, region, people, and others) by examining the aggregate view counts on these categories. The “virus” category Wiki pages altogether receive the most views during the initial phase until the end of February (Figure 4). They continue to be popular throughout the pandemic. From March 1st onward, however, the “region” category Wiki pages get larger aggregate views. This shows how attention was shared between these two content types, i.e., people continue to learn more about the virus and, at the same time, track the regional progress of transmissions. The “people” category content becomes popular from mid-March, even leading to larger aggregate views on some days in April. Wiki pages on celebrities and public figures are generally popular within Wikipedia. As the virus spreads, more public figures become associated with COVID-19, diversifying public interest. The structured nature of Wikidata even allows us to understand how and why these people are associated with the disease. By using a semantic query language, we can see that most of the people were linked by having COVID-19 as their "Medical Condition" or "Cause of Death".

The “others” category pages that describe the socio-economic impact and other related events also attract more attention from mid-March. However, far less attention is paid to these pages compared to the other three categories.
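
The aggregate view counts per category discussed above amount to summing each page's daily views into its category's daily total. A minimal sketch follows; the data layout (a mapping from page title to per-day views, plus a mapping from title to category), the function name, and all numbers are assumptions made for illustration. Pages without an assigned category fall into "others", mirroring how the remaining pages were grouped in the methodology.

```python
from collections import defaultdict
from typing import Dict


def views_by_category(
    daily_views: Dict[str, Dict[str, int]],
    category_of: Dict[str, str],
) -> Dict[str, Dict[str, int]]:
    """Sum per-page daily views into per-category daily totals.

    daily_views: {page title: {date: views}}
    category_of: {page title: category name}
    Returns:     {date: {category name: total views}}
    """
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for title, series in daily_views.items():
        category = category_of.get(title, "others")  # uncategorized pages
        for day, views in series.items():
            totals[day][category] += views
    return {day: dict(cats) for day, cats in totals.items()}


# Invented example data, for illustration only.
views = {
    "Virus page": {"2020-03-11": 100, "2020-03-12": 80},
    "Region page 1": {"2020-03-11": 60, "2020-03-12": 90},
    "Region page 2": {"2020-03-12": 30},
    "Unlabelled page": {"2020-03-11": 5},
}
labels = {"Virus page": "virus", "Region page 1": "region", "Region page 2": "region"}

print(views_by_category(views, labels))
```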

Figure 4. English Wikipedia Views by Topical Category. To view this figure click here. We divide COVID-19 related Wikipedia pages into 4 categories: Virus, Region, People, and Others. The “Virus” category was dominant at the beginning of the spread, but the “Region” category became dominant on March 1. The “Region” category has its highest peak on March 23. The “People” category has a sharp peak on March 12 because of Tom Hanks’s infection. Interestingly, the other categories all have peaks on that day as well. The “People” category fluctuated as news of confirmed cases among famous people spread. Most categories except “People” show decreasing

Content Share by Category

Stacked charts or ratio charts are excellent ways to visualize how the aggregate view counts by topical category evolve over time. Notably, the total view counts on all content were already high early on, reaching over 500,000 views in late January. This was before the WHO declared the outbreak a Public Health Emergency of International Concern (PHEIC) on January 30th. Internet users actively sought information through credible sources like Wikipedia at a time when not much information was available through official sources.

The most significant peak occurs a few days before March 11th, when the WHO declared that COVID-19 could be characterized as a pandemic. This declaration followed large-scale outbreaks in several countries in Europe and the Middle East, especially in Italy, Iran, and France. Since this point, the aggregate attention has stayed at a similar level, reaching over 6,000,000 views a day.

Figure 5. Stacked Chart and Ratio Chart by Topical Category. (a) The total number of views on COVID-19-related pages increased exponentially from late February to early March. In March, the daily views varied between 6 and 8 million. The “Virus” category was dominant at the beginning of the spread, but the “Region” category has been dominant since March. For the original version of Figure 5a, click here.
Figure 5(b). To view this figure click here. The proportion of the “Region” category has rapidly increased since February with domestic spread outside China. The “Virus” category had its peak in January and was gradually decreasing. The “People” category has the highest variance, which is also visible in its proportion. The peak related to Tom Hanks even accounts for a significant proportion of views among all pages related to COVID-19.

The ratio chart can identify the day-to-day composition of public attention across diverse topics. The shift of attention from the virus Wiki pages to regional tracker pages suggests that internet users are at first most interested in gaining knowledge about the disease, but that their attention then shifts to more geographically constrained information (which likely has a more immediate impact on individuals). The speed at which these regional pages are updated is unprecedented; it is as quick as any local CDC reports (which we will look at in future reports). Overall, the shift of attention from the virus pages to the regional and people pages indicates that Wikipedia has served multiple purposes during the pandemic: as a reliable source of scientific facts about the virus, as a source of regional breaking news, and as a source of less critical updates in the people category. Meeting such dynamic attention would not have been possible without the dedicated participation of over 34,000 editors who contributed to creating and updating these English Wikipedia articles.
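
The day-to-day composition shown by a ratio chart is each category's share of that day's total views. The sketch below shows that normalization for a single day; the function name and the numbers are invented for illustration and do not come from the article's data.

```python
from typing import Dict


def category_shares(day_totals: Dict[str, int]) -> Dict[str, float]:
    """Convert one day's per-category view totals into fractions of the day's total."""
    total = sum(day_totals.values())
    if total == 0:
        # No views at all that day: report zero shares instead of dividing by zero.
        return {category: 0.0 for category in day_totals}
    return {category: views / total for category, views in day_totals.items()}


# Invented totals for one day, for illustration only.
one_day = {"virus": 30, "region": 50, "people": 15, "others": 5}

for category, share in category_shares(one_day).items():
    print(f"{category}: {share:.0%}")
```

Because the shares of a day always sum to one, a spike in a single category (such as a celebrity page) lowers the share of every other category on that day even when their raw view counts are unchanged.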

Final remarks

The open data available on Wikipedia allows researchers and the community in general to cope with urgent information needs during crises. The huge demand for local and region-specific content highlights the importance of having a distributed community of editors who can generate such content. In our next post, we will show, by analyzing non-English Wikipedia pages, how readers are especially interested in what is happening in their local regions.

Reproducibility

All this analysis is based on public information. You can learn more about the results and methodology used in this analysis by visiting this page.




2020-04-26

Trusting Everybody to Work Together

This article by Pete Forsyth, a former editor-in-chief of The Signpost, is an edited version of an article published by Wikipedia @ 20 (CC BY 4.0)

I first embraced Wikipedia's radical, open invitation to write an encyclopedia in 2006. Most other web platforms at that time featured restrictive permission schemes. Their software, policies, and culture sharply limited users' ability to express their ideas. Wikipedia's platform, by contrast, mostly stayed out of my way. I was free to explore my interests and share my knowledge. I quickly got to know and work with other people who shared my interests, without needing to seek permission first. The site's documentation and policies encouraged me to use my own judgment and contribute what I felt was worthwhile; and its inner workings reinforced that ethos, enabling me to simply follow my own conscience and collaborate.

Wikipedia's creators eliminated the editorial bottleneck in traditional encyclopedia writing by using a flexible software platform that empowered contributors. But the original wiki software they had adopted in 2001 wasn't fully up to the task. Key policies and software features had to be developed before Wikipedia could truly become "the encyclopedia anyone can edit"—that is, before "anyone" could come to include hundreds of thousands of people working in concert. Once implemented, those early improvements to the platform also benefited discerning readers, providing insights into the production process. This element—facilitating reader insight into writers' actions and motivations—may at the time have seemed an ancillary benefit, secondary to the need to support a burgeoning community of writers and editors. But transparency to readers has become a key component of Wikipedia's identity, and as Wikipedia's star has risen, many of us have come to expect greater transparency from more traditional publications as well.

Those building the early versions of Wikipedia's software were guided more by instinct than by careful planning. Staff and volunteers deliberated individual features on the Wikipedia-L email list and on the wiki itself. Reading through those old messages, which are publicly archived, one gets to know a community in pursuit of a shared ideal, rather than a businesslike group pursuing a carefully planned strategic roadmap. The email list discussions are rife with aspirations and idealism, but they lack the kind of gravitas one might expect in the founding days of what would become one of the world's top websites. Wikipedia's earliest architects, both staff and volunteer, managed to construct a robust core of policies, procedures, and software that in many ways outstripped the projects planned out by the executives and investors of other major websites.

Two decades into Wikipedia’s existence, the importance of some of the key software features introduced in those early days is not widely understood. As I'll discuss, the early software of Wikipedia empowered writers and readers alike with a complete picture of the activities involved in producing the site's content; but more recent changes to the software have at times eroded the completeness of information available to those users, seemingly with little appreciation for its significance. Maybe the early architects' scorn for such norms as formal hypotheses, approval mechanisms, analysis of risk vs. opportunity, and detailed reports was a double-edged sword; it enabled them to accomplish incredible things at an inflection point in the Internet's evolution, but it did little to establish broad buy-in to a set of principles that could guide future efforts. Since there was no clearly-articulated central theory in the early days, we have little to go on in explaining how Wikipedia's basic infrastructure has come to support so much generative collaboration. A clear understanding of Wikipedia's early success would be of interest to Internet enthusiasts and historians, but it would have great practical value as well. Such an understanding should inform the kind of strategic vision and planning that a project of Wikipedia's current scale requires.

A theory explaining the unexpected success of Wikipedia could also inform other projects focused on organizing people around the globe to build an information commons. Other sites have tried and failed to replicate Wikipedia's success, but have those efforts been driven by a full understanding of why Wikipedia succeeded? In the absence of a clear, widely accepted theory, we should view definitive statements that Wikipedia's success cannot be replicated with skepticism. We should hesitate to dismiss the possibility of other projects emulating Wikipedia's success, and instead continue to ask basic questions, until we arrive at satisfactory answers.

The appeal of transparency in a time of uncertainty

By the time I started editing in earnest in 2006, the software and policy infrastructure I discuss in this essay was largely in place. Wikipedia was well known, and well on its way to mainstream acceptance. Its English edition had nearly a million articles. It was growing by every measure and would soon become one of the top 10 most visited sites in the world. The masses, however, had still not grasped Wikipedia’s grand gesture of trust, or the possibilities it opened up for writers everywhere to influence the way knowledge is shared. The openness that made Wikipedia possible also made it the butt of numerous jokes, from academia to late night television.

The seemingly preposterous Wikipedia policy "Ignore All Rules" (IAR) held the same status as sensible policies like "Wikipedia is an Encyclopedia" and "Wikipedia is written from a Neutral Point of View". An early formulation of IAR stated: "if rules make you nervous and depressed, then simply use common sense as you go about working on the encyclopedia". Wikipedia dared to defy conventional wisdom, in order to magnanimously welcome the good faith of anyone and everyone.

My own nervous depression, however, had nothing to do with Wikipedia's rules. I worried about rules and traditions more broadly conceived. I worried that society's vast collection of rules, both written and unwritten, might not provide a framework robust enough to bring about a peaceful, respectful, sustainable civilization. Because some crazy stuff was happening.

The early 2000s were defined, for me and for many others around the world, by the horrific attacks on the U.S. on September 11, 2001, and by the political response to them. That day began with the inconceivable news that terrorists had hijacked multiple airplanes, and used them as weapons to kill thousands of civilians and strike at the heart of the country's government and financial institutions. But this was just the first wave of attacks on civilized life: the U.S. government, in its various responses, seemed all too ready to sacrifice honesty, transparency, and civil liberties in the name of security, causing further institutional damage from within.

In 2006, the U.S. Senate—a body often praised for its rigorous adherence to elaborate, time-tested rules ensuring rational and accountable decisions—passed a significant bill, the reauthorization of the USA PATRIOT Act. After the bill was signed into law, though, the public suddenly learned that last-minute changes to its text permitted the President to appoint prosecutors unilaterally. How could such significant changes be made in secret, especially for such a consequential bill? Who had made the changes? Amazingly, nobody seemed to know.

News outlets initially reported that Arlen Specter, chair of the Senate Judiciary Committee, was responsible. But the senator disavowed all knowledge. Reporters took his denials at face value, even before any alternative theory emerged. I found this confusing. Eventually, we learned that the committee's legal counsel had made the changes. This occurred without the senator's knowledge, but under his authority. Who was responsible for the changes? When it came down to it...nobody was.

Months later, President George W. Bush took advantage of the bill's provisions; in so doing, he ignited a scandal around the politicization of the court system. Senators on the committee, however, professed to have been no more aware of the changes to the law than the public and the press. The opacity of the entire situation was baffling. Wasn't this law, like all our laws, deliberated and passed in public view, and weren't records carefully preserved? Didn't legislators, or at least their staffs, know the first thing about using software, or any number of more traditional tools, to keep track of who made what changes? What was the point of a democratic system of government if a single, unelected person could slip a momentous provision into a law unnoticed? I longed for a system that did a better job of standing up for principles of transparency and accountability.[1]

I began dabbling with Wikipedia that same year. I didn't realize it yet, but working on this platform would gradually restore my sense of hope that humans could self-organize to make the world a better place. The philosophy behind Wikipedia, as expressed through policies and software design, drew on many familiar traditions; but the site's idiosyncratic take on longstanding concepts was new and refreshing.

Wikipedia's software, in contrast to the workings of our nominally democratic system of government, exposed data about who was making changes to what. As soon as a change was made, anybody on the Internet could learn who (or at least, what user account) had taken action and exactly what he or she had done. On Wikipedia, one didn't have to rely on civil servants or the press to make such information available; the information, by design and by reliable automated processes, was just a few clicks away.

Wikipedia's wiki software had roots in a software movement founded in the 1970s. At its core, the "free and open source software" (FOSS) movement was based not on anything technical, but on a basic assertion of rights on behalf of the individuals who build and use software. Wiki software was invented in 1995 by engineer Ward Cunningham. Cunningham aimed to enable software engineers to link their experiences and work collaboratively to document patterns in programming. The principles driving these software initiatives reflected the ethos of empowerment supposedly built into our democratic system of government, but seemingly on the wane in the broad public sphere. Even as I developed doubts about whether democratic values could survive in our social and government institutions, I was heartened to see them taking root in the tech world.

I had followed FOSS and wiki software for years, but opportunities to participate in earnest had eluded me. I wasn't a hardcore programmer, and in those days it was pretty difficult to even install and use most FOSS software. With Wikipedia, for the first time, my writing and editing abilities were enough to allow me to get involved.

In 2008, I attended RecentChangesCamp, a wiki conference in Palo Alto. Ehud Lamm, an academic from Tel Aviv, convened a discussion about whether wiki and its principles could help resolve conflict between Israelis and Palestinians. And so, there it was: confirmation that I wasn’t alone in finding parallels and connections between the world of wiki, and the most pressing political problems in the wider world.

The next year, the Wikimedia Foundation hired me to design a pilot program for putting Wikipedia to work in higher education. As I interviewed university teachers across the U.S., I learned that many saw great promise in Wikipedia's ability to impart skills that were difficult to teach in a traditional classroom setting. Wikipedia permitted people all over the world to communicate about the process of building and improving encyclopedic content. Teachers wanted to empower their students to interact, in substantive ways, with people of varying expertise and backgrounds across the world, all in the course of a single term. In academia, I was told, this kind of discourse was generally confined to academic journals, where response articles might be published year upon year. But only scholars advanced enough to successfully submit academic papers, and invested enough in the academic lifestyle to follow discourse on a scale of years or decades, could participate. With proper guidance in Wikipedia's unique environment, undergraduate students could more readily engage in a collaborative dynamic with feedback on a minute-by-minute basis, reaching far beyond the classroom for valuable learning opportunities.

Conditions that support collaboration

Early authors writing about Wikipedia emphasized the importance of tools that support community and communication. The wiki software Wikipedia originally employed in 2001, of course, already included many collaboration features. But important innovations were needed before Wikipedia's contributors could collaborate effectively at scale without active intervention by traditional expert editors.

Ward Cunningham had built the first wiki, called WikiWikiWeb, to help programmers exchange knowledge about their field more freely. But Wikipedia's purpose was more specific and more ambitious—to enlist the masses to produce a coherent, canonical summary of human knowledge. Several principles guided Wikipedia from the early days; some have been formalized into lasting policies and documentation. Founder Jimmy Wales emphasized "fun for the contributors" as the most important guiding principle in a June 2001 email, predicting that without it, Wikipedia would die.[2]

Respecting human judgment and good intentions also held central importance. Around the time he first formulated the "Ignore All Rules" principle discussed previously, Lee Daniel Crocker, an active participant in Wikipedia's original email list who built an early version of the MediaWiki software, emphasized the importance of using judgment, rather than adopting "hard-and-fast rules."[3] Wales generally concurred.

A web platform's software influences and constrains how users engage with one another, so it plays an outsized role in defining the platform's identity. It establishes the context in which people do or don't enjoy themselves or feel empowered. Web software is often designed to strategically accomplish specific desired outcomes; the Google Analytics tool, for example, is popular in part because it permits a website operator to design campaigns, and then test how closely the site's users follow the desired paths through the site.

Wikipedia's early days, by contrast, were characterized by an absence of explicit strategic planning. In 2006, a Wikipedia user quipped: "The problem with Wikipedia is that it only works in practice. In theory, it's a total disaster."[4] The half-serious notion that no theory can capture the magic of Wikipedia caught on, and is often repeated by Wikipedians and Wikipedia scholars in presentations and in informal conversation. The idea even survived the Wikimedia Foundation's major strategic planning effort of 2009–10. As recently as 2015, respected Internet scholar Jonathan Zittrain used a variant of the phrase to anchor a video praising the value of Wikipedia.[5]

It's not so unusual, though, for new phenomena to initially elude theoretical or scientific explanation. This kind of mystery is what drives scientific inquiry. Scholars do not typically accept defeat in the face of unexplainable phenomena, but rather work to construct new theories that can explain them. Wikipedia, surprising and unfamiliar as it may have been when it emerged, should be no exception. If and when a robust theory of Wikipedia's "magic" emerges, I believe it will give prominent attention to a collection of about eight mutually supporting software features. Many of these eight features, noted below, emerged in the early days of refining Wikipedia's software.

MediaWiki, the wiki software tailored to serve Wikipedia's needs in the early 2000s, grew to incorporate these eight features. These features support one another, providing a complete picture of what's going on. Working with a fairly complete and self-supporting set of data, a Wikipedian can know about the activities of those working on the same content, without relying on other humans or advanced software permissions.

For a more familiar analogy, imagine that a board member newly appointed to help run a company is initially provided only a few tax returns. The board member wants to familiarize herself with the company's finances, and analyze which clients are most important to the company's future; but the tax returns only provide so much information. She won't learn much about individual clients from the tax returns, which provide only a certain kind of view of the company's finances. So she has to rely on a cumbersome process of asking questions of the accounting staff and her peers before she can really start to learn about the finances. What this sophisticated board member really needs, to efficiently accomplish her goal, is direct access to the company's accounting data. That data is complete; it can be audited in various ways to verify that it is internally consistent, that the amount of money in the bank aligns with the flow of income and expenses, and so forth.

Wikipedia users all have access to the site's underlying data, analogous to the company's accounting data. If they want to understand how things work, they don't need to first earn special privileges or build relationships with the proper executives or support staff. Wikipedia provides that data to everyone; so a writer or editor can easily learn the following things (with the relevant software features noted in parentheses):

  • that some content has changed (watchlists, recent changes)
  • exactly what was changed (including access to the original version) (diffs)
  • who made the change (user names, presented in various reports)
  • when they made it (date stamps, presented in various reports)
  • why they made it (edit summaries, entered optionally, presented in various reports)

The ability to find this kind of information can feed a Wikipedian's sense of confidence. When an editor thinks something in an article needs to be adjusted, but wants to fully understand how the article got to its present state, there's no need to email previous contributors, or an editor in charge of the article, and wait for a response, hoping that one hasn't inadvertently caused offense; the software quietly and reliably provides the relevant data. Those same data can help Wikipedians find others interested in the same topics, can help them resolve their own disagreements, and can help them learn technical processes by looking at one another's code.

The "view history" screen displays every edit chronologically, like version tracking features in a word processor. It can reveal whether a block of text was added all at once or piece-by-piece, by one author or by several. If the Wikipedian is curious about a specific change, the view history screen will guide her to a "diff" screen, which conveys exactly what changed between any two revisions, and indicates who's responsible. If that earlier editor chose to enter an edit summary, that summary is conveyed in the diff as well, reducing the guesswork involved in figuring out what motivated the change.
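As a rough illustration of the record described above, here is a minimal Python sketch of a page history and a diff between two revisions. It is not MediaWiki's actual code or data model: the Revision class, the diff helper, and the sample user names and text are all invented for illustration, and the comparison relies on Python's standard difflib module rather than anything Wikipedia uses.

```python
import difflib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One saved version of a page: who, when, why, and the full text."""
    user: str            # who made the change
    timestamp: datetime  # when they made it
    summary: str         # why they made it (optional, may be empty)
    text: str            # the page content after the change


def diff(old: Revision, new: Revision) -> list[str]:
    """Return the lines added (+) and removed (-) between two revisions."""
    lines = list(difflib.unified_diff(
        old.text.splitlines(), new.text.splitlines(), lineterm="", n=0
    ))
    # Skip the two file-header lines, then drop the "@@" hunk markers.
    return [line for line in lines[2:] if not line.startswith("@@")]


# Hypothetical two-edit history of a short page.
history = [
    Revision("Alice", datetime(2020, 4, 1, tzinfo=timezone.utc),
             "Create stub", "Wikis are websites."),
    Revision("Bob", datetime(2020, 4, 2, tzinfo=timezone.utc),
             "Add inventor",
             "Wikis are websites.\nWard Cunningham invented the wiki in 1995."),
]

changes = diff(history[0], history[1])
latest = history[-1]
print(f"{latest.timestamp:%Y-%m-%d} {latest.user}: {latest.summary}")
for line in changes:
    print(line)
```

Because every revision stores its author, date, and summary next to the text, a single lookup answers what changed, who changed it, when, and why.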

Anyone consulting the encyclopedia also has access to this information. If a sophisticated reader familiar with the features outlined above sees something in an article that doesn’t ring true to him, he can learn something about how the text evolved to that point, including what the article's authors argued about, or what sources they considered but dismissed.

If the Senate Judiciary Committee used an open installation of MediaWiki to conduct its work, no reporter would ever have to ask who added a sentence to a bill; they could just consult the software's record, and see exactly who entered the text. If the lawyer entering the text wanted to demonstrate their diligence, they could use the edit summary field to provide deeper insights, for instance: "New provision entered per discussion with both Illinois senators."
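To make the thought experiment concrete, the following Python sketch shows how a stored revision history could answer the question "who added this sentence?" It is purely hypothetical: the Revision class, the first_appearance helper, the account names, and the placeholder bill text are invented for illustration, and a simple substring search stands in for whatever method a real wiki tool would use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Revision:
    """A saved version of a document, with its author and edit summary."""
    user: str
    summary: str
    text: str


def first_appearance(history: list[Revision], phrase: str) -> Optional[Revision]:
    """Return the earliest revision that introduced `phrase`, or None."""
    previous_text = ""
    for revision in history:
        # The phrase was introduced here if it is present now but was
        # absent from the revision immediately before.
        if phrase in revision.text and phrase not in previous_text:
            return revision
        previous_text = revision.text
    return None


# Hypothetical drafting history; the text is placeholder content only.
bill_history = [
    Revision("committee_clerk", "Initial draft",
             "Sec. 1. Placeholder text."),
    Revision("staff_counsel",
             "New provision entered per discussion with both Illinois senators.",
             "Sec. 1. Placeholder text.\nSec. 2. Placeholder provision."),
    Revision("committee_clerk", "Fix formatting",
             "Sec. 1. Placeholder text.\n\nSec. 2. Placeholder provision."),
]

found = first_appearance(bill_history, "Placeholder provision")
if found is not None:
    print(f"Added by {found.user}: {found.summary}")
```

The later formatting edit does not obscure the answer, because the search stops at the first revision in which the phrase appears.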

But let's stick with Wikipedia. Let's imagine that sophisticated reader wants to influence the Wikipedia article. He can easily do so, by communicating with the editors or even becoming an editor himself. MediaWiki software makes it easy to:

  • address the person who made a change (talk pages, "Wikipedia" namespace, "email this user," social media handles on user pages)
  • undo the changes of others (revert, rollback)
  • create changes of one’s own (the almighty edit button itself)

Every Wikipedia article has a "talk page" for in-depth discussion. This permits editors to collect links to source materials, describe their longer-term efforts to expand a section, or talk through any differing views about what belongs in the article. Any Wikipedia contributor can raise complex topics for discussion, or ask her peers about their previous changes. This is a feature that didn't exist in Cunningham's original wiki.

In 2016, I invited Cunningham to discuss the ways wiki software had evolved since his first wiki, including the emergence of the talk page. He was quick to acknowledge that the ambition to build an encyclopedia introduced new needs and prompted the introduction of important new software features. He told me he had not considered separate talk pages when he designed wiki software.

Pages on the WikiWikiWeb and other early wikis reveal a free-flowing evolution of ideas, which might not have emerged if content and discussion were segregated. Wikipedia, however, aimed for certain standards in its main content; so a separate venue for discussion would be needed. Just a couple months after Wikipedia launched, Wikipedia founder Larry Sanger said: “I think the ‘talk’ page for every page should be automatic, one of the default links in the header/footer.”[6] The talk page became an indispensable component of Wikipedia's platform. With the benefit of hindsight, Cunningham credits the talk page as an important innovation that allowed wiki software to be used to build an encyclopedia.

The above-mentioned page version history is another Wikipedia software innovation that wasn’t present in the first iteration of the WikiWikiWeb. Cunningham didn't initially preserve old versions at all. He added code to capture the most recent version, only to address a flaw in web browser software. Cunningham preferred to trust in the good intentions of his site's user community, so he initially resisted writing software code to capture old versions. He believed their contributions would tend to be more generative than destructive, and for his purposes that was enough. But he came to recognize that tens or hundreds of thousands of people building an encyclopedia would need ready access to version information. Later, he did expand the version history features, in response to a kind of malicious edit that emerged. His perspective as a software designer was no different from that of any seasoned wiki editor: when others grafted changes onto the core he built, he was quick to observe advantages. He acknowledged that the ability to observe patterns in a user’s work informs working relationships, and that version histories can make such patterns visible to all users.

Based on my experiences as a Wikipedia editor, I explained to Cunningham that I believed Wikipedia’s software supported collaboration by giving editors ready access to a fairly complete set of relevant data—the eight software features listed above. He agreed, and told me about a design concept, the "observe–orient–decide–act" (OODA) loop, which originated in military strategic theory. The OODA loop theory emphasizes that easy access to relevant data is a crucial component of effective decision-making. This theoretical notion has been referenced in a wide variety of fields in the decades since it was first presented; scholars have applied the OODA concept to such topics as nursing home design and self-managed construction teams. I was taken with this concept. After all, what is the work of a Wikipedia editor, if not making a string of decisions? And what decision doesn't benefit from better access to relevant information?

Over the Internet’s first few decades, many websites aiming to generate self-sustaining collaborative activity have come and gone. The presence or absence of the eight features identified above—the ability to perceive or act without editorial intervention—seems to have played a role in the demise of some of these sites.

For instance, in the early 2010s I dabbled with the Open Directory Project (ODP), a predecessor of Wikipedia that built a web directory through broad-based peer production. Accustomed to wikis, I was frequently frustrated by the challenges of determining exactly what had changed and when, who had done it and why, and how I could effectively engage them in discussion if I disagreed. I would struggle to find the right venue to inquire about such issues. When I was lucky enough to get a response, sometimes weeks after I had moved on to something else, it was at times unhelpful or even dismissive. Compared to my experience on Wikipedia, I had to spend more time, and expose myself to more frequent judgment by fellow volunteers, to accomplish even the most basic tasks. In my view, the challenges inherent in contributing to the ODP did not bode well for its long-term survival, especially when compared to sites like Wikipedia, where working together is easier.[7]

Wikipedia users have ready access to information, such as what changes have been made and who made them, that allows them to meet their own needs. By default, users have a great deal of autonomy; to accomplish basic tasks, they don't need to seek out special privileges or privileged information. To use the terminology of OODA, the software helps them observe their surroundings and orient themselves prior to deciding and acting.

Compared with contributors working under traditional content production models, Wikipedia's users encounter fewer obstacles to creating high quality content.

Mobile Wikipedia: Core design principles forsaken

In the present decade, new hardware has expanded our options for editing Wikipedia. Many of us now carry powerful computers, known as smartphones, everywhere we go. We've become accustomed to using them to change things on the web, whether by updating our social media profiles, buying and selling on an e-commerce platform, or publishing blog posts.

Wikipedia is no exception. Since the advent of Internet-enabled phones, people have been using mobile devices to create and edit Wikipedia content. This was initially possible, if cumbersome, by using Wikipedia's normal web interface. Around 2013, the Wikimedia Foundation began offering two new interfaces specially designed for mobile devices: the Wikipedia mobile web interface and the Wikipedia mobile app.

The mobile-specific options, however, leave out vital parts of the desktop experience. Specifically, they leave out some of the eight core software features listed above. They offer no access to talk pages and only minimal insight into version history. If a Wikipedian wants to understand the motivations and actions of her fellow editors, the mobile interface offers few clues about what they are doing or why they are doing it. A Wikipedia editor (or a Wikipedia reader, for that matter) is at a disadvantage when using the mobile interfaces.

How did that come to be? One might expect the experience of Wikipedia editors to play a central role in designing new software. In the early days, ideas flowed freely on a public email list and on wiki talk pages. But as Wikipedia entered its teens, the collaborative spirit between Wikimedia Foundation personnel and active editors declined, leading to more complex dynamics in determining which features best serve software users. As the Wikimedia Foundation developed more sophisticated software for Wikipedia, its leaders at times disregarded feedback coming from actual Wikipedia editors. I had this experience in 2014, when a thousand Wikimedians joined me in expressing our shared concern about the Wikimedia Foundation’s approach to deploying a new piece of software. The Foundation never even acknowledged receipt of the letter, nor did it respond to the two requests it contained.[8]

The next year, Wikipedian and author Andrew Lih argued in a widely circulated op-ed that mobile devices were crucial to Wikipedia's future.[9] But the division between those designing the software and those using it was again on display. Lih quoted a member of the foundation board, who said: "some communities have become so change-resistant and innovation-averse" that they risk staying "stuck in 2006 while the rest of the Internet is thinking about 2020 and the next three billion users".[10]

Later that year, veteran Wikipedia author Jim Heaphy published an essay outlining a number of reasons why he uses the desktop interface while editing with his smartphone, instead of the specialized interfaces created by the Wikimedia Foundation.[11] But his words, it would seem, have gone unheeded; years later, the substantial differences between Wikipedia's desktop and mobile interfaces remain. When using Wikipedia's mobile interfaces, editors are exposed to only a fraction of the valuable information contained in its pages. The products they use obscure crucial information, such as access to talk pages, policy pages, and edit histories, thereby handicapping their experience and limiting their ability to collaborate with their peers.

Collaboration software is also a basis for reader trust

Writers working on a project together need data to help understand one another's motivations and activities. But they are not the only ones who value data beyond the text itself. Critical readers of a published work also need data, for a slightly different purpose: to inform how much trust they should place in what they read.

In recent centuries and in recent years, media scholars and the general public have questioned our ability to trust what we read and what factors influence that trust. The term "fake news" is prominent in contemporary discourse, which often centers on the challenges we face as individuals and as a society in discerning quality in information sources.

In a traditional context, readers looking for data to inform trust were typically limited to familiarity with a publisher's or a writer's reputation. Did you read something in the New York Times? You could probably trust that was true. Did you read it scribbled on a bathroom wall? Not so much. Wikipedia, as a product subject to continuous editing by almost anybody, takes a different path; it does not aspire to the kind of reputational trust enjoyed by the Times, but the eight software features in this essay separate it from the wholly unaccountable bathroom wall.

One of Wikipedia’s core traits is that it blurs the traditional lines between producer and consumer. So with Wikipedia, the kind of trust needed within the community of producers inevitably overlaps with the audience’s trust in Wikipedia. Fortunately, the kind of trust needed to build a working relationship is one of the things supported by Wikipedia’s desktop and server software, and by its attention to the OODA loop.

In one sense, Wikipedia makes things more complicated and messier by blending production and publication. But in so doing, it forces us to address the issues of trust inherent in both. One set of software tools provides editorial insights to writers and readers alike. In that sense, Wikipedia might just point the way toward a more coherent way to address issues of trust. In the articles and talk pages of Wikipedia, I have seen editors firmly committed to opposing views resolve seemingly intractable disputes. By resolving such conflicts, they produce better articles, which serve readers well by helping them understand and contextualize competing views. These dynamics bring my thoughts back to that 2008 discussion with Ehud Lamm, and to the kind of trust that will be needed if we are to overcome violence on a global scale.

Furthermore, as is often said, trust is a two-way street. Treating someone with respect, empowering them, and showing trust in them can often engender reciprocal trust. When Wikipedia takes steps that help its readers and contributors trust its content, it also expresses trust in them.

Throughout society, we are currently grappling with basic epistemic questions. How can we differentiate between "real" and "fake" news? What's the proper role of scientific studies in shaping policy decisions or our day-to-day decisions? Individual judgment—that same quality encouraged by Wikipedia's core policies for editors—is a key asset in charting a path forward. A reader literate in the scientific method is better equipped to evaluate a scientific study than one who has to rely on external authorities; a television viewer well-versed in the techniques of video manipulation or rhetorical trickery will be less susceptible to deception.

Wikipedia’s structure invites individuals and institutions to build literacy skills and develop trust. To the degree that we can put Wikipedia’s tools to appropriate use, we may just have the ability to build trust throughout society and generally make the world work better. Wikipedia doesn’t promise any of this to us, but its software and policies do nudge us in the right direction. Perhaps more than any other factor, those frequent nudges make Wikipedia a valuable resource worth protecting and worth exploring.

As some of the world’s largest technology platforms field tough questions about what value they ultimately provide, Wikipedia stands apart. Its idealistic roots in traditions like wiki and free and open source software, and its ability to build on the lessons of longer-standing social institutions, have served it well. Wikipedia empowers its editors and its readers, and its software encourages everyone involved to find ways to trust one another.

To fully appreciate the value of Wikipedia, a reader needs to consult features like talk pages and edit histories. As Wikipedia has grown, and as MediaWiki and similar software have proliferated across numerous websites, an ability to work with these software features has become a core part of information literacy. These features should be taught in our formal educational institutions, and curious readers should investigate them on their own.

Trust in technical and strategic matters lags

In twenty short years, Wikipedia has had a substantial influence on the way software functions and the ways we interact online. From the start, Wikipedia has given its users ready access to relevant data, and has encouraged them to take action. Hundreds of thousands of people have taken up the challenge, and they have produced an enormous amount of useful information.

In spite of the central role of trust in Wikipedia's basic structure, though, an absence of trust has characterized many of the strategic and software design decisions involving Wikipedia. Too often, decisions and actions of the Wikimedia Foundation occur with little apparent connection to the loose-knit communities of volunteer Wikipedia users. In an ideal world, we would be able to recapture something of the collaborative spirit that characterized the early Wikipedia-L email list, which often enabled the experience of the site's heaviest users to directly influence how the software evolved. To apply principles that worked in a community of a few dozen to the sprawling behemoth that Wikipedia has become, of course, is no easy task; but for Wikipedia to truly reach its potential, both the volunteers and the foundation that guides its future should work tirelessly to ensure that relevant expertise is captured and put to good use. The spirit of collaboration that can sometimes work so well in generating encyclopedic content should guide the wider movement as well.

Those building new software intended to support and nurture collaboration would do well to study the interplay of the specific software features described here. This applies as much within the Wikipedia world as outside it: a mobile interface that obscures the "view history" screen, for instance, deprives the reader of a key element required for critical reading and thereby presents an incomplete view of the richness of Wikipedia's content. The platforms that support communication around the world, if they are to serve society well, must take careful stock of the kind of information needed by their editors and readers and ensure that it is presented in useful and coherent ways.

As for the USA PATRIOT Act reauthorization, President Bush signed a law the following year rescinding its most problematic components. The United States Attorney General resigned shortly thereafter, under pressure for his implementation of those components. Perhaps this was a simple case of democratic institutions holding each other accountable. But in those stories, the level of unrest in the public is always a factor, and I can't help thinking about the many thousands of people who navigated that complex story of power and policy by consulting a weird encyclopedia, less than seven years into its existence, patched together by writers and researchers from all over the world. Let's hope their sustained commitment to the ideals of Wikipedia is enough to launch the site into future decades, in which we collaborate as effectively on matters strategic as on matters encyclopedic.

References

  1. ^ See Dahlia Lithwick: "Specter Detector: U.S. attorney scandal update: Who’s to blame for those alarming Patriot Act revisions?" Slate, March 5, 2007. Archived at https://slate.com/news-and-politics/2007/03/u-s-attorney-scandal-update-who-s-to-blame-for-those-alarming-patriot-act-revisions.html Accessed May 1, 2019.
  2. ^ Jimmy Wales, "Controversial thoughts," Wikipedia-L (electronic mailing list), June 13, 2001. Archive of message available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2001-June/000187.html
  3. ^ Lee Daniel Crocker, "Reciprocal system…," Wikipedia-L (electronic mailing list), May 25, 2002. Archive of message available at: https://lists.wikimedia.org/pipermail/wikipedia-l/2002-May/002132.html
  4. ^ Gareth Owen, Gareth Owen (Wikipedia user page), January 20, 2006. Archive of posting available at: https://en.wikipedia.org/wiki/Special:Diff/35978744
  5. ^ Jonathan Zittrain, "Why Wikipedia Works Really Well in Practice, Just Not in Theory" (video), Big Think, April 7, 2015. https://bigthink.com/videos/the-model-for-wikipedia-is-truly-unique
  6. ^ Larry Sanger, "Web links subpage," Wikipedia-L (electronic mailing list), May 6, 2001. Archived at https://lists.wikimedia.org/pipermail/wikipedia-l/2001-May/000106.html
  7. ^ Andrew Lih, The Wikipedia Revolution, Hyperion. ISBN 978-1-4013-0371-6 (2009).
  8. ^ Pete Forsyth, "Letter to Wikimedia Foundation: Superprotect and Media Viewer," Meta Wiki, August 19, 2014. Archived at https://meta.wikimedia.org/wiki/Letter_to_Wikimedia_Foundation:_Superprotect_and_Media_Viewer
  9. ^ Andrew Lih, "Can Wikipedia Survive?" (opinion), The New York Times, June 20, 2015. Archived at: https://www.nytimes.com/2015/06/21/opinion/can-wikipedia-survive.html
  10. ^ María Sefidari, answer to "Use of Superprotect and respect for community consensus", Wikimedia Foundation board elections questions, June 2015. Archived at https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections/Board_elections/2015/Questions/1
  11. ^ Jim Heaphy, "Smartphone editing," Wikipedia (user page), first published December 2015. Archived at https://en.wikipedia.org/wiki/User:Cullen328/Smartphone_editing




Reader comments

2020-04-26

What's making you happy this month?

The content of this Signpost piece is adapted from email threads titled "What's making you happy this week?" that are sent to Wikimedia-l.

We encourage you to add your comments about what's making you happy this month to the talk page of this Signpost piece.

This week has been difficult for many around the world. I have mentioned this in previous weeks, but it's vital to express my sympathies each and every time. A pandemic like this has global ramifications, and it's important to recognize that for many people, this week has been filled with hardship. There are people who have been laid off from their jobs, parents who are doing their best to take care of their children, students who have had to start taking classes online, and countless other unique situations.

One million articles logo for the Ukrainian Wikipedia

This week, some important milestones have been reached across the Wikimedia Movement. The Ukrainian Wikipedia has reached the 1 million article milestone. That's a huge milestone to reach, so I wish to congratulate everyone who contributes there! I asked Ата, an active contributor to the Ukrainian Wikipedia, about how this milestone affects the project. I found her response insightful, and I am thankful for it.

In addition, the Wikimedia Community of the Kazakh language User Group has officially been recognized by the Affiliations Committee. For more information, see m:Affiliations Committee/Resolutions/Recognition of Wikimedia Community of Kazakh language User Group and m:Wikimedia Community of Kazakh language User Group.

I have also discovered inspiring off-wiki initiatives, such as the Mozilla Open Source Support Team (abbreviated as MOSS) launching a COVID-19 Solutions Fund. Awards of up to $50,000 are being offered to open source technology projects.

Week of 12 April 2020: none for this week

Quotes from Dinah Maria Craik

Visiting the Thai Wikipedia and Thailand

In the interest of featuring languages which have not previously appeared in "What's making you happy this week?", I decided to visit the Thai language Wikipedia. There I learned that Thailand has two calendar systems. Also, I found images from Phu Sang National Park, which appears to be a place that I would enjoy visiting. One of the photos of a waterfall was a finalist for Commons Picture of the Year in 2017.

The animals that are pictured below are members of species that are found in the park.

Regarding translations

Skillful translations of the sentence "What's making you happy this week?" would be very much appreciated. If you see any inaccuracies in the translations in this article then please {{ping}} User:Pine or User:Clovermoss in the discussion section of this page, or boldly make the correction to the text of the article. Thank you to everyone who has helped with translations so far.


Your turn

What's making you happy this month? You are welcome to write a comment on the talk page of this Signpost article.



Reader comments

2020-04-26

Health and RfAs: An interview with Guy Macon

Last month, editor Guy Macon underwent what some called a pointlessly brutal and stressful RfA. Two days into the process, he suffered a life-threatening cardiac arrest, which left him unable to answer questions or respond to oppose votes. The good news is that he survived; unfortunately, his RfA failed. Here are his reflections on the entire episode.

What made you run for RfA in the first place?

I have long been of the opinion that anyone who wants to be a Wikipedia administrator must be crazy. It's a thankless job. However, I am also aware from some real-world volunteer work I do that you often see someone putting in a huge amount of time and effort and then burning out. Looking at Wikipedia, I saw a few areas – WP:AIAV in particular – where the same two or three admins are shouldering all the work, so when several admins urged me to run I reluctantly agreed.

Can you describe for our readers exactly what happened during the process?

I had answered a couple of questions right after the RfA was posted, and logged on to Wikipedia the next day to see if I had any more to answer. I was still doing what I always do first – checking engineering-related pages for vandalism and spam – when my heart suddenly stopped beating and I fell over unconscious. No pain, just a brief "I can't get any air" and then nothing. I came to briefly in the ambulance just in time to experience having my heart shocked, and woke up four days later in a hospital bed with a machine breathing for me. The good news is that I have been steadily recovering and am at home – but with a vest that monitors my heartbeat and will shock me if it stops again.

Do you think your cardiac arrest was a direct result of the RfA, or were there other long-term issues?

Completely unrelated. I was feeling zero stress over the RfA, I had yet to see any oppose !votes, and the doctors say that the problem was electrical – cardiac arrest caused by ventricular fibrillation. It had nothing to do with the usual reasons behind heart attacks (a different condition). In my case I have never used tobacco, alcohol, or drugs, I exercise regularly and eat a healthy diet. There is a chance that the cause was a virus that attacked the heart muscle years ago (it would take an autopsy to confirm this) but the odds are that it was random.

Did you expect your RfA to go the way it did?

Pretty much. I was expecting a lot of oppose !votes because I have been so active in the areas of pseudoscience and attempting to reform the WMF, both of which resulted in a lot of people being pissed off at me for writing things like WP:CANCER and WP:YWAB or for opposing the use of The Daily Mail as a source. I was surprised and somewhat disappointed at having my words twisted. When I wrote "I try to stay cool" and "there are also some cases where I was completely in the wrong and others where I was technically in the right but handled it really poorly" I was not claiming that I always keep my cool, but rather acknowledging that I have on multiple occasions not kept my cool and offering to apologize for those times when I didn't. It dismays me to see my words twisted into me supposedly saying that keeping my cool is something I have always succeeded at rather than something we all should be striving towards.

Do you think that RfA is a broken process, and if so, did yours strengthen (or weaken) that view?

Few people disagree that RfA is a broken process, but there is no agreement as to how to fix it, and the main problem is that what gets the !votes has little to do with what makes one a good administrator. Nothing about my RfA changed my opinion on that.

Going forward, do you intend to adjust your editing habits because of this RfA and perhaps run again in the future?

Without getting into detail, the oppose votes were a mixture. Some were total bullshit, faulting me for doing the right thing or posting an effective argument for or against a proposition. Others were spot on and correctly identified areas where I need to improve. Those latter oppose !votes are the ones I treasure, and I believe that I have taken them to heart and made changes in my approach.

Will I run again? Maybe, but not any time soon. We still have too few admins working some jobs and we are still likely to see some of them burn out and quit.

Do you think that you might have succeeded if it weren't for the cardiac arrest (and thus having the ability to answer questions)?

Most likely not. RfA candidates rarely help their case by responding to individual criticisms and oppose !voters rarely look at new evidence and change their !votes. Just to make sure that everyone got an answer, I went back after it was over and answered all unanswered questions on the RfA talk page.

What words of advice would you give to any editors considering an RfA?

If you want to easily pass an RfA, avoid doing anything that shows that you have the slightest interest in or skill at the sort of things Wikipedia administrators do. Just create a lot of good content with good citations, and if you see someone putting something in an article claiming that, say, drinking bleach cures coronavirus, silently walk away and let them have their way, hoping that someone else will deal with it. Sorry to say this, but that's clearly what the !voters want.

Your RfA has been described as stressful, pointlessly brutal, etc. etc. As the nominee, would you agree?

It really didn't stress me at all. I read a bunch of RfAs before I ran, and mine had the exact same problems that so many others had. The result was pretty much as I expected.

Why did you go back and answer the questions after the RfA had closed?

Over the years I have seen many situations where somebody asked a reasonable question and got no answer. Sometimes someone asks a question, gets a reply asking for clarification, and then never edits again. That's annoying. I didn't want to leave anyone hanging just because I was in the hospital and unable to edit Wikipedia. Plus, if I ever run for RfA again some of the same questions will no doubt resurface, so I might as well get them out of the way now.

Are there any last words you would like to share with our readers?

Don't take Wikipedia for granted. You may think that it will always be there, but that's what they said about a bunch of organizations and websites that later fell apart and are now either gone or a shadow of what they once were. We need to be diligent and wise to keep what we have built.



Reader comments

2020-04-26

Multilingual Wikipedia

Denny Vrandečić was the Wikidata director until September 2013 and was a member of the Wikimedia Foundation board of trustees from July 2015 to April 2016. He earned a PhD at the Karlsruhe Institute of Technology. He now works at Google. -S

Wikipedia’s mission is to allow everyone to share in the sum of all knowledge. Wikipedia is in its twentieth year, and it has been a success in many ways. And yet, it still has large knowledge gaps, particularly in language editions with smaller active communities. But not only there – did you know that only a third of all topics that have Wikipedia articles have an article on the English Wikipedia? Did you know that only about half of articles in the German Wikipedia have a counterpart on the English Wikipedia? There are huge amounts of knowledge out there that are not accessible to readers who can read only one or two languages.

And even if there is an article, content is often very unevenly distributed: where one Wikipedia has long articles with several sections, another Wikipedia might just have a stub. And sometimes, articles contain very outdated knowledge. Nine months after London Breed became mayor of San Francisco, only twenty-four language editions listed her as such. Sixty-two editions listed out-of-date mayors – and not only Ed Lee, who was mayor from 2011 to 2017, but also Gavin Newsom, who was mayor from 2004 to 2011, and Willie Brown, who was mayor from 1996 to 2004. The Cebuano Wikipedia even lists Dianne Feinstein, who was mayor from 1978 to 1988, more than a decade before Wikipedia was even created.

This is no surprise, as half of the Wikipedia language editions have fewer than ten active contributors. It is challenging to write and maintain a comprehensive and current encyclopedia with ten people in their spare time. It cannot be expected that those ten contributors keep track of all the cities in the world and update their mayors in Wikipedia. In many cases those contributors would prefer to work on other articles.

Wikidata to the rescue?

This is where Wikidata can help. And in fact, it does: of the twenty-four Wikipedia language editions that listed London Breed as mayor, eight got that information from Wikidata, and were up-to-date because of that. But Wikidata cannot really tell the full story.

Ed Lee, then mayor of San Francisco, died of cardiac arrest in December 2017. London Breed, as the president of the board of supervisors, became acting mayor, but in order to deny her the advantage of the incumbent, the board voted in January 2018 to replace her with Mark Farrell as interim mayor until the special election to finish Ed Lee's term was held in June. London Breed won that election and became mayor in July, serving until the next regular election a year later, which she also won.

Now there are many facts in there that can be represented in Wikidata: that there was a special election for the position of the mayor of San Francisco, that it was held in June, that London Breed won that election. That there was an election in 2019. That Mark Farrell held the office from January to July. That Ed Lee died of cardiac arrest in December 2017.

But all of these facts don’t tell a story. Although Wikidata records these facts, they are spread throughout the wiki, and it is very hard to string them together in a way that allows a reader to make sense of them. Even worse, these facts are just a very small set of the billions of such facts in Wikidata, and for a reader it is hard to figure out which are relevant and which are not. Wikidata is great for answering questions, creating graphs, allowing data exploration, or making infobox-like overviews of a topic, but it is really bad at telling even the rather simple story presented above.

We have a solution for this problem, and it’s quite marvelous: language. Language is expressive, it can tell stories, it is predestined for knowledge transfer. But also, there are many languages in the world, and most of us only speak a few of them. This is a barrier for the transfer of knowledge. Here I suggest an architecture to lower this barrier, deeply inspired by the way language works.

Imagine for a moment that we start abstracting the content of a text. Instead of saying "in order to deny her the advantage of the incumbent, the board voted in January 2018 to replace her with Mark Farrell as interim mayor until the special elections", imagine we say something more abstract such as elect(elector: Board of Supervisors, electee: Mark Farrell, position: Mayor of San Francisco, reason: deny(advantage of incumbency, London Breed)) – and even more, all of these would be language-independent identifiers, so that the whole thing would actually look more like Q40231(Q3658756, Q6767574, Q1343202(Q6015536, Q6669880)). At first glance, this looks much like a statement in Wikidata, but merely by putting that in a series of other such abstract statements, and having some connecting tissue between these bare-bones statements, we are inching much closer to what a full-bodied text needs.
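As a rough illustration of how such abstract content might be held as a data structure, here is a minimal Python sketch. The class name `Constructor` and the helper `positional` are assumptions made for this example and are not part of the proposal; the identifiers and labels are copied from the two notations above, and no claim is made here about which identifier corresponds to which label.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# A value is either a plain identifier/label or a nested constructor call.
Value = Union[str, "Constructor"]


@dataclass(frozen=True)
class Constructor:
    """Hypothetical model: a head applied to (role, value) arguments."""
    head: str
    args: Tuple[Tuple[Optional[str], Value], ...] = ()

    def render(self) -> str:
        """Serialize to the head(arg, arg, ...) notation used in the text."""
        parts = []
        for role, value in self.args:
            text = value.render() if isinstance(value, Constructor) else value
            parts.append(f"{role}: {text}" if role else text)
        return f"{self.head}({', '.join(parts)})" if parts else self.head


def positional(*values: Value) -> Tuple[Tuple[Optional[str], Value], ...]:
    """Helper (assumed for this sketch): arguments without role names."""
    return tuple((None, v) for v in values)


# Readable form, with role names where the article gives them.
readable = Constructor("elect", (
    ("elector", "Board of Supervisors"),
    ("electee", "Mark Farrell"),
    ("position", "Mayor of San Francisco"),
    ("reason", Constructor(
        "deny", positional("advantage of incumbency", "London Breed"))),
))

# Identifier-only form, copied verbatim from the article.
abstract = Constructor("Q40231", positional(
    "Q3658756",
    "Q6767574",
    Constructor("Q1343202", positional("Q6015536", "Q6669880")),
))

print(readable.render())
print(abstract.render())
```

The point of the nesting is that one constructor can appear as an argument of another, which is what lets a reason be attached to an event rather than stored as a separate, unconnected fact.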

A new project: a wiki for functions

But obviously, we wouldn’t show this abstract content to the readers. We still need to translate the abstract content to natural language. So we would need to know that the elect constructor mentioned above takes the three parameters in the example, and that we need to make a template such as {elector} elected {electee} to {position} in order to {reason} (something that looks much easier in this example than it is for most other cases). And since the creation of such translators has to be made for every supported language, we need to have a place to create such translators so that a community can do it.
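To make the idea of a per-language translator concrete, the following Python sketch fills the template quoted above from a nested piece of abstract content. Everything except the `elect` template text is an assumption for illustration: the `TEMPLATES` table, the `render` function, the dictionary layout, and in particular the `deny` template and its role names `whom` and `what` are invented here and do not come from the proposal.

```python
# Hypothetical renderer table: (constructor name, language code) -> template.
TEMPLATES = {
    # Template text taken from the article.
    ("elect", "en"): "{elector} elected {electee} to {position} in order to {reason}",
    # Assumed for this sketch only.
    ("deny", "en"): "deny {whom} the {what}",
}


def render(content: dict, lang: str) -> str:
    """Recursively fill the template for a constructor and its nested arguments."""
    template = TEMPLATES[(content["constructor"], lang)]
    filled = {
        role: render(value, lang) if isinstance(value, dict) else value
        for role, value in content["args"].items()
    }
    return template.format(**filled)


election = {
    "constructor": "elect",
    "args": {
        "elector": "the Board of Supervisors",
        "electee": "Mark Farrell",
        "position": "Mayor of San Francisco",
        "reason": {
            "constructor": "deny",
            "args": {"whom": "London Breed", "what": "advantage of incumbency"},
        },
    },
}

sentence = render(election, "en")
print(sentence)
```

The output reads slightly stiffly ("elected Mark Farrell to Mayor of San Francisco", no capital letter, no full stop), which hints at why real renderers would need more than string substitution: agreement, articles, word order, and capitalization all vary by language. A language without an entry in the table simply raises a `KeyError` in this sketch.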

For this I propose a new Wikimedia project, preliminarily called Wikilambda (and I am terrible with names, so I do not expect the project to be actually called this). Wikilambda would be a new project to create, maintain, manage, catalog, and evaluate a new form of knowledge assets: functions. Functions are algorithms, pieces of code, that translate input into output in a determined and repeatable way. A simple function, such as the square function, could take the number 5 and return 25. The length function could take a string such as "Wikilambda" and return the number 10. Another function could translate a date in the Gregorian calendar to a date in the Julian calendar. And yet another could translate inches to centimeters. Finally, one other function, more complex than any of those examples, could take an abstract content such as Q40231(Q3658756, Q6767574, Q1343202(Q6015536, Q6669880)) and a language code, and give back the text "In order to deny London Breed the incumbency advantage, the Board of Supervisors elected Mark Farrell Mayor of San Francisco." Or, for German, "Um London Breed den Vorteil des Amtsträgers zu verweigern, wählte der Stadtrat Mark Farrell zum Bürgermeister von San Francisco."
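The simpler functions mentioned above can be sketched in a few lines of Python. The function names are assumptions chosen for readability, not names from the proposal, and the calendar conversion uses the standard integer arithmetic via the Julian Day Number, which is one common technique rather than anything the proposal prescribes.

```python
def square(x: float) -> float:
    """Return x multiplied by itself."""
    return x * x


def length(text: str) -> int:
    """Return the number of characters in a string."""
    return len(text)


def inches_to_centimeters(inches: float) -> float:
    """Convert inches to centimeters (1 inch is exactly 2.54 cm)."""
    return inches * 2.54


def gregorian_to_julian(year: int, month: int, day: int) -> tuple:
    """Convert a Gregorian calendar date to the Julian calendar date.

    Goes through the Julian Day Number using integer arithmetic only.
    """
    # Gregorian date -> Julian Day Number
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = (day + (153 * m + 2) // 5 + 365 * y
           + y // 4 - y // 100 + y // 400 - 32045)

    # Julian Day Number -> Julian calendar date
    c = jdn + 32082
    d = (4 * c + 3) // 1461
    e = c - (1461 * d) // 4
    n = (5 * e + 2) // 153
    j_day = e - (153 * n + 2) // 5 + 1
    j_month = n + 3 - 12 * (n // 10)
    j_year = d - 4800 + n // 10
    return (j_year, j_month, j_day)


print(square(5))
print(length("Wikilambda"))
print(inches_to_centimeters(10))
print(gregorian_to_julian(2020, 4, 26))
```

Each of these is deterministic: the same input always yields the same output, which is the property that makes such functions shareable and testable.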

Wikilambda will allow contributors to create and maintain functions, their implementations and tests, in a collaborative way. These include the available constructors used to create the abstract content. The functions can be used in a variety of ways: users can call them from the Web, but also from local machines or from an app. By allowing the functions in Wikilambda to be called from wikitext, we also make it possible to create a global space to maintain global templates and modules, another long-standing wish of the Wikimedia communities. This will allow more communities to share expertise and make life easier for other projects such as the Content Translation tool.
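One way to picture a function that is maintained together with its implementations and tests is the small Python sketch below. The `FunctionEntry` class, its fields, and the `check` method are hypothetical and exist only for this illustration; the proposal does not specify how Wikilambda would store or evaluate functions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class FunctionEntry:
    """Hypothetical catalog entry: one function, many implementations, shared tests."""
    name: str
    implementations: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    # Each test is (arguments, expected result).
    tests: List[Tuple[tuple, Any]] = field(default_factory=list)

    def check(self) -> Dict[str, bool]:
        """Run every test against every implementation; report pass/fail per implementation."""
        return {
            impl_name: all(impl(*args) == expected for args, expected in self.tests)
            for impl_name, impl in self.implementations.items()
        }


square_entry = FunctionEntry(
    name="square",
    implementations={
        "multiply": lambda x: x * x,
        "power": lambda x: x ** 2,
        "buggy": lambda x: x + x,  # deliberately wrong, to show a failing check
    },
    tests=[((5,), 25), ((0,), 0), ((-3,), 9)],
)

results = square_entry.check()
print(results)
```

Keeping tests next to the implementations means a contributor can add a new implementation, in any language the platform supports, and see at once whether it agrees with the existing ones.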

This will allow the individual language communities to use text generated from the abstract content, and fill some of their knowledge gaps. The hope is that writing the functions that translate abstract content, albeit more complex, is also much less work than writing and maintaining a full-fledged encyclopedia. This will also allow smaller communities to focus on the topics they care about – local places, culture, food – and yet to have an up-to-date coverage of globally relevant topics.

What do you think?

To make it absolutely clear: this proposal does not call for the replacement of the current Wikipedias. It is meant as an offer to the communities to fill in the gaps that currently exist. It would be presumptuous to assume that a text generated by Wikilambda would ever achieve the brilliance and subtlety that let many of our current Wikipedia articles shine. And although there are several advantages for many parts of the English Wikipedia as well (say for global templates or content that is actually richer in a local language), I would be surprised if the English Wikipedia community would start to widely adopt what Wikilambda offers early on. But it seems that it is hard to overestimate the effect this proposal could have on smaller communities, and eventually on our whole movement in order to get a bit closer to our vision of a world in which everyone can share in the sum of all knowledge.

I invite you to read my recently published paper detailing the technical aspects and an upcoming chapter discussing the social aspects of this proposal. I have discussed this proposal with several researchers in many related research areas, with members of different Wikimedia communities, and with folks at the Wikimedia Foundation, to figure out the next steps. I also invite you to discuss this proposal, in the Comments section below, or on Meta, on Wikimedia-l, or with me directly. I am very excited to work toward it and I hope to hear your reservations and your ideas.

Update (May 8, 2020): An official proposal for Wikilambda is now up on Meta. Discussion and support can be expressed there.



Reader comments

2020-04-26

The Guild of Copy Editors

Puddleglum2.0 is both an active member of the Guild and a regular contributor to "The Signpost".

Ten years ago, the Signpost featured an interview with the then-coordinators of the Guild of Copy Editors to celebrate the completion of their very first copyediting drive. This May marks the tenth anniversary of that drive, so to commemorate that, we interviewed members of the GOCE to see how far the project has progressed. We interviewed two coordinators, Tdslk and Baffle gab1978, the current lead coordinator Jonesey95, and Tenryuu, a member who joined the Guild recently. Also making an appearance is Lfstevens, the only active copy editor left from that very first drive (other editors who participated in that first drive were copy editors for many years before moving on to other focus areas). Here are our questions and their answers.

Can you tell us a bit about what you normally do on Wikipedia besides the GOCE?

  • Tenryuu: Currently, I'm keeping up to date with some aspects of WP:COVID-19 and acting as a host over at the Teahouse.
  • Lfstevens: I like to work on machine learning, marine science and occasionally take on a big job. In the middle of scrubbing Socialism at the moment.
  • Jonesey95: I am a hard-core gnome as well as a template editor, making thousands of tiny edits behind the scenes.
  • Baffle gab1978: I don't do a great deal outside the Guild these days.

Why did you join this Project? Is it a different area than the ones you usually edit in?

  • Reidgreg: I was rightly called out for close paraphrasing as a new editor, and came to GOCE in 2016 to improve my writing skills. I'd also learned that the primary article I was editing was receiving close to a million views per month, and felt a responsibility to write as clearly as possible. I believe that what I've learned at GOCE applies to every aspect of editing, including how to interact with other editors.
  • Tenryuu: I was invited a few years back by an editor (whose username I have sadly forgotten) and was happy to join. I also work in other wikis besides those under Wikimedia, though I mostly do copyediting there as well.
  • Baffle gab1978: Having done some copy-editing back in 2012 I think, I decided making articles easier to read was the best way for me to help improve Wikipedia. For a few months I had a small queue of c/e requests in my user space; this grew quite alarmingly so I stopped doing that and began working from the GOCE's Requests page.
  • Lfstevens: I wanted a low-stress domain where I could improve my skills as a Wikipedian. And the backlog was pretty awful back in the day.
  • Jonesey95: I have always enjoyed copy editing, putting a fine polish on a bit of prose so that it is easier to read and understand.

What, in your opinion, is the primary mission of the GOCE?

  • Reidgreg: "To improve the quality of writing on Wikipedia" – through our own editing, by setting an example, and by providing guidance and hands-on training.
  • Tenryuu: To make articles say what they mean and mean what they say. As an encyclopedia it's important for Wikipedia to convey the information from sources accurately.
  • Tdslk: More pragmatically, we mainly edit articles from the backlog of those tagged as needing a copy edit, as well as requests from other editors, usually for articles being nominated for a good article or featured article.
  • Baffle gab1978: To make Wikipedia articles clearer, more concise and easier to read.
  • Lfstevens: To make each article better say what it is trying to say, in conformance with WP policies.

What should be or has been focused on more: maintaining the Requests (REQ) page or working on the backlog?

  • Reidgreg: There's always a question of how our efforts are best spent, and how to find the articles in most need of copy editing. Articles with maintenance tags tend to have more problems, while those requested for copy edit tend to be read more.
  • Tenryuu: While working on the backlog is important, there's a sense of relative urgency with requests. Most of them usually stop by the Guild as a last stop before being nominated for Good Article or Featured Article status. Work on the backlog is usually asked for in the form of backlog drives happening every two months.
  • Tdslk: I'd say it is an even split.
  • Baffle gab1978: I agree with Tdslk here; I prefer working at REQ but others enjoy working on the backlog, especially during the Drives and Blitzes.
  • Lfstevens: Don't know which is more important in general. Sometimes I feel like REQ is an attempt to cut the line. The average article there is in better shape than the larger queue, but they are also in better shape.
  • Jonesey95: Different copy editors focus on one or the other, typically. We have editors who enjoy the backlog, since it provides a lot of choices and some articles that need a ton of help. Others prefer editing Requests, which tend to be higher-quality articles when they arrive in our queue.

Progress made on the backlog since 2014. The backlog was at 8,323 articles at the start of the first drive in 2010.

Over ten years of Drives, what period in GOCE history has stood out to you the most? Why?

  • Reidgreg: Late 2018, from my own personal perspective, as that was my first term as lead coordinator. I felt that I was paying forward the help that I'd received.
  • Jonesey95: I have been involved in the GOCE for seven years as an editor and a coordinator, and no one period really stands out for me. This, to me, is one of the key attributes of the GOCE: steady, consistent work, chipping away at a big backlog of articles month by month. In our ten years of backlog reduction drives and nearly eight years of week-long blitzes, we have never missed a month. We don't try for heroic bursts of editing to knock out the backlog quickly; unlike effective and efficient gnome or bot work that can be done in sprints, I think that an effort like that applied to copy editing would result in editor burnout and low-quality edits. We have preferred steady quality, even if it takes a while.
  • Tdslk: Slow and steady is definitely our style. Every other month for ten years we have held a drive, and in the alternate months we have a week-long blitz, usually organized around a theme. If we do get down to zero articles in the backlog, that will be quite a moment.
  • Lfstevens: Right now things are very exciting as the backlog finally goes down to triple and hopefully double digits. After all this time, I'm ready to zero it out.

Do you think the GOCE is different from other Projects in how it runs and collaborates? If so, how?

  • Reidgreg: GOCE is a bit unique among WikiProjects: it identifies as a guild, and its article talk page banner denotes a review process. GOCE is similar to WP Peer review or WP Good articles, but with a narrower focus and a maintenance aspect which provides a training ground for new editors. It's a microcosm of the Wikipedia community, with new editors and ten-year veterans, all working toward common goals.
  • Jonesey95: I can't speak for other projects, but I can tell you what makes us successful. We got to where we are by establishing and maintaining a culture of hard work, collegiality, and relatively drama-free interaction, something that is not true of every place on the internet, or even on Wikipedia. I am proud of our Guild, not for the number of edits, because numbers can be gamed, but for our dedication to quality and teamwork. We are all volunteers here, and we are mostly self-policing, so I am sure that some questionable copy edits slip through, but it has been my experience in checking other copy editors' work that our edits almost always improve article prose substantially, even when they leave behind – or, inevitably, introduce – a few errors. [CC declaration: The above text is adapted from Wikipedia:WikiProject Guild of Copy Editors/Membership/News/2019 Annual Report]
  • Tenryuu: I'm only an active member in one other WikiProject: COVID-19. The atmosphere is different due to urgency that isn't on here; over there new items pop up almost daily and need to be discussed; because the project is spread out over many, many articles, very often there's link-hopping to discussion on other talk pages which can be disorienting for some people. The GOCE provides a central location for editors to come and submit articles for copy editing. The focus on checking grammar and spelling doesn't demand immediate attention, which gives copy editors time to thoroughly analyse the copy and give quality submissions.
  • Baffle gab1978: Yes, I think so, though we share similarities with a few others, such as Military History and Cleanup. Most WikiProjects are focussed on particular subject areas while we work across the entire encyclopaedia. We collaborate with a very diverse set of editors with equally diverse interests. We may be presented with an article about a recent Bollywood movie then immediately afterwards one about a tenth-century castle in Slovakia.

Going forward, once the backlog is cleared, do you think the mission of the GOCE will change?

  • Reidgreg: Our focus in how we go about the mission may change. We've been looking at other methods to find articles in need of copy editing, and at collaborations with other WikiProjects. As implied by the Wikipedia logo with its missing pieces, there will always be more for us to improve.
  • Jonesey95: There are a lot more articles to be written, which means a lot more copy editing is coming our way. Because this is the encyclopedia anyone can edit, editors have a wide variety of competence, skill, and experience. Our work takes the prose of dedicated contributors whose language skills may not be top-notch and improves those contributions so that knowledge can be delivered to the world in a clear, understandable way. It is no small thing that we do here at Wikipedia. As for a modified mission, the reduction of our backlog to less than one month of articles means that we will be discussing ways to offer our skills to other groups of editors who want to improve subsets of articles, or actively seeking out valuable articles that need copy editing but have not been tagged. [CC declaration: The above text is adapted from Wikipedia:WikiProject Guild of Copy Editors/Membership/News/2019 Annual Report]
  • Tenryuu: I find it highly unlikely that the backlog will ever be truly cleared. During March's backlog drive there were still articles being added. If the number of tagged articles stays relatively low, then requests will likely be answered much faster.
  • Tdslk: During our ten years of drives, the number of articles in the backlog has gone from 8,323 to (at the time I type this) 272. In our more productive drives we have reduced the backlog by over 300 articles, so it is possible that we will actually entirely clear the backlog in May. If we don't make that goal, that's okay. We don't want to rush our work at the expense of quality. But if we do, we are talking about other ways to find articles that need our attention, as the others mention above.
  • Baffle gab1978: The Requests page is almost always busy, especially at this time when COVID-19 lockdowns are in force in many countries. As of my timestamp, we have 58 requests pending, which will keep us going for a while.
  • Lfstevens: I hope it will change to become more proactive, reaching out to GA folks and others attempting bigger things of which copy editing is only a part. I am also waiting for more artificial intelligence support, in the hope that it will absorb some of the more mundane copy editing tasks.

What are some challenges the GOCE faces, and, if there are any, how do you address them?

  • Reidgreg: One of our main challenges is targeting articles which need and are appropriate for copy editing – which is to say that they are notable, stable, and have adequate sourcing. Our coordinators manually vet these as time allows.
  • Tenryuu: Articles that have preexisting issues, like a lack of citations or unclear ideas. Most of the time it's a stab in the dark, and assumptions have to be made about unclear passages. It's easier with requests, as there is an active editor who is able to answer any questions a copyeditor may have.
  • Tdslk: There can be a misperception that we are, or claim to be, some sort of final authority, when we're actually just a ragtag gang of misfit gnomes trying our humble best to improve things. We do make mistakes, and we welcome (polite) feedback on our work. Likewise, while we might give opinions about style issues, we aren't some sort of Supreme Court that hands out the ultimate answers.
  • Baffle gab1978: What Reidgreg said! We're frequently asked to copy-edit articles that are unsuitable for copy-editing; they may be undergoing rapid changes or may be a venue for edit-wars and dramah. Some may be very long but have few references, or they may be the subject of a deletion discussion. Often, there's little point working on these articles because the page may be deleted or a full, thoughtful copy-edit can be wiped out with just a few edits or a quick reversion. Articles about rapidly changing or ongoing events such as the current COVID-19 crisis can be particularly difficult to copy-edit. At the Requests page, editors who are concerned about the suitability of an article for copy-edit—usually but not always project coordinators—can raise the problem at the Requests talk page (REQ Talk) and the request is put on hold. We discuss whether the request should be declined or whether the article should receive a copy-edit. This process stops the Requests page from getting bogged down with unattended requests, and helps us focus our energies on articles that are good candidates for a thorough copy-edit and will hopefully see further improvement from others.
  • Lfstevens: Mostly the same as other WP projects. Limited ability to attract new editors.

What would you say to newer editors wishing to help out?

  • Reidgreg: Be open-minded. Avoid following the writing style you were taught off-wiki, and think critically about every change you make to an article.
  • Tenryuu: Before diving straight into copyediting, have a look at what long-time members are doing and have done to copyedit articles. Better yet, keep an eye on articles that have made GA or FA status and see what changes copyeditors have made. Take your time: rushing an edit does not help anyone. I strongly recommend doing more than one pass when copyediting; sometimes I catch something in a later pass that I missed during my first. Do not hesitate to ask the coordinators or other experienced members if you would like to have your edits checked.
  • Baffle gab1978: Come and see what we do. Look around the Guild, read through some of our extensive talk pages and get an idea of the project's culture and procedures. If you've never copy-edited a Wikipedia article, have a look at our basic guide to copy-editing and start with a short article from the backlog. We don't just fix typos and grammatical errors; we remove any padding or waffle, simplify over-complicated text, and make the article clearer, more concise and direct for the reader. We're very approachable; you can always ask us for a review of your work—but do expect an honest answer! As you gain experience with copy-editing, you'll see how further improvements can be made to most Wikipedia articles. Most of all, realise every edit we make must help the reader to understand the subject of the article.
  • Lfstevens: Jump right in. We will welcome you, answer your questions and defend good faith work.
  • Tdslk: A good place to start is with the tagged articles in the backlog. Gaining experience there will help when working on articles from the Requests page.

Any last words you would like to share?

  • Tenryuu: Copyediting is not a race. This may be less true during backlog drives.
  • Baffle gab1978: Thank you to everyone who has taken the time to copy-edit articles; it's appreciated.
  • Jonesey95: If you think that you are done polishing your prose, read it aloud. You will find more mistakes and opportunities for improvement.
  • Tdslk: Stay safe and healthy, and help others as you are able! Together we can flatten the curve!



