Wikipedia:Wikipedia Signpost/Single/2020-08-30

Single-page Edition


30 August 2020

News and notes
The high road and the low road

In the media
Storytelling large and small

The high road and the low road


By SnowFire and Nosebagbear

Scots Wikipedia language quality problems ripple around the Internet, make the news, and trigger Meta-Wiki response

King James I and VI, the actual person to have done the most damage to the Scots language in history (source). James moved his court from Scotland to London in 1603 and later commissioned the King James Version (Authorized Version) of the Bible in English only, not Scots. Both God and the government now spoke English.

The Scots Wikipedia is a quiet, sleepy, low-activity edition of Wikipedia written in the Scots language, the Anglic language traditionally spoken in the lowlands of Scotland. Nobody paid it much mind... until August 2020, when a Reddit thread entitled "I've discovered that almost every single article on the Scots version of Wikipedia is written by the same person – an American teenager who can’t speak Scots" spread across the Internet. This young volunteer, who dedicated a large amount of time over seven years to translating segments of the English Wikipedia into Scots, was unfortunately, it seems, never told that maintaining English sentence structure and translating words 1:1 from a dictionary is no way to translate at all. Further investigation showed the quality problems ran deep: articles untouched by the prolific user in question were also written in poor, ungrammatical Scots, meaning that many more articles on Scots Wikipedia may be essentially worthless. The author of the Reddit post called the incident "cultural vandalism on an unprecedented scale" and wrote that "This is going to sound incredibly hyperbolic and hysterical but I think this person has possibly done more damage to the Scots language than anyone else in history."

The story hit the news media, for both high and low reasons. For the high road, this was a massive and notable failure of Wikipedia, one that has likely poisoned training data sets for the Scots language used by translation algorithms, and led any curious human readers to think that Scots is simply English in an accent with a few funky words thrown in. For the low road, the hobbies and naivety of the prolific user were mocked. Some of the notable coverage includes:

Several of the tabloid-style sources omitted from this list got the story essentially wrong, confusing Scots with the Scottish Gaelic language, suggesting that the user might have just been writing in silly Groundskeeper Willie-ese, or that the user's admin status was relevant (a status much-misunderstood by the media). The problem was the user's edits: there has been no allegation of misuse of admin tools.

Within the Wikipedia community, several actions were kicked off. User:MJL, the only other active admin on Scots Wikipedia at the time, boldly set up their own "AMA" (short for 'Ask Me Anything') on the Scotland Subreddit to explain the situation as well as solicit interest in potential fixes for Scots Wikipedia. The prolific user apologized for his mistakes after being informed of his lack of proficiency in Scots and has withdrawn from editing for now. Various split discussions eventually coalesced into an RFC on Meta-Wiki: meta:Requests for comment/Large scale language inaccuracies on the Scots Wikipedia. The current short-term course of action with the most support seems to be having a bot perform some sort of mass rollback of affected articles if they meet criteria (which are still being determined), enlisting new admins, and some proposals for other new bots.

The long-term solution requires understanding how this disaster happened in the first place. On Wikipedia user page language templates, the prolific contributor only ever marked himself a 2/5 and a 3/5 (changing over time) at Scots proficiency. If he was really that bad at Scots – more like a 1/5 – how did nobody notice? The answer: there simply wasn't anyone to notice. To the extent there ever was an authentic Scots-speaking Scots Wikipedia community, it had departed by 2012. The contributor's contributions were "Scots-y" enough to keep non-native speakers paying mild attention to the wiki from realizing the extent of their problems, and the user himself was a young kid when this started, clearly without the best self-awareness. If even one or two native Scots speakers had been active, they could have sounded the alarm long before seven years of wasted, counterproductive effort had passed. The fundamental problem at Scots Wikipedia is the lack of a Scots-speaking community of editors. Perhaps not only bad things have emerged from the incident: the burst of publicity has drawn the attention of Scots language groups. If the end result is to expand the Scots Wikipedia community, then perhaps something good will have come of this. –Sn

Interim Trust & Safety Case Review Committee

In early July, the Wikimedia Foundation announced the creation of the Interim Trust & Safety Case Review Committee (CRC), designed to allow appeal of certain less clear-cut cases decided by the WMF (both on-wiki and event bans), including appealing against a decision by T&S not to act on a complaint. A charter, a public call for applicants, and a Q&A with WMF Vice President of Community Resilience & Sustainability Maggie Dennis were also created. The CRC charter sets out the scope, objectives, and minimum candidate requirements.

The CRC is specifically temporary, designed to terminate with the creation of a permanent process as part of the Universal Code of Conduct. If those discussions have not concluded by July 1, 2021, then either a new call for candidates can be made for a new term, or a single extension of up to six months can be granted if there is a clear indication the process will wrap up by then (such as if an implementation date has been agreed).

Process: Maggie Dennis responded to a question: "Let's say user FooBar is blocked as a T&S office action and requests case review [...] What does the appeal process look like, both from FooBar's perspective and the review committee's perspective?"

Subject to changes to the process by the CRC, a rough outline was offered as follows:

User emails inbox asking for a review
WMF attorney confirms case is not within remit of "statutory, regulatory, employment, or legal policies", and so is subject to review
User is notified it is under review and given likely timeline
CRC Chair appoints 5 members who review the case for "appropriate handling; appropriate collection of evidence; appropriate outcomes"
Members vote on whether to support, overturn (partially or fully), or return to the WMF for additional investigation
WMF enacts that decision
All involved users will be notified of decision

Overturning could occur on two main grounds: the sanction was inappropriately reached (the evidence didn't warrant the sanction) or the case did not fall within the T&S remit. The latter would indicate that a complaint could then be resubmitted at local community level (Arbitration Committee, Administrators' Noticeboard/Incidents (ANI) or equivalents). The publicly available documentation doesn't make it clear if a case could be simultaneously overturned on both grounds and whether that would still allow for a "double jeopardy" situation. Individuals may only make a single appeal per prohibition.

Candidates: the WMF imposes a number of eligibility requirements, including holding a current or prior advanced permissions role or being an experienced contributor as part of a Wikimedia affiliate. Candidates also need to be members in full good standing with no current sanctions and be fluent in English. Several roles were viewed as exclusive, including current/former WMF staff. The en-wiki community has decided to disallow currently serving arbitrators from acting as CRC members, which Maggie Dennis said would be accepted. Gender and linguistic diversity were also sought, the latter most likely also driving project diversity.

CRC members are intended to be able to spend up to five hours a week on the role, though there were repeated statements that it was anticipated to be less.

One particular requirement was part of a major theme: anonymity. As well as keeping all case information to themselves under a currently non-published reinforced non-disclosure agreement (NDA) – above and beyond the standard non-public information agreement – candidates made anonymous applications and are to keep both others' and their own membership secret. A number of changes were made after applications closed due to "negotiation between committee finalists and Deputy GC", including further limiting CRC membership knowledge to only three Board members but giving retired CRC members the right to self-disclose after 6 months.

The initial filter of applications was made by non-applying Stewards, with members chosen from that group by the WMF General Counsel Amanda Keton. The WMF is also hiring a contractor to support the committee.

Reporting: the CRC is to provide quarterly generalised reports (number of cases ratified, number of cases overturned). It's not clear whether additional information will also be provided, such as number of cases T&S prohibits from going to appeal. –Nbb

Brief notes

IRS form 990: The WMF has released its Form 990, the major financial filing required of US non-profit organizations, for the year ending June 2019. Links to other WMF financial documents and to a FAQ on Form 990 can be found here.
New administrator: The Signpost welcomes the English Wikipedia's newest administrator, Eddie891, who has the additional distinction of being a Signpost staffer.
Wiki Loves Monuments during the pandemic: WLM will be held this year despite all the difficulties posed by the pandemic. About half of the 40 participating countries will be holding the contest during September, according to the usual schedule. Other countries, including Brazil, Russia, and the United States, will hold the event during October. Bangladesh is scheduled for November and Israel for the month of Tishrei.
Milestones: Adam Cuerden is expecting to mark his 500th Featured picture today. Congratulations.



Storytelling large and small


By Smallbones and Jonatan Svensson Glad

Journalists often report on the workings of the large Wikipedia community by focusing on a few individuals. It's an old storytelling technique – older than Homer – that lets the audience identify with the "main actors" in a complex situation and draw general conclusions starting from the specific details embodied by the individuals. But does this technique reflect the true complexity of the Wikipedia community where so many editors interact? And what happens when the editing community is not so large?

Just another article on COVID-19 and Wikipedia?

"Covid-19 is one of Wikipedia’s biggest challenges ever. Here’s how the site is handling it." The Washington Post examines Wikipedia's response to the pandemic focusing on the contributions of individual editors who they identify as Jason Moore, Netha Hussain, and Rosie Stephenson-Goodknight. Moore helped organize WikiProject COVID-19. Hussain, a doctor and researcher, wrote about COVID-19 and pregnancy. Stephenson-Goodknight wrote about fashion and the pandemic. They all contributed to the overall effort.

Our readers have likely seen articles like this before, though the Post does an exceptionally good job. Over a dozen articles in The Signpost have reported how Wikipedians have been affected by and reacted to the pandemic, including in our columns "Project report", "Community view", "Gallery", "Recent research", "Traffic report", "News from the WMF" and "From the editors". This column, "In the media", has reported over 7 months on about twenty stories published off-Wiki about Wikipedia's response, starting with Omer Benjakob's groundbreaking story published in Wired on February 9. Almost all these stories are highly complimentary to several individual editors, who deserve the recognition. Almost all report on the contributions of a broad segment of the community, which perhaps deserves even greater recognition.

A pleasant myth

"Why Wikipedia Decided to Stop Calling Fox a ‘Reliable’ Source" Noam Cohen in Wired traces Fox News's fall from the esteemed heights of being considered a "generally reliable" source on Wikipedia in the areas of science and politics. Starting with a series of challenges to Fox's reliability in the article Karen Bass by editor Muboshgu, Cohen ends with the reasoning of admin Lee Vilenski

We don’t have to assume that Fox is acting in good or bad faith—we simply need to assess if we can trust the information being provided. In this case, a lot of users suggested using our policies that it couldn’t be trusted enough to be 'reliable' for these two topics.

In other words, Wikipedians simply needed to rationally reassess Fox's record in these two areas. It's compelling reading, and he accuses Wikipedians of being "old-school" and even of having "integrity". But many Wikipedians have distrusted Fox's reliability since the beginnings of the project. More likely this distrust simply grew stronger as time passed. Or perhaps the political balance of editors has changed over the years. Thanks for the kind words, Noam.

Kamala Harris and an unpleasant reality

In "The Wikipedia War That Shows How Ugly This Election Will Be" (August 13), The Atlantic examines the reactions to then-presumptive Democratic presidential nominee Joe Biden naming Kamala Harris as his vice-presidential running mate for the 2020 U.S. Presidential election. According to The Atlantic, several news sources, including Fox News, have crossed a line in their reporting on Harris. Perhaps the worst offender was an op-ed, now denounced by its publisher Newsweek, which argues that Harris is not eligible to run for the office, which requires being a "natural born citizen". The author of the op-ed, John C. Eastman, doesn't question that Harris was born in Oakland, California, but expounds a novel theory of the meaning of "natural born citizen". According to Newsweek, this questioning of her eligibility is now being used by others to support the "racist lie of Birtherism" that was used against Barack Obama.

Wikipedia's reaction was fairly quick in reporting Biden's naming of Harris. Questioning Harris's racial identity and a sexist slur soon followed. One editor was banned. Within 45 minutes of the announcement, the article had been updated, vandalized, corrected, and semi-protected. The questioning of Harris's African American identity then moved to the talk page.

The Scots Wikipedia and smaller language communities

See News and notes for the main story on the Scots Wikipedia incident

"A Teen Threw Scots Wiki Into Chaos and It Highlights a Massive Problem With Wikipedia" is about the language editions of Wikipedia that are supported by smaller editing communities that are vulnerable to problems that can go undetected in these communities. One example cited by Gizmodo is the Croatian Wikipedia, whose admins have come under criticism for wide-ranging instant bans of editors who disagree politically with them. An article in The Signpost alerted the broader Wikipedia community to the problem, but an RFC is still pending a Steward close. Another example from Gizmodo is the Cebuano Wikipedia, the second largest Wikipedia by article count, yet almost entirely written by a non-native speaker from Sweden using a bot. A healthy community is essential to check the sanity of contributions and keep order, yet a look at List of Wikipedias shows that only 28 out of 313 language editions of Wikipedia have had more than 1000 active editors in the past 30 days. Only 80 editions have more than 100 active editors. Considering that many of these "active" accounts are bots, spammers, or passing admins banning the spammer, that's a lot of editions that need some love and care - both from enthusiasts and native speakers.

Whitewashing by cryptocurrency company

FT Alphaville (not paywalled) describes "something like an 'edit war'" on the article about Brad Garlinghouse, the CEO of Ripple Labs. Ripple is in the business of transferring money across borders using its own cryptocurrency. Garlinghouse was caught off-Wiki saying that SWIFT, a leader in the field of cross-border money transfer, had a 6% error rate – a claim which has been convincingly refuted. He has also had some legal difficulties. A controversy section which described these facts was removed several times, first by an anon whose IP address geolocates to a city near a known Ripple business address, then by a logged-in user who FT-A suggests may be a Ripple employee.

David Gerard, a Wikipedia administrator and noted cryptocurrency skeptic, reverted the removal of information about Garlinghouse four times over the course of three weeks, following a similar number of edits by others over two months. He was quoted as saying:

It’s not clear precisely who did this but, if it looks like corporate whitewashing and quacks like corporate whitewashing, then we’ll treat it as such.

The Signpost completely concurs with Gerard’s judgement on this matter. Cryptocurrency is a type of private token, something like money, issued on the web with a Rube Goldberg mechanism used to verify transactions. These digital wooden nickels have been commonly used in money laundering and other criminal transactions, and extensively advertised on Wikipedia. There are many more articles about cryptocurrency on Wikipedia that have suffered from whitewashing much more than this one.

Fundraising in India

The WMF published Wikimedia Foundation kicks-off fundraising campaign in India on August 5 and many Indian newspapers closely repeated the story, including Inventiva, News 18, The Quint and Live Mint. The Indian Express went well beyond the press release/blog, writing that "Its balancesheet however, tells a different story. According to a Wiki page on its fundraising statistics, the website was able to raise $28,653,256 between 2018-2019, bringing its total assets to $165,641,425. The previous financial year, it garnered $21,619,373 — a marked rise from the $56,666 it earned through donations in 2003."

In brief

Elon Musk
@elonmusk

Aliens built the pyramids obv

July 31, 2020

Did aliens build the pyramids?: Elon Musk has been in the media before, commenting on the Wikipedia article about him. For his second act, Indian Express reported that he tweeted claiming that the Pyramids were created by aliens, and trying to back up his claim in a different tweet by quoting the Wikipedia article about the Great Pyramid of Giza. USA Today reported on the same tweets, but questioned Musk's seriousness rather than his sanity. According to BBC, these tweets caused Rania Al-Mashat, Egypt's Minister of Tourism, to respond to Musk on Twitter that the pyramids were in fact not built by aliens, and invited him to view tombs of the pyramid builders in Egypt. J

Elon Musk
@elonmusk

Please trash me on Wikipedia, I’m begging you

August 16, 2020

The third time's the charm: "Please trash me on Wikipedia, I’m begging you": Musk's latest tweet about Wikipedia might have caused a trash storm, if the article hadn't been extended protected only 4 edits after the tweet. The silliness moved to the talk page soon after that.
Google: Links From Wikipedia Does Nothing For Your Site & Has No SEO Value according to John Mueller of Google, quoted in Search Engine Roundtable, who wrote about businesses inserting a link into Wikipedia. "All you're doing is creating extra work for the Wikipedia maintainers who will remove your link drops. It's a waste of your time and theirs."
How to get a Knowledge Panel for your brand, even without Wikipedia: According to Search Engine Land about half of all of Google's Knowledge Panels use information from Wikipedia. But what should a business do to get this free advertising from Google if they can't create a Wikipedia article? They should try to create their own "independent" and "trustworthy" content on other sites, such as Wikidata!
Edit-a-Thon sheds light on LGBTQ life in the Lowcountry. From ABC News Channel 4, Charleston, South Carolina, covering Wikipedia gaps in South Carolina Lowcountry content.
"Wikipedia Updates Radio Infoboxes": The story in Radio World was written by Wikipedian Tcr25 and suggests that stations check the info in the box, and avoid conflicts of interest. An RfC had just completed an overhaul of infoboxes for radio and television stations.

Do you want to contribute to "In the media" by writing a story or even just an "in brief" item? Edit next month's edition in the Newsroom or leave a tip on the suggestions page.



Going for the goal


By Eddie891 and Gog the Mild

Connor Barth, a placekicker for the Tampa Bay Buccaneers, prepares to kick a field goal during the first quarter of the Bucs v. New York Giants National Football League military appreciation game at Raymond James Stadium in Tampa, Fla., Nov. 8, 2015.

This Signpost "Featured content" report covers material promoted from July 26 through August 22. For nominations and nominators, see the featured contents' talk pages.

An orangutan

A clay tessera bearing a possible depiction of Odaenathus wearing a diadem

Apollo 15 Command Module Pilot Al Worden.

Lesser horseshoe bat (Rhinolophus hipposideros) with blue metallic identification band on left wing

A football card showing a portrait of Mann in his blue Yanks jersey

Cover of the first issue of Infinity Science Fiction; artwork by Robert Engle

A pilgrim makes a supplication in the direction of the Kaaba, the Muslim qibla, in the Sacred Mosque of Mecca.

Featured articles

19 featured articles were promoted this month.

The 1998 Football League First Division play-off Final (nominated by The Rambling Man) was an association football match played on 25 May 1998 at Wembley Stadium, London, between Charlton Athletic and Sunderland. The match was to determine the third and final team to gain promotion from the Football League First Division, the second tier of English football, to the Premier League for the 1998–99 season. The top two teams of the 1997–98 Football League First Division season gained automatic promotion, and the teams placed from third to sixth place in the table took part in play-off semi-finals; Sunderland had ended the season in third position and Charlton had finished fourth. The clubs won their semi-finals and competed for the final promotion place. Winning the game was estimated to be worth between five and ten million pounds to the successful team. Played in front of 78,000 spectators, Charlton won 7–6 on penalties.
Evelyn Mase (nominated by Midnightblueowl) was a South African nurse. She was the first wife of the anti-apartheid activist and future president Nelson Mandela, to whom she was married from 1944 to 1958.
Meghan Trainor (nominated by MaranoFan) is an American singer-songwriter and talent show judge. She rose to prominence after signing with Epic Records in 2014 and releasing her debut single "All About That Bass", which reached number one on the US Billboard Hot 100 chart, sold 11 million copies worldwide, and drew criticism for its lyrical content. Trainor has released three studio albums with the label, and has received various awards and nominations, including the 2016 Grammy Award for Best New Artist.
Orangutans (nominated by LittleJerry) are great apes native to Indonesia and Malaysia. They are found in the rainforests of Borneo and Sumatra, but during the Pleistocene they ranged throughout Southeast Asia and South China. The most arboreal of the great apes, orangutans spend most of their time in trees. They have proportionally long arms and short legs and their hair is reddish-brown. Orangutans are among the most intelligent primates. All three orangutan species are considered critically endangered. Human activities have caused severe declines in populations and ranges. Threats to wild orangutan populations include poaching, habitat destruction because of palm oil cultivation, and the illegal pet trade. Several conservation and rehabilitation organisations are dedicated to the survival of orangutans in the wild.
Portraits of Odaenathus (nominated by Attar-Aram syria): Odaenathus, the king of Palmyra from 260 to 267 CE, has been identified by modern scholars as the subject of sculptures, seal impressions, and mosaic pieces.
Alfred Worden (nominated by Wehwalt) was an American test pilot and astronaut, and the command module pilot for the Apollo 15 lunar mission in 1971. A former test pilot, he served on the support and backup crews for Apollo 9 and 12 before selection for Apollo 15. In lunar orbit, he became the individual who was the furthest from any other human being, a record he still holds. He also performed the first deep-space extravehicular activity, or spacewalk, in history. His career was effectively ended by a scandal over carrying postal covers to the Moon, and he retired in 1975.

British nuclear tests at Maralinga (nominated by Hawkeye7) were conducted between 1956 and 1963 in the Woomera Prohibited Area in South Australia. A total of seven nuclear tests took place with approximate yields ranging from 1 to 27 kilotonnes of TNT (4 to 100 TJ). The site was also used for trials of neutron initiators and tests on the compression of nuclear weapon cores and the effects of fire on atomic weapons. The site was left contaminated with radioactive waste, and a clean-up was attempted in 1967. A further clean-up was completed in 2000. In 1994, the Australian government paid $13.5 million compensation to the traditional owners, the Maralinga Tjarutja people.

The First Battle of Newtonia (nominated by Hog Farm) was fought on September 30, 1862, near Newtonia, Missouri, during the American Civil War. Confederate soldiers commanded by Colonel Douglas H. Cooper clashed with a Union column commanded by Brigadier General Frederick C. Salomon. Cooper's force consisted mostly of cavalry including a brigade of Native Americans. After a sharp skirmish in the morning, seesaw fighting took place during the afternoon. Shortly before nightfall, the Confederates made an all-out attack, causing Salomon to withdraw. The entire Union force advanced towards Newtonia in early October, and Cooper abandoned Missouri.

Horseshoe bats (nominated by Enwebb) are bats in the family Rhinolophidae. In addition to the single living genus, Rhinolophus, which has about 106 species, the extinct genus Palaeonycteris has also been recognized. Horseshoe bats are considered small or medium-sized microbats, weighing 4–28 g (0.14–0.99 oz), with forearm lengths of 30–75 mm (1.2–3.0 in) and combined lengths of head and body of 35–110 mm (1.4–4.3 in). Horseshoe bats are relevant to humans in some regions as a source of disease, as food, and in traditional medicine. Several species are the natural reservoirs of SARS coronavirus, though masked palm civets were the intermediate hosts through which humans became infected. Some evidence suggests that some species could be the natural reservoir of SARS-CoV-2, which causes coronavirus disease 2019. They are hunted for food in several regions, particularly sub-Saharan Africa, but also Southeast Asia. Some species or their guano are used in traditional medicine in Nepal, India, Vietnam, and Senegal.
Siamosaurus (nominated by PaleoGeekSquared) is a genus of spinosaurid dinosaur that lived in what is now Thailand during the Early Cretaceous period (Barremian to Aptian) and is the first reported spinosaurid from Asia. It is confidently known only from tooth fossils; the first were found in the Sao Khua Formation, with more teeth later recovered from the younger Khok Kruat Formation. Like in all spinosaurids, Siamosaurus' teeth were conical, with reduced or absent serrations. This made them suitable for impaling rather than tearing flesh, a trait typically seen in largely piscivorous (fish-eating) animals. Spinosaurids are also known to have consumed pterosaurs and small dinosaurs, and there is fossil evidence of Siamosaurus itself feeding on sauropod dinosaurs, either via scavenging or active hunting. Siamosaurus' role as a partially piscivorous predator may have reduced the prominence of some contemporaneous crocodilians competing for the same food sources. Isotope analysis of the teeth of Siamosaurus and other spinosaurids indicates semiaquatic habits. Siamosaurus lived in a semi-arid habitat of floodplains and meandering rivers, where it coexisted with other dinosaurs, as well as pterosaurs, fishes, turtles, and crocodyliforms.
The Treaty of Lutatius (nominated by Gog the Mild) was the agreement of 241 BC between Carthage and Rome which ended the First Punic War after 23 years. Accepting defeat, the Carthaginian Senate ordered their commander on Sicily to negotiate a peace treaty. A treaty was agreed by which Carthage would hand over what it still held of Sicily, relinquish several groups of islands nearby, release all Roman prisoners without ransom, and pay large reparations over 10 years. In 237 BC Carthage prepared an expedition to recover the island of Sardinia, which had been lost to rebels. Cynically, the Romans announced that this was an act of war and that their peace terms were the ceding of Sardinia and Corsica and the payment of an additional indemnity; these were added to the treaty as a codicil.

Bob Mann (nominated by Gonzo fan2007 and Cbl62) was an American professional football player in the National Football League (NFL). A native of New Bern, North Carolina, Mann played college football at Hampton Institute in 1942 and 1943 and at the University of Michigan in 1944, 1946 and 1947. Playing the end position, he broke the Big Ten Conference record for receiving yards in 1946 and 1947. After not being selected in the 1948 NFL Draft, Mann signed his first professional football contract with the Detroit Lions, where he stayed for two seasons. He later played for the Green Bay Packers for parts of five seasons until 1954. Mann broke the color barrier for both teams.
The 2010 Twenty20 Cup Final (nominated by Harrias) was a 20 overs-per-side cricket match between Hampshire County Cricket Club and Somerset County Cricket Club played on 14 August 2010 at the Rose Bowl in Southampton. It was the eighth final of the Twenty20 Cup.
Al-Hafiz (nominated by Cplakidas) was the eleventh caliph of the Fatimids from 1132 to his death in 1149. Many Isma'ili followers abroad refused to recognize him and even in Egypt there were uprisings throughout his reign. He tried to restrain his over-mighty viziers, with mixed success. He was repeatedly forced to give way to the demands of various military factions, and ultimately was unable to halt the evolution of the vizierate into a de facto sultanate, independent of the caliph. His successors would be reduced to puppets at the hands of powerful viziers, until the end of the Fatimid Caliphate in 1171.

Hurricane Willa (nominated by Hurricane Noah, KN2731, and Hurricanehink) was a powerful tropical cyclone that brought torrential rains and destructive winds to southwestern Mexico, particularly the states of Sinaloa and Nayarit, during late-October 2018. It was the twenty-fifth tropical cyclone, twenty-second named storm, thirteenth hurricane, tenth major hurricane, and record-tying third Category 5 hurricane of the 2018 Pacific hurricane season. Willa was the first major hurricane to make landfall in the Mexican state of Sinaloa since Lane in 2006.
Infinity Science Fiction (nominated by Mike Christie) was an American science fiction magazine, edited by Larry T. Shaw, and published by Royal Publications. The first issue, which appeared in November 1955, included Arthur C. Clarke's "The Star", a story about a planet destroyed by a nova (an exploding star) that turns out to have been the Star of Bethlehem; it won the Hugo Award for that year. Shaw obtained stories from some of the leading writers of the day, including Brian Aldiss, Isaac Asimov, and Robert Sheckley, but the material was of variable quality. In 1958 Irwin Stein, the owner of Royal Publications, decided to shut down Infinity; the last issue was dated November 1958. The title was revived a decade later by Stein's publishing house, Lancer Books, as a paperback anthology series. Five volumes were published between 1970 and 1973, edited by Robert Hoskins; a sixth was prepared but withdrawn after Lancer ran into financial problems at the end of 1973.
The qibla (nominated by HaEr48) is the direction towards the Kaaba in the Sacred Mosque, Mecca, Saudi Arabia, which is used by Muslims in various religious contexts, including as the direction of the salah, or ritual prayer.
Yugoslav destroyer Beograd (nominated by Peacemaker67) was the lead ship of a class of destroyers built for the Royal Yugoslav Navy during the late 1930s. In World War II, she was captured and saw extensive service with the Royal Italian Navy, completing over 100 convoy escort missions, mainly on routes between Italy and the Aegean or North Africa. In September 1943, she was captured by the German Navy and redesignated TA43. She was sunk or scuttled at Trieste in 1945.
The Roman withdrawal from Africa in 255 BC (nominated by Gog the Mild) was the attempt by the Roman Republic to rescue the survivors of their defeated expeditionary force to Carthaginian Africa (in what is now northeastern Tunisia) during the First Punic War. A force of 390 warships fought and defeated 200 Carthaginian vessels and the Roman survivors of the previous year's invasion were evacuated. While returning to Italy the Roman fleet encountered a storm off the south-east corner of Sicily: 384 ships were sunk and more than 100,000 men were lost.

Featured lists

Sigourney Weaver at the 2017 San Diego Comic-Con

Vilnius Historic Centre, a World Heritage Site in Lithuania.

Ernst van Dyk has won the Boston Marathon ten times, more than any other athlete.

The 2019 Wikimedian of the Year: Emna Mizouni

Clark Gable in a 1938 publicity still

Brad Pitt at the Washington, D.C. premiere of Fury in 2014

Mary Pickford in 1916

20 featured lists were promoted this month.

Cardiff City Football Club is a Welsh professional association football team based in Cardiff. The club was founded in 1899 and initially played in local amateur leagues before joining the English football league system. After spending a decade in the Southern Football League, Cardiff joined the Football League in 1920. A total of 123 players (nominated by Kosack) have won at least one cap in senior international football while playing for Cardiff, representing 25 nations. Chris Gunter is the youngest Cardiff player to win an international cap, having represented Wales in 2017 at the age of 16. Kenwyne Jones has scored more international goals than any other Cardiff player. He scored ten times for Trinidad and Tobago between 2014 and 2016.
Hot Country Songs is a chart that ranks the top-performing country music songs in the United States, published by Billboard magazine. 13 different singles (nominated by ChrisTheDude) topped the chart in 1966 and 19 in 1965 (nominated by ChrisTheDude), which was published at the time under the title Hot Country Singles. Chart placings were based on playlists submitted by country music radio stations and sales reports submitted by stores.
In baseball, a home run is credited to a batter when he hits a fair ball and reaches home safely on the same play, without the benefit of an error. One hundred and twenty-seven players (nominated by Bloom6132) have hit a home run in their first at bat of a Major League Baseball (MLB) game to date, the most recent being Keibert Ruiz of the Los Angeles Dodgers on August 16, 2020. George Tebeau and Mike Griffin both hit home runs in their first at bats on April 16, 1887. Both players are recognized as the first player to homer in his first major league at bat because the exact time when each home run was hit is unclear.
Since July 2009, Israeli broadcast monitoring service Media Forest has been publishing four rankings which list the top ten most-broadcast Romanian and foreign songs on Romanian radio stations and television channels separately on a weekly basis. In 2009, eight and eleven singles (nominated by Cartoon network freak) were listed by Media Forest as the most-broadcast tracks on radio and television respectively.
Sigourney Weaver is an American actor, playwright, and producer who began acting in plays in the early 1970s. Throughout her career she has acted in nearly 40 stage productions (nominated by HAL333). She made her film debut with a minor role in the Woody Allen-directed Annie Hall (1977). Her breakthrough role was as Ellen Ripley in the Ridley Scott-directed Alien (1979). She reprised the role in Aliens (1986), this time helmed by director James Cameron. Her performance netted her a nomination for the Academy Award for Best Actress. She would reprise the role in two more sequels, Alien 3 (1992) and Alien: Resurrection (1997), neither of which was as well received. Although originally written as a man, Ripley is now regarded as one of the most significant female protagonists in cinema history, and consequently, Weaver is considered to be a pioneer of action heroines in science fiction films.
The Mandalorian, an American space Western web television series set in the Star Wars universe created by Jon Favreau and released on Disney+, features an extensive cast of characters (nominated by Hunter Kahn).
The United Nations Educational, Scientific and Cultural Organization (UNESCO) World Heritage Sites are places of importance to cultural or natural heritage as described in the UNESCO World Heritage Convention, established in 1972. Lithuania accepted the convention on 31 March 1992, making its natural and historical sites eligible for inclusion on the list. The first site added to the list was the Vilnius Historic Centre, in 1994. Three further sites were added in 2000, 2004, and 2005. In total, there are four sites on the list (nominated by Tone), all of them cultural. Two sites are transnational: the Curonian Spit is shared with Russia and the Struve Geodetic Arc is shared with nine other countries. In addition to its World Heritage Sites, Lithuania also maintains two properties on its tentative list.
The Boston Marathon, one of the six World Marathon Majors, is a 26.2-mile (42.2 km) race which has been held in the Greater Boston area in Massachusetts since 1897. It is the oldest annual marathon in the world. The event is held on Patriots' Day, the third Monday of April. Until 1957 the course varied in length for various reasons, so the marathon recognizes several course records that are slower than earlier ones because they were run on longer courses. The first Boston Marathon included only 15 runners, all of whom were men, and was won by John McDermott. The race was cancelled for the first time in its history in 2020, due to the COVID-19 pandemic. The winners (nominated by Harrias) have represented 27 different countries: Americans have won the marathon the most, doing so on 108 occasions; Kenyans have won 34 times; and Canadians 21 times. Ernst van Dyk is the most successful individual athlete, having won the men's wheelchair division ten times. The current course records are held by Geoffrey Mutai, Buzunesh Deba, Marcel Hug and Manuela Schär.
The WCW Light Heavyweight Championship (nominated by Grapple X) was a professional wrestling championship that was contested in World Championship Wrestling (WCW) between 1991 and 1992. Conceived in 1991, the championship was first awarded as the result of a single-elimination tournament; its subsequent lineage ended when the final champion Brad Armstrong was stripped of the title due to injury. A second tournament to decide Armstrong's successor was announced, but never took place. The title was held by four different champions; the inaugural champion Brian Pillman was the only wrestler to win it on more than one occasion. The light heavyweight division which contested the championship had proved popular with fans, but its viability suffered as a result of WCW's creative decisions; in 1992, Bill Watts became the head booker, and implemented storyline changes in WCW's product which stymied the division's style. WCW would later introduce a similar title as the WCW Cruiserweight Championship; the two titles are now considered one and the same by the wrestling promotion WWE, which purchased WCW's assets in 2001.
The Wikimedian of the Year (nominated by CAPTAIN MEDUSA) is an annual award that honors Wikipedia editors to highlight major achievements within the Wikimedia movement. The award was established in August 2011 by Wikipedia's co-founder Jimmy Wales, who selects the recipients and honors them at Wikimania, an annual conference of the Wikimedia Foundation. From 2011 to 2017 the award was named 'Wikipedian of the Year'. The award includes prize money, which as of 2020 is $5,000.
Clark Gable (1901–1960) was an American actor and producer who appeared in over 70 feature films and several short films (nominated by HAL333). Gable began acting in stage productions before his film debut in 1924. After many minor roles, Gable landed a leading role in 1931, subsequently becoming one of the most dominant leading men in Hollywood. He often acted alongside recurring leading ladies: six films with Jean Harlow, six with Myrna Loy, eight with Joan Crawford, and four with Lana Turner, among many others. He is widely regarded as one of the greatest actors in cinematic history.
The DHL Fastest Lap Award (nominated by MWright96) is given annually by the courier, Formula One global partner and logistics provider DHL "to recognise the driver who most consistently demonstrates pure speed, with the fastest lap at the highest number of races each season", and to reward the winning driver for "characteristics such as excellent performance, passion, can-do attitude, reliability and precision". First awarded in 2007 by DHL, the trophy's official naming patron, it is presented to the driver with the highest number of fastest laps over the course of the season, with one point awarded to the fastest lap holder of a Grand Prix.
The Hennepin County Library, which serves Hennepin County, Minnesota, including the city of Minneapolis, consists of 41 branches (nominated by Bobamnertiopsis) in 24 cities and towns. Of these, 15 are in Minneapolis; collectively they made up the Minneapolis Public Library until they were absorbed by the Hennepin system in the merger. Four branches (Central, Franklin, Hosmer, and Sumner) were originally founded as Carnegie libraries. Several other libraries, separate from the system, also operate within the county's boundaries.
The Roman Catholic archbishop of New York (nominated by Bloom6132) is the head of the Roman Catholic Archdiocese of New York, who is responsible for looking after its spiritual and administrative needs. As the archdiocese is the metropolitan see of the ecclesiastical province encompassing nearly all of the state of New York, the Archbishop of New York also administers the bishops who head the suffragan dioceses of Albany, Brooklyn, Buffalo, Ogdensburg, Rochester, Rockville Centre and Syracuse. The current archbishop is Timothy M. Dolan.
The NWA World Welterweight Championship (nominated by MPJ-DK) is an inactive professional wrestling championship governed by the National Wrestling Alliance (NWA) and most recently promoted by NWA Mexico.
Natalie Wood (1938–1981) was an American actress who started her career as a child by appearing in films directed by Irving Pichel. Wood's first credited role was as an Austrian war refugee in the Pichel-directed Tomorrow Is Forever (1946) with Claudette Colbert and Orson Welles. The following year, she played a child who does not believe in Santa Claus in the Christmas comedy-drama Miracle on 34th Street (1947) opposite Maureen O'Hara, John Payne, and Edmund Gwenn. She went on to appear in numerous films (nominated by Cowlibob), was the recipient of four Golden Globes, and received three Academy Award nominations.
The John Arlott Cup for the PCA Young Player of the Year (nominated by Harrias) is an annual cricket award presented to the player who is adjudged to be the most promising young player in English county cricket. Only players that are aged under 24 on 1 April of the awarding year are eligible for the prize. Michael Atherton was the first winner of the award in 1990. Two players, Kabir Ali and Alastair Cook, have won the award twice, both doing so in successive years; Ali in 2002 and 2003, and Cook in 2005 and 2006. Representatives of thirteen of the eighteen first-class counties have won the award. Yorkshire players have collected the most awards, doing so on six occasions.
Brad Pitt is an American actor and film producer who has received various awards and nominations (nominated by CAPTAIN MEDUSA), including two Academy Awards, two British Academy Film Awards, and two Golden Globe Awards. He has been nominated for an additional five Academy Awards.
Timeline of Mary Pickford (nominated by Jimknut): Mary Pickford (1892–1979) was a Canadian motion picture actress, producer, and writer. During the silent film era she became one of the first great celebrities of the cinema and a popular icon known to the public as "America's Sweetheart".
The Archbishop of Montreal (nominated by Bloom6132) is the head of the Roman Catholic Archdiocese of Montreal, who is responsible for looking after its spiritual and administrative needs. This archdiocese is the metropolitan see of the ecclesiastical province encompassing the south-central part of the province of Quebec, and so the Archbishop of Montreal also administers the bishops who head the suffragan dioceses of Joliette, Saint-Jean-Longueuil, Saint-Jérôme, and Valleyfield. The current archbishop is Christian Lépine.

Featured pictures

20 featured pictures were promoted this month.

Poster for the première of Jules Massenet's opera Ariane. Colour lithograph, 0.87 x 0.61 m (about 34 x 24 inches) (nominated by Adam Cuerden)
Connor Barth, a placekicker for the Tampa Bay Buccaneers, prepares to kick a field goal during the first quarter of the Bucs v. New York Giants National Football League military appreciation game at Raymond James Stadium in Tampa, Fla., Nov. 8, 2015. (created by U.S. Air Force, photographer Ned T. Johnston; nominated by Bammesk)
King Girvan Yuddhavikram Shah (restored and nominated by CAPTAIN MEDUSA)
Portrait of the American singer Nina Simone, 1965. (restored and nominated by Bammesk)
Sketch for the set of Act III, Scene 1 of La Esmeralda, an opera by Louise Bertin (restored and nominated by Adam Cuerden)
Jessie Bonstelle (restored and nominated by CAPTAIN MEDUSA)
A Composite Imaginary View of Japan (created by Khalili Collections; nominated by MartinPoulter)
Pygocentrus nattereri Kner, 1858, Red-bellied piranha; Karlsruhe Zoo, Karlsruhe, Germany. (created by Llez; nominated by MER-C)
Blue tiger (Tirumala limniace exoticus) male, Kumarakom, Kerala, India (created and nominated by Charlesjsharp)
Prime minister David Ben Gurion. (created and nominated by Andrew J.Kurbiko)
Purple sea urchin (Sphaerechinus granularis), Madeira, Portugal (created by Poco a poco; nominated by MER-C)
English composer and suffragette Ethel Smyth (1858-1944) (restored and nominated by Adam Cuerden)
Mizrah, papercut (created by Israel Dov Rosenbaum; nominated by Andrew J.Kurbiko)
"George Kleine presents the Cines photo drama Quo Vadis: Lygia Bound to the Wild Bull." Chromolithograph poster for 1913 film. (restored and nominated by Adam Cuerden)
Talamanca hummingbird (Eugenes spectabilis) male, Mount Totumas cloud forest, Panama. (created and nominated by Charlesjsharp)
Chorda filum on top of a layer of soft blanket weed (Cladophora glomerata), coastline of Sweden. (created by W.carter; nominated by Bammesk)
Eurasian coot (Fulica atra) juvenile in France (created and nominated by Charlesjsharp)
Hazel MacKaye, actress, suffragist, and writer of various pageants and plays, many of them for women's suffrage events. She also worked with the Young Women's Christian Association, becoming its Director of Pageantry and Drama. (restored and nominated by Adam Cuerden)
Hortus Haren. (Laetiporus sulphureus) on (Ginkgo biloba). (created by Agnes Monkelbaan; nominated by MER-C)
Lilac-breasted roller (Coracias caudatus caudatus) in Botswana (created and nominated by Charlesjsharp)

Featured topics

Bernardo Strozzi - Claudio Monteverdi (c.1630)

One featured topic was promoted this month.

Operas by Claudio Monteverdi (nominated by Aza24): Claudio Monteverdi (1567–1643) composed ten operas, a genre which emerged while he was court musician in Mantua. His first opera, L'Orfeo, was premiered in 1607 and became the first opera still in today's repertoire. For seven opera projects the music is mostly lost. Four of these were completed and performed, while he abandoned the others at some point. Librettos have survived for some of them, and fragments of the music for L'Arianna and Proserpina rapita. Monteverdi composed operas for a theatre in Venice when he was master of music at San Marco there, including Il ritorno d'Ulisse in patria in 1640 and L'incoronazione di Poppea in 1643, both of which also remain in the repertoire.

Wikipedia's not so little sister is finding its own way

By Lydia Pintscher

Wikidata is arguably one of Wikipedia's most successful sister projects. It has had a profound impact on Wikipedia in just a few years. Lydia Pintscher is the Product Manager for Wikidata at Wikimedia Germany. This essay was first published at Wikipedia @20 and has been licensed by the author under CC BY-SA 3.0.

In 2012, Wikipedia had grown and achieved so much in over a decade of creating an encyclopedia. But it was also at a point where fundamental change was needed: The world around Wikipedia was changing and Wikimedia had to find ways to make its content more accessible and support its editors in maintaining an ever increasing body of content in over 250 languages. The vision of a world in which every single human being can freely share in the sum of all knowledge was not achievable in this scattered way.

Ever since 2005 at the very first Wikimania, Wikimedia’s annual conference, one idea kept coming up: to make Wikipedia semantic and thus make its content accessible to machines. Machine-readability would enable intelligent machines to answer questions based on the content and make the content easier to reuse and remix. For example, it was not possible to easily find an answer to the question of what are the biggest cities with a female mayor because the necessary data was distributed over many articles and not machine-readable. Denny Vrandečić and Markus Krötzsch kept working on this idea and created Semantic MediaWiki, learning a lot about how to represent knowledge in a wiki along the way. Others had also started extracting content from Wikipedia, with varying degrees of success, and making the information available in machine-readable form.

So when the first line of code for the software that came to power Wikidata was written in 2012, it was an idea whose time had come. Wikidata was to be a free and open knowledge base for Wikipedia, its sister projects and the world that helps give more people more access to more knowledge. Today, it provides the underlying data for a lot of technology you use and the Wikipedia articles you read every day.

Being able to influence the world around you is such an important and empowering thing, and yet we are losing this ability a bit more everywhere every day. More and more in our daily lives depends on data, so let's make sure it stays open, free and editable for everyone in a world where we put people before data. Wikipedia showed how it can be done and now its sister Wikidata joins to contribute a new set of strengths.

Growing up

Wikidata always had bigger ambitions, but it started out by focusing on supporting Wikipedia. There were nearly 300 different language versions of Wikipedia, all covering overlapping (but not identical) topics without being able to share even basic data about these topics. Considering that most of these language versions had only a handful of editors, this was a problem. Small language versions were not able to keep up with the ever changing world and, depending on which language you could read, a vast amount of Wikipedia content was inaccessible to you. Perhaps someone famous had died? That information was usually available quickly on the largest Wikipedias but took a long time to be added to the smaller ones — if they even had an article about the person. Wikidata helps fix this problem by offering a central place to store general purpose data (like those found in the infoboxes on Wikipedia, such as the number of inhabitants of a city or the names of the actors in a movie) related to the millions of concepts covered in Wikipedia articles.

To start this knowledge base, Wikidata began by solving a simple but long-standing problem for Wikipedians: the headache of links between different language versions of an article. Each article contained links to all other language versions covering the same topic, but this was highly redundant and caused synchronisation issues. Wikidata’s first contribution was to store these links centrally and thereby eliminate needless duplication. With this first simple step, Wikidata helped eliminate over 240 million lines of unnecessary wikitext from Wikipedia and at the same time created pages for millions of concepts on Wikidata, providing the basis for the next stage. Once the initial set of concepts was created and connected to Wikipedia articles, it was time for the actual data to be added, introducing the ability to make statements about the concepts (e.g. Berlin is the capital of Germany). After that, last but not least, came the capability to use this data in Wikipedia articles. Now Wikipedia editors could enrich their infoboxes automatically with data coming from Wikidata.
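
The redundancy described above can be illustrated with a small sketch. It assumes a hypothetical in-memory model: the helper names and the placeholder ID "Q_EXAMPLE" are invented for illustration, and this is not how MediaWiki or Wikibase actually store links.

```python
# Hypothetical model: one topic with an article in four language editions.
titles = {"en": "Berlin", "de": "Berlin", "fr": "Berlin", "fi": "Berliini"}

def links_stored_per_article(titles):
    """Old approach: every article carries its own list of all other versions."""
    return {
        lang: {other: title for other, title in titles.items() if other != lang}
        for lang in titles
    }

def links_stored_centrally(item_id, titles):
    """Central approach: one record per concept holds every link exactly once."""
    return {item_id: dict(titles)}

def count_links(store):
    """Total number of link entries that have to be kept in sync."""
    return sum(len(links) for links in store.values())

old_count = count_links(links_stored_per_article(titles))              # n * (n - 1)
new_count = count_links(links_stored_centrally("Q_EXAMPLE", titles))   # n
```

With n language versions the per-article approach keeps n × (n − 1) entries while the central record keeps n, which is why the saving grows so quickly across hundreds of editions.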

Along the way, a fantastic community maintaining that data developed, much faster than the development team could have dreamed. This new community included new people who had never contributed to a Wikimedia project before and were now becoming interested because Wikidata was a good fit for them. It also included contributors from adjacent Wikimedia projects who were more interested in structuring information than writing encyclopedic articles and found their calling in Wikidata.

Later, Wikidata's scope expanded to support other Wikimedia projects, such as Wikivoyage, Wikisource, and Wikimedia Commons, allowing them to benefit from a centralized knowledge base as Wikipedia did.

As it evolved, Wikidata became an attractive source for Wikimedia projects and those who used to data-scrape Wikipedia infoboxes. External websites, apps, and visualisations used this information as a basic ingredient: from a website for browsing artwork, to book inventory managers, to history teaching tools, to digital personal assistants. Now, Wikidata is used in countless places without most users even being aware of it.

Most recently, it became clear that we need to think beyond Wikidata to a large network of knowledge bases running the same software (Wikibase) to publish data in an open and collaborative way, called the Wikibase ecosystem. In this ecosystem, many different institutions, activists and companies are opening up their data and making it accessible to the world by connecting it with Wikidata and among each other. Wikidata doesn't need to be and shouldn't be the only place where people collaborate to produce open data.

At the time of writing of this chapter, Wikidata provides data about more than 55 million concepts. It includes data about such things as movies, people, scientific papers and genes. Additionally, it provides links to over 4,000 external databases, projects and catalogs, making even more data accessible. This data is added and maintained by more than 20,000 people every month and used in over half of all articles in Wikimedia projects.

Helping people (and machines) come together

Just like Wikipedia is not like any other encyclopedia, Wikidata is not like any other knowledge base. There are a number of things that set Wikidata apart. They are a result of striving to be a global knowledge base and covering a multitude of topics in a machine-readable way.

The most important differentiator is probably the acknowledgement that the world is complex and can’t easily be pressed into simple data. Did you know that there is a woman who married the Eiffel Tower? That the Earth is not a perfect sphere? A lot of technology today is trying to simplify the world by hiding necessary complexity and nuance. Conflicting worldviews need to be surfaced. Otherwise we take away people’s ability to talk about, understand, and ultimately resolve their differences. Wikidata is striving to change that by not trying to force one truth but by collecting different points of view with their sources and context intact. This additional context can, for example, include which official body disputes or supports which view on a territorial dispute. Without this focus on verifiability instead of truth and not trying to force agreement it would be impossible to bring together a community from different languages and cultures. For the same reason, Wikidata doesn’t have an enforced schema that restricts the data, but, rather, has a system of editor-defined constraints that highlight potential problems.
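
As a rough illustration of the two ideas in this paragraph (keeping conflicting views side by side, and flagging rather than rejecting), here is a minimal sketch. The `Statement` class, the property and qualifier names, and the territory data are all hypothetical, and the check is only an analogy for editor-defined constraints, not Wikidata's actual mechanism.

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    prop: str
    value: str
    references: list = field(default_factory=list)
    qualifiers: dict = field(default_factory=dict)

# Two conflicting claims about a made-up territory are both kept,
# each with its own source and context.
statements = [
    Statement("sovereign state", "Country A",
              references=["Source 1"],
              qualifiers={"supported by": "Official body X"}),
    Statement("sovereign state", "Country B",
              references=["Source 2"],
              qualifiers={"supported by": "Official body Y"}),
    Statement("population", "12000"),  # deliberately unsourced
]

def missing_reference(statement):
    """Example constraint: return a warning message, or None if all is well."""
    if statement.references:
        return None
    return f"{statement.prop}: no reference given"

def check(statements, constraints):
    """Collect warnings; nothing is ever rejected or deleted."""
    flagged = []
    for statement in statements:
        for constraint in constraints:
            message = constraint(statement)
            if message is not None:
                flagged.append(message)
    return flagged

flagged = check(statements, [missing_reference])
views = [s.value for s in statements if s.prop == "sovereign state"]
```

The design choice worth noting is that `check` only reports: the unsourced statement stays in the data, and both views on the dispute remain available to readers.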

Being able to cover different points of view and nuance is not enough, however, for a truly global project. The data also needs to be accessible to everyone in their language without privileging any particular language by design. Because of this, every concept in Wikidata is identified by a unique ID instead of an English name. Q5, for instance, is the identifier for the concept of a human. It is then given labels in the different languages: “human” in English, “người” in Vietnamese and “ihminen” in Finnish. This way the underlying data is language-independent and everyone can see the data in their language when viewing or editing it. This of course does not eliminate the language issue, but it goes a long way towards more equity in contributing to Wikimedia’s content.
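
A minimal sketch of the idea, using the Q5 labels quoted above. The dictionary layout and the `label_for` helper are hypothetical, and falling back to the bare ID when a label is missing is an assumption made for this example rather than a description of Wikidata's interface.

```python
# Labels keyed by language-neutral ID, then by language code.
labels = {
    "Q5": {"en": "human", "vi": "người", "fi": "ihminen"},
}

def label_for(item_id, lang):
    """Return the label in the requested language, or the ID if none exists."""
    return labels.get(item_id, {}).get(lang, item_id)

finnish = label_for("Q5", "fi")
missing = label_for("Q5", "de")  # no German label in this toy data
```

Because every statement refers to "Q5" rather than to the word "human", adding a label in a new language requires no change to the data itself.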

Besides fabulous people, Wikidata’s ultimate secret sauce is its connections. All concepts in Wikidata are connected to each other through statements. The statement “Iron Man -> member of -> Avengers” for example tells us that Iron Man is a member of the Avengers. That one connection alone does not tell us much yet. But if you take a number of other similar connections you can easily get a list of all Avengers. And then make a list of the movies they first appeared in and the actors they were portrayed by. A lot of simple individual connections taken together are powerful. If you add on top of that the wide range of topics Wikidata covers, it becomes even more powerful because you can make connections that have not been made before. How about a list of species named after politicians? Now possible, thanks to these simple connections! And those are just the connections inside Wikidata itself; Wikidata also connects to a large number of external databases, catalogs and projects that make even more data available. Since Wikidata has such a large number of links to external resources, it can act as a hub so that you, and even more importantly any machine, can find a vast amount of additional information based on a single piece of data. If the ISBN of a book is known, then knowing its entry in the relevant national library is just a hop away. There might not be a direct link from an artist’s entry in the Louvre’s catalog to their entry in the Rijksmuseum’s catalog, but with Wikidata this connection is easily made, opening up yet more options for discovering knowledge.
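
To make the "many simple connections" point concrete, here is a small sketch that treats statements as (subject, property, value) triples. The toy data and helper functions are illustrative assumptions; real queries would go through Wikidata's own query tools rather than Python lists.

```python
# A handful of hand-written statements in the "A -> property -> B" shape.
triples = [
    ("Iron Man", "member of", "Avengers"),
    ("Thor", "member of", "Avengers"),
    ("Iron Man", "performer", "Robert Downey Jr."),
    ("Thor", "performer", "Chris Hemsworth"),
    ("Loki", "performer", "Tom Hiddleston"),
]

def subjects(triples, prop, value):
    """Everything that has the given property pointing at the given value."""
    return sorted(s for s, p, v in triples if p == prop and v == value)

def values(triples, subject, prop):
    """Every value the given subject has for the given property."""
    return [v for s, p, v in triples if s == subject and p == prop]

# One connection says little; chaining two answers a real question:
# "who plays the members of the Avengers?"
members = subjects(triples, "member of", "Avengers")
performers = {m: values(triples, m, "performer") for m in members}
```

Loki has a performer in the data but no "member of" statement, so he does not appear in the result; the answer comes entirely from how the individual connections combine.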

Impacting Wikipedia

Its close connection to Wikipedia made all the difference for Wikidata, especially at the start. Without the community, experience, mindshare and tools that Wikipedia provided, Wikidata would not be where it is today. Wikidata gained a lot from its close association with Wikipedia. It is also giving back of course, not just by significantly lowering maintenance burdens through centralisation of data but also in a number of more subtle and indirect ways.

Before Wikidata, the different Wikimedia projects and the language versions of each project worked in silos to a large degree. There was little collaboration on content across project and language boundaries. Wikimedia Commons had been around for a while as a central repository for media files that are shared between all Wikimedia projects, but by its nature it did not force a lot of collaboration. Because of this, many editors identified first and foremost with their language version of Wikipedia and only a distant second, if at all, with the Wikimedia Movement as a whole. Statements like “The Wikipedia in this and that language is terrible” were not uncommon when Wikidata started. The thought of using content shared with other Wikipedias that were perceived as inferior was deemed frightening. Equally, the thought that the large Wikipedias could gain anything from contributions by smaller projects was unthinkable. By helping people connect across language and project boundaries, Wikidata has helped to steer Wikipedia away from a silo mentality towards a truly global movement where every project is recognized and valued for its contribution to the sum of all knowledge.

Wikidata also helps Wikipedia by being a fundamental building block for technical innovation - big and small. Simple changes like the improved search box when linking to another article in VisualEditor become possible thanks to structured data in Wikidata. Now the selector shows you the short description from Wikidata and you can select the right article to link to without having to look it up. Wikidata also makes possible more fundamental changes like overhauling Wikimedia Commons in order to make images more discoverable for Wikipedia editors and others. Wikidata provides the data necessary to build better experiences for Wikipedia’s editors and readers.

Through the data in Wikidata we can also understand Wikipedia better. We can analyse much more easily what content is covered and what is missing. Take the gender gap. It was known for a long time that Wikipedia’s content is skewed towards covering men. The simple fact that there are more Wikipedia articles about men than women is not very helpful for a big community, though: the problem is too broad to motivate people or to make meaningful progress on. Wikidata allows us to see a more detailed picture and analyse the content by time period, country, profession of the person and other relevant characteristics. We can also see if there is a difference between the language versions of Wikipedia to see if any of them has a particularly narrow gender gap so we can learn from them. We can also see the geographic distribution of Wikipedia’s content and find blind spots on Wikipedia’s map of the world. The same can be done for any other content bias or gap that needs to be understood better. This way, Wikidata helps Wikipedia learn more about itself.

Better understanding the knowledge that Wikipedia covers is a necessary first step towards countering biases and filling gaps. Wikidata can also help there by making it possible to generate automated worklists for a topic you care about. Interested in video games? You can make a list of all video games released in the last 10 years which are missing a publisher and start adding that data. How about party affiliations of politicians in your recent local election? Monuments in the city you last visited that are missing street addresses? All that is just a few clicks away, making it easier to contribute to collecting the sum of all human knowledge and making Wikipedia more complete.
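
A worklist of this kind boils down to filtering items by what they lack. The sketch below assumes a hypothetical list of dictionaries with made-up games and field names; in practice such lists are built with Wikidata's query tools, not hand-written Python data.

```python
# Toy items; a missing key stands for a missing statement.
games = [
    {"label": "Game A", "release_year": 2018, "publisher": "Studio X"},
    {"label": "Game B", "release_year": 2015},
    {"label": "Game C", "release_year": 2005},
]

def worklist(items, missing, since_year):
    """Labels of items released since `since_year` that lack the `missing` field."""
    return [
        item["label"]
        for item in items
        if item.get("release_year", 0) >= since_year and not item.get(missing)
    ]

todo = worklist(games, "publisher", since_year=2010)
```

Swapping the field name or the filter gives the other worklists mentioned above, such as politicians without a party affiliation or monuments without a street address.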

And last but not least, Wikidata helps bring new contributors to Wikipedia. It opens up Wikimedia to new types of people, ones more interested in structuring information and connecting data points than writing long prose. And the small contributions that can be made on Wikidata lend themselves well to beginners who are initially overwhelmed by writing full articles. It also is a gateway for institutional contributors like galleries, libraries, archives and museums who want to make their content accessible.

Wikidata’s influence on Wikipedia far exceeds simply providing a few data points for infoboxes. It is a driver and supporter of change. Growing up with a big sister is not always easy. There’s the occasional disagreement and even fight but in the end you make up and stick together because you are the best team there could be. It is amazing to have someone to look up to. Wikidata is a project in its own right now, with its own reason for existence… but it will always be there to support Wikipedia.

Thank you, big sister! Wikidata owes you.

2020-08-30

The longest-running hoax

By Enwebb

Enwebb is the organizer of WikiProject Bats and founder of the Tree of Life Newsletter.

On August 7, WikiProject Palaeontology member Rextron discovered a suspicious taxon article, Mustelodon, which was created in November 2005. The article lacked references and the subsequent discussion on WikiProject Palaeontology found that the alleged type locality (where the fossil was first discovered) of Lago Nandarajo "near the northern border of Panama" was nonexistent. In fact, Panama does not even really have a northern border, as it is bounded along the north by the Caribbean Sea. No other publications or databases mentioned Mustelodon, save a fleeting mention in a 2019 book that presumably followed Wikipedia, Felines of the World.

The article also appeared in four other languages, Catalan, Spanish, Dutch, and Serbian. In Serbian Wikipedia, a note at the bottom of the page warned: "It is important to note here that there is no data on this genus in the official scientific literature, and all attached data on the genus Mustelodon on this page are taken from the English Wikipedia and are the only known data on this genus of mammals, so the validity of this genus is questionable."

This is not a Mustelodon.

Editors took action to alert our counterparts on other projects, and these versions were also removed. As the editor who reached out to Spanish and Catalan Wikipedia, I found it somewhat challenging to navigate these mostly foreign languages (I have a limited grasp of Spanish). I doubted that the article had very many watchers, so I knew I had to find some WikiProjects where I could post a machine translation advising of the hoax and asking that users follow local protocols to remove the article. I was surprised to find, however, that Catalan Wikipedia does not tag articles for WikiProjects on talk pages, meaning I had to fumble around to find what I needed (turns out that WikiProjects are Viquiprojectes in Catalan!). Mustelodon remains on Wikidata, where its "instance of" property was swapped from "taxon" to "fictional taxon".

How did this article have such a long lifespan? Early intervention is critical for removing hoaxes. A 2016 report found that a hoax article that survives its first day has an 18% chance of lasting a year.^[1] Additionally, hoax articles tend to have longer lifespans if they are in inconspicuous parts of Wikipedia, where they do not receive many views. Mustelodon was only viewed a couple times a day, on average.

Mustelodon survived a brush with death three years into its lifespan. The article was proposed for deletion in September 2008, with a deletion rationale of "No references given; cannot find any evidence in peer-reviewed journals that this alleged genus actually exists". Unfortunately, the proposed deletion was contested and the template removed, though the declining editor did not give a rationale. Upon its rediscovery in August 2020, Mustelodon was tagged for speedy deletion under CSD G3 as a "blatant hoax". This was challenged, and an Articles for Deletion discussion followed. On 12 August, the AfD was closed as a SNOW delete. WikiProject Palaeontology members ensured that any trace of it was scrubbed from legitimate articles. The fictional mammal was finally, truly extinct.

At the ripe old age of 14 years, 9 months, this is the longest-lived documented hoax on Wikipedia, topping the previous documented record of 14 years, 5 months, set by The Gates of Saturn, a fictitious television show, which was incidentally also discovered in August 2020. Based on the edit history of List of hoaxes on Wikipedia, new hoaxes are identified regularly at English Wikipedia. Dealing with this hoax and its fallout left me ruminating over some questions: How can we better identify hoaxes to keep them from reaching their tenth (or even fifteenth) birthdays? How can Wikipedia co-ordinate more readily across its different language versions once a hoax is discovered in one language? Does English Wikipedia harbor hoaxes that have been deleted elsewhere? Happy to hear your ideas.
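The age quoted for Mustelodon can be checked with month-level arithmetic, using only the creation month (November 2005) and deletion month (August 2020) given above. This is a minimal sketch; the helper name `lifespan` is ours, and exact days are ignored because the article gives only months.

```python
def lifespan(created, deleted):
    """Return (years, months) between two (year, month) pairs."""
    months = (deleted[0] - created[0]) * 12 + (deleted[1] - created[1])
    return divmod(months, 12)


# Mustelodon: created November 2005, deleted August 2020.
print(lifespan((2005, 11), (2020, 8)))  # (14, 9)
```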

References

^ Kumar, Srijan; West, Robert; Leskovec, Jure (April 2016). "Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes" (PDF). Proceedings of the 25th International World Wide Web Conference: 591–602. doi:10.1145/2872427.2883085.

2020-08-30

Heart, soul, umbrellas, and politics

By Igordebraga, Kingsif, Mcrsftdog, Rebestalic

This traffic report is adapted from the Top 25 Report, prepared with commentary by Igordebraga (July 26 to August 22), Kingsif (July 26 to August 16), Mcrsftdog (July 6 to August 1, August 9 to August 22) and Rebestalic (August 16 to 22)

Give me time and give me space. Give me real don't give me fake. Give me a cure for the COVID-19 pandemic that can't leave soon enough (to the point the view counts for that article are dropping...). And for those who prefer in those troubled quarantined times to move onto another "diseased" subject, tell me your own politik.

(data provided by the provisional Top 1000 report)

Give me heart and give me soul (July 26 to August 1)

Most Popular Wikipedia Articles of the Week (July 26 to August 1, 2020)

Rank	Article	Views	Notes/about
1	John Lewis (civil rights leader)	1,507,358	The funerary services befitting such a figure as Congressman Lewis took place this week. After his funeral he lay in state at first in the Alabama State Capitol, and then the United States Capitol rotunda on Monday and Tuesday, the first African-American lawmaker to receive the honor. A second funeral ceremony was held in Atlanta on Thursday, where he was eulogized by former Presidents Clinton, W., and Obama, and he rests in Atlanta's South-View Cemetery. Lewis died on July 17, and now doubles the views his article had last week during a strangely slow period for Wikipedia, appearing on here for three consecutive weeks, unusual for a recent death: more unusual is only hitting #1 in the third week, which he does now thanks to many redirects for his common name.
2	Regis Philbin	1,505,819	American television has lost enough stars old and young this year to fill out several montages at the upcoming Emmys, but the most prominent is probably Regis, who died last week and now overtakes all the Sushant Singh Rajput-related entries. Whether it be every game show you can think of or the morning talk show named after him for over 20 years, just about every American (and a sizable number of people from around the world) has seen him host despite pulling back due to poor health in the 2010s. This poor health led to his fatal heart attack on July 24.
3	Olivia de Havilland	1,448,864	After Kirk Douglas in February, another centenarian from Hollywood's Golden Age leaves us with the passing of Dame Olivia Mary de Havilland, winner of an Academy Award for To Each His Own (only Luise Rainer, who almost got to her 105th birthday, lived longer among Oscar winners). De Havilland was also involved in classics such as The Adventures of Robin Hood and Gone with the Wind.
4	Herman Cain	1,331,901	Cain, a businessman who was once considered a front runner for the 2012 Republican nomination, died of COVID-19 complications on Thursday. He was hospitalized on July 1, less than two weeks after attending a Trump rally maskless. Cain's death should be seen as a cautionary tale for the anti-mask movements. It won't, but it should.
5	Shakuntala Devi	1,097,470	The first Indian figure on the list this week is Devi, author of The World of Homosexuals which, fascinating as it sounds and groundbreaking as it was, is unrelated. Devi was best known as a human calculator (or the human calculator, such was her fame) and her amazing mind earned her an official Guinness World Record... in 1980. She died in 2013, and was only presented with the record this week, despite appearing in the GWR book. She's also the subject of a recent biopic, released Friday on Prime Video.
6	Rhea Chakraborty	1,095,924	Chakraborty was first reported as Sushant Singh Rajput's girlfriend after the latter committed suicide. On the 25th, the deceased's father filed a First Information Report, accusing her (and many others) of theft and abetting suicide for allegedly threatening Singh Rajput by saying he should be declared mentally unwell. She was arrested this past Tuesday.
7	Deaths in 2020	921,476	No I don't want to battle from beginning to end I don't want a cycle of recycled revenge I don't want to follow Death and All His Friends!
8	The Umbrella Academy (TV series)	686,289	Netflix released the much-anticipated second season adapting the comics written by musician Gerard Way and drawn by Gabriel Bá (pictured), where the remaining kids of a superpowered "family" time travel to prevent an apocalypse. "Family" being in inverted commas thanks to adoption that allowed for diverse casting: among its popular main cast are a British actor, an Irish actor, a Canadian, a teenager, and one of the original Broadway cast of #14's musical.
9	Dil Bechara	664,134	Director Mukesh Chhabra's (pictured) take on the teenage cancer of teenage cancer books, The Fault in Our Stars, was released for free streaming on Disney+ Hotstar on July 24, and was reportedly viewed 85 million times in its first 24 hours. It's either still getting hype or has been dragged into the new scandal (#6) about main actor Sushant Singh Rajput's suicide.
10	Jacob Elordi	632,000	This young Australian actor has seen a sudden rise to prominence thanks to his leading roles in two major franchises: TV's Euphoria and the Netflix movies about a kissing booth co-starring Joey King that are getting a lot of coverage at the moment. The second of the films was released this week.

Wounds that heal and cracks that fix (August 2 to 8)

Most Popular Wikipedia Articles of the Week (August 2 to 8, 2020)

Rank	Article	Views	Notes/about
1	Lebanon	1,588,673	A small country beset by war and tragedy this week saw its capital city (#6) destroyed (#3) in a big explosion caused by incompetence (#5). Though not nuclear, the size and appearance of the mushroom cloud that resulted in earthquakes in mainland Europe has been likened to some notable bombings.
2	The Umbrella Academy (TV series)	1,538,754	Season 2 of the mystery superhero drama arrived on Netflix. Ellen Page (pictured) stars in it as Vanya, who is doing a hell of a lot better than in season 1. Page is also from Canada, where the show is filmed, and according to co-star Emmy Raver-Lampman she would take other castmembers out to local places while filming.
3	2020 Beirut explosions	1,207,762	In the port of Beirut (#6), capital of Lebanon (#1), there was a warehouse that since 2014 housed dangerous chemicals (#5) taken from an abandoned ship. On August 4, a fire broke out in said warehouse, leading to a blast that wrecked buildings in a 10-kilometer (6-mile) radius.
4	Shakuntala Devi	1,178,421	The subject of a new film from Amazon Prime, where she's portrayed by Vidya Balan (pictured). While Netflix is going action, Amazon has decided to go... math.
5	Ammonium nitrate	1,089,158	Ammonium nitrate is a highly unstable substance that has caused some big explosions, like #14 and #3, the latter of which turned Beirut, capital of #1, into rubble this week.
6	Beirut	961,178
7	Deaths in 2020	858,347	Will you defeat them Your demons and all the non-believers? The plans that they have made? Because one day, I'll leave you A phantom to lead you in the summer To join The Black Parade
8	Rhea Chakraborty	691,270	How's this for Bollywood drama: Chakraborty, the girlfriend of the late Sushant Singh Rajput, was originally arrested last week for something related to his suicide, but is now being investigated for money laundering. In a shocking turn of events in this whole suicide scandal, Singh Rajput's best friend and fellow Bollywood star, Sharma, killed himself this week.
9	Samir Sharma	665,074
10	Wilford Brimley	618,624	A moderately famous actor and sometime singer, Brimley is also the person who caused half of North America to pronounce diabetes as "diabeetus" – he was diagnosed with the condition in the 1970s and became a prominent campaigner, but one with a mountain accent. He died on August 1 from what appears to be a diabetes-related kidney problem.

Tell me all your politik (August 9 to 15)

Most Popular Wikipedia Articles of the Week (August 9 to 15, 2020)

Rank	Article	Views	Notes/about
1	Kamala Harris	11,843,595	California lawyer and senator who was announced this week as the Democrat VP pick with running-mate #9. She was a popular choice, had a brief presidential campaign last year, and brings the rest of her family to the list. In the days after her selection, birtherism was reborn: though she was definitely born in California, with an American father, she is not white, which is enough to send certain people into discredit mode.
2	Shyamala Gopalan	1,851,954	As a result of #1 being chosen as a VP candidate, attention was brought in for the whole family – in order, her mother, her sister (above), her father, and her husband (below).
3	Maya Harris	1,644,390
4	Donald J. Harris	1,640,562
5	Douglas Emhoff	1,427,685
6	QAnon	1,370,205	Marjorie Taylor Greene, a vocal supporter of Q, won a primary to a safe seat in the United States House of Representatives on Tuesday. Trump twote in support the next morning, leading to a question in a briefing. Trump sidestepped it, without mentioning Q.
7	The Umbrella Academy (TV series)	986,180	Netflix released the second season of this a little while ago, setting the apocalypse in Dallas. The moral of the story seems to be that even when you try really hard, you can still get everything wrong? That, or join a cult.
8	Joe Biden	836,439	While current president Trump has spent a lot of time on this list, the Democrats are presently occupying a lot of the top 10. Biden is Trump's competition as the countdown to November's election continues. He picked a running mate, #1, this week.
9	Gunjan Saxena	814,894	An Indian female air force pilot; a movie about her life (where she's played by actress Janhvi Kapoor, pictured) was released August 12 on Netflix.
10	Deaths in 2020	813,025	They call me The Seeker I've been searching low and high I won't get to get what I'm after Till the day I die

And open up your eyes (August 16 to 22)

Most Popular Wikipedia Articles of the Week (August 16 to 22, 2020)

Rank	Article	Views	Notes/about
1	Kamala Harris	2,523,180	The 2020 Democratic National Convention was a four-day television event taking place from Monday to Thursday, with an average audience of 21.6 million viewers. While the real stars of the show were Biden and Harris, viewers got to see appearances from all of their favorite characters from the Democratic primaries, and even a few teasers for the 2024 arc.
2	Joe Biden	1,852,528
3	QAnon	1,379,518	QAnon stands alone as the only major conspiracy theory that's supportive of the government. Imagine if David Icke thought there were lizards controlling everything and he openly campaigned to become one of them. Imagine someone thinking that the CIA killed Kennedy but also thanking them for it. Bizarre.
4	Jill Biden	1,252,629	#2's wife (and potential First Lady) appeared in a pre-taped video at the DNC on Tuesday night, talking about how capable of a president her husband would be.
5	Elon Musk	980,785	In my skim of the news, Musk is doing something in Texas and has a new brain chip?
6	Donald Trump	847,079	Is seeking re-election.
7	Deaths in 2020	800,303	And when you're gone, who remembers your name? Who keeps your flame? Who tells your story?
8	Beau Biden	775,600	The last night of the DNC featured a tribute to the late son of #2 and Neilia Hunter, who died of brain cancer in 2015.
9	Ronald Koeman	767,192	FC Barcelona isn't what it used to be: when faced with Bayern München in the shortened/empty 2019–20 UEFA Champions League knockout phase, the usually victorious Spanish squad received an 8–2 thumping! Such a humiliation led to the dismissal of their coach, and in comes a Dutchman who was an old idol of the team, Ronald Koeman, most recently manager of his country's national team.
10	Betty Broderick	749,527	Netflix released season 2 of Dirty John, which tells some of Broderick's story – she, played there by Amanda Peet (pictured), killed her ex-husband and his new wife in 1989, and is still in jail for it.

Exclusions

These lists exclude the Wikipedia main page, non-article pages (such as redlinks), and anomalous entries (such as DDoS attacks or likely automated views). Since mobile view data became available to the Report in October 2014, we exclude articles that have almost no mobile views (5–6% or less) or almost all mobile views (94–95% or more) because they are very likely to be automated views based on our experience and research of the issue. Please feel free to discuss any removal on the Top 25 Report talk page if you wish.
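The mobile-share rule described above can be sketched as a simple filter. This is an illustration only, not the Top 25 Report's actual tooling: the class and function names are hypothetical, the cut-offs of 6% and 94% are picked from within the ranges quoted above, and the sample numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class PageViews:
    title: str
    total: int   # all views in the week
    mobile: int  # views from mobile devices


def looks_automated(page: PageViews, low: float = 0.06, high: float = 0.94) -> bool:
    """Flag pages with almost no, or almost only, mobile views."""
    if page.total == 0:
        return True  # nothing to rank
    share = page.mobile / page.total
    return share <= low or share >= high


def filter_report(pages):
    """Keep only pages whose mobile share looks like human readership."""
    return [p for p in pages if not looks_automated(p)]


# Invented sample data for illustration.
pages = [
    PageViews("Example article", 1000, 600),    # 60% mobile: kept
    PageViews("Suspicious page A", 1000, 20),   # 2% mobile: excluded
    PageViews("Suspicious page B", 1000, 990),  # 99% mobile: excluded
]
kept = filter_report(pages)
print([p.title for p in kept])
```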

2020-08-30

Fourteen things we’ve learned by moving Polish Wikimedia conference online

By Natalia Szafran-Kozakowska

Natalia Szafran-Kozakowska is the community support officer for Wikimedia Polska. She originally posted this essay (part 1, part 2) on Diff, a new project hosted by the Wikimedia Foundation for the Wikipedia community. You can join Diff here.

Every year Polish Wikimedians convene to feel the human touch of the movement, and meet at conferences to learn, discuss and work together. This annual meeting, which gathers about 100 Wikimedians every year, is a great celebration of our community, movement and mission. When the COVID pandemic made it impossible for us to meet in person we decided that we would move the event online. And with that decision we started quite an adventure! Since online meetings are here to stay for a bit we would like to share some of the lessons we have learned.

Do not replicate offline routines. You may be experienced in organizing live events, but the digital environment, the number of things that you can control, and the needs of your participants are different. Make a list of things that need to happen for your event to be successful and then ask yourself how you can make sure they do happen in the new environment. Think not only about big things (“people need to learn something useful”) but also about tiny ones (“people need to be in the right place at the right time”). Be creative! For example, we identified the wellbeing of the attendees as a key factor. This is why we had a lot of breaks so that everyone could step away from their devices, and a yoga session to bring some care to our tired spines.
But in some aspects – do. Especially if you replace a regular event which had its place in people’s calendars with a digital one. Bring a bit of a feeling of an in-person conference to give a sense of continuity. We knew that our attendees were excited about the fact that the conference was supposed to take place in Cracow. This is why we organized a remote guided live city tour. We were able to enjoy the views and ask questions. We also had a group photo (instead of a typical group screen capture we went for a collection of selfies which made the photo more vibrant). As a replacement of coffee breaks, we sent chocolates to the participants. Also, in the registration process, attendees could choose whether they want a physical surprise package sent to their home or a digital one to download.
Make it simple, and avoid adding confusion. Virtual events are still new for a lot of people. Participants need to know where and when to click, where to seek information and whom to ask for help. Keep as much information as you can on one page and, if possible, hold all (or most) of the sessions on just one or two links so that whenever the participants click, they will get to the conference room. Have a person and a separate communication channel (in our case, it was a Telegram group) assigned to give technical information and support.
The time can get tricky. While facilitating a conference and making sure that everything is on time is a challenge, it is much more difficult at a digital event. The speakers can go over their assigned time and can easily miss (or even ignore on purpose) cues from the moderator. Muting a person while they speak is neither elegant nor kind. So instead, plan breaks a bit (5 minutes) longer than you actually want them to be – it will give you a time buffer and will let participants have time to re-energize even if the session gets a bit too long. You will also have flexibility to allow an interesting conversation to continue. Keep the buffer secret from the panelists or speakers, though, so that they won’t treat it as an actual session time.
Think of all the things in which online conferences are better than live ones. And then make the most of it! Are there any people whom you’ve always wanted to invite but never could because of geographical distance or language barriers? Now it is possible! We took advantage and invited guest speakers from across the ocean and broadened our pool of participants by offering simultaneous translation. This way we could have attendees from all over the globe: from Russia to Sweden and from Ukraine to the U.S.! Online events give you a unique chance to broaden your audience and invite people outside of the Wikimedia Movement. We promoted our speakers using social media to boost interest from non-Wikimedians and invite them to our event.
Why so serious? To the participants, we sent conference packages including a pair of comfortable home slippers and a door hanger saying “Do not disturb, I’m attending a conference” so that we could add some humour to the fact that the conference has unexpectedly moved to participants’ homes.
Conference platforms – remember your priorities. Choosing a platform is not easy. Make a list of functionalities you need and rank them by priority so that you will know: How important is it to you that the tool is open source? Which features are only nice to have? For example, Wikimedians use a very diverse set of browsers, so for us, having a tool that works on many different ones was a criterion.
Test your conference platform, learn its constraints, and let the speakers test it again. Test it in different groups and in different technical conditions (browsers, devices, and so forth). Shortly before the event we decided to shift the conference to a different platform because the one we had planned had shortcomings that were a no-go for us. You may schedule a get-together for the speakers the day before – it will help everyone get acquainted with the tool before the serious work begins.
From the attendees’ perspective, remote participation is less of a logistical effort. This extends to the period way before the event. In our case, participants (speakers, too) were often way less strict in honoring their commitments than they are at live events. They kept us waiting longer for their decisions about participating. They submitted the details of their talks later than they usually do.
Plan a lot and prepare your speakers. If you have a script for a live panel session, discuss the “theme entries” (and the number of them) with your guests in advance. It keeps you within the schedule, and makes everything less stressful!
People need to move. And to take breaks. Sitting in front of the computer is much more tiring than being in a conference room. Which means: less session time, more breaks. We went for a 1-hour session/30-minute break schedule with one long (2 hours) lunch break and it was a perfect amount of time to keep everyone focused and well.
Diversify your program. Don’t make it a series of webinars. Shift between discussions and lectures, workshops and panel discussions. Changing format will help your attendees keep their focus. We made the mistake of scheduling social activities in the late afternoon when people were tired. In retrospect it would have been better to plan them during the day.
Be flexible. Not all our ideas went as planned. And it was OK. Rather than pushing them, we followed our participants’ needs. We wanted to provide a place for conversations so we opened a participants’ Telegram group (a solution which worked perfectly during our live events) but people preferred to use the Zoom chat and Telegram became more of a place for announcements. We planned a Wikipedia scavenger hunt for the evening but people preferred to socialize by chatting. If your goals are met in a different way than the one you have planned, who cares! As long as they are met, right?
Embrace the fact that things will go wrong. Because some will. The internet can go down, cats may jump on keyboards, the mics and the cameras may not cooperate, the speaker’s neighbours can decide to drill in their walls. There is a lot that can go wrong and not a lot of things you can control. Accept that the event doesn’t need to be perfect to be awesome. It’s not about perfection, it’s about connecting with each other. If obstacles come up, communicate it clearly to your participants and stay kind to yourself even if things go wrong. As long as you have that last one going – everything will be fine. Because kindness is the most important force in the Wikiverse!

And because of that I would like to thank my teammates Wojciech, Klara and Szymon for helping me with their insight in bringing all those lessons together!

2020-08-30

Detecting spam, and pages to protect; non-anonymous editors signal their intelligence with high-quality articles

By Matthew Sumpter and Tilman Bayer

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

"Protecting the Web from Misinformation" by detecting Wikipedia spammers and identifying pages to protect

Reviewed by Matthew Sumpter

This book chapter^[1] discusses general trends in misinformation on the web. Misinformation can take many forms including vandalism, spam, rumors, hoaxes, counterfeit websites, fake product reviews, clickbait, and fake news. The chapter briefly describes each subtopic and presents examples of them in practice. The following section details a comprehensive set of NLP and network analysis studies that have been conducted both to gain further insight into each subtopic and to combat them.

The chapter concludes with a case study based on the authors' research to protect Wikipedia content quality. The open editing mechanism of Wikipedia is ripe for exploitation by bad actors. This occurs mainly by vandalism, but also through page spamming and the dissemination of false information. To combat vandalism, the authors developed the "DePP" system, which is a tool for detecting which Wikipedia article pages to protect. DePP achieves 92.1% accuracy across multiple languages in this task. This system is based on the following base features: 1) Total average time between revisions, 2) Total number of users making five or more revisions, 3) Total average number of revisions per user, 4) Total number of revisions by non-registered users, 5) Total number of revisions made from mobile devices, and 6) Total average size of revisions. Through careful statistical analysis to determine the standard behavior of these metrics, malicious revisions can be identified by a deviation from these standards.
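To make the six base features listed above more concrete, here is a minimal sketch of how they might be computed for one page from its revision history. It is not the authors' implementation: the `Revision` record, its field names, and the sample data are all hypothetical, and the statistical step that flags deviations from standard behavior is not shown.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class Revision:
    timestamp: float  # seconds since some reference point
    user: str
    registered: bool
    mobile: bool
    size: int         # revision size in bytes


def base_features(revisions):
    """Compute the six base features for one page's revision history."""
    revs = sorted(revisions, key=lambda r: r.timestamp)
    gaps = [b.timestamp - a.timestamp for a, b in zip(revs, revs[1:])]
    per_user = Counter(r.user for r in revs)
    return {
        "avg_time_between_revisions": mean(gaps) if gaps else 0.0,
        "users_with_5plus_revisions": sum(1 for n in per_user.values() if n >= 5),
        "avg_revisions_per_user": len(revs) / len(per_user) if per_user else 0.0,
        "revisions_by_unregistered": sum(1 for r in revs if not r.registered),
        "revisions_from_mobile": sum(1 for r in revs if r.mobile),
        "avg_revision_size": mean(r.size for r in revs) if revs else 0.0,
    }


# Invented history: five edits by one registered user, one by an unregistered mobile user.
history = [Revision(t, "A", True, False, 100) for t in (0, 10, 20, 30, 40)]
history.append(Revision(50, "ip", False, True, 400))
features = base_features(history)
print(features)
```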

To combat spam, the authors developed the "Wikipedia Spammer Detector" (WiSDe). WiSDe uses a framework built upon features that research has revealed to be typical of spammers. These features most notably include the size of the edits, the time required to make edits, and the ratio of links to text within the edits. WiSDe achieved an 80.8% accuracy on a dataset of 4.2K users and 75.6K edits - an improvement of 11.1% over ORES. The case study concludes by providing some findings regarding the retention of new contributors to Wikipedia. They proposed a predictive model that achieved a high precision (0.99) in predicting users that would become inactive. This model relies on the observation that active users are more involved in edit wars, edit a wider variety of categories, and positively accept critiques.
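One of the spammer signals mentioned above, the ratio of links to text within an edit, can be illustrated with a short sketch. The definition used here (URLs per remaining word) and the function name `link_ratio` are our own assumptions for illustration; the chapter's exact definition may differ.

```python
import re

# Very rough URL matcher, good enough for an illustration.
URL = re.compile(r"https?://\S+")


def link_ratio(text: str) -> float:
    """Number of URLs divided by the number of non-URL words in an edit."""
    links = URL.findall(text)
    words = URL.sub(" ", text).split()
    return len(links) / max(len(words), 1)


spammy = "Buy now http://spam.example http://spam.example/x"
normal = "Bats are mammals of the order Chiroptera http://example.org"
print(link_ratio(spammy), link_ratio(normal))
```

An edit that is mostly links scores far higher than ordinary prose with a single source, which is the kind of contrast a classifier could pick up on.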

See also our earlier coverage of related papers involving the first author: "Detecting Pages to Protect", "Spam Users Identification in Wikipedia Via Editing Behavior"

Editors successfully signal their intelligence by writing high-quality articles - but only when contributing non-anonymously

Reviewed by Tilman Bayer

An article^[2] in the psychology journal Personality and Individual Differences reports on an experiment in a Wikipedia-like wiki, where editors with higher general intelligence scores write higher quality articles (as rated by readers) - but only when contributing non-anonymously. This is interpreted as evidence that contributors successfully "signal" their intelligence to readers (in the sense of signalling theory, which seeks to explain various behaviours in humans and animals that appear to have no direct benefit to the actor by positing that they serve to communicate certain traits or states to observers in an "honest", i.e. difficult to fake fashion).

The authors start out by wondering (like many have before) why "some people share knowledge online, often without tangible compensation", on sites such as Wikipedia, Reddit or YouTube. "Many contributions appear to be unconditionally altruistic and the system vulnerable to free riding. If the selfish gene hypothesis is correct, however, altruism must be apparent and compensated with fitness benefits. As such, our findings add to previous work that tests the costly signaling theory explanations for altruism." (Notably, not all researchers share this assumption about altruistic motivations, see e.g. the preprint by Pinto et al. listed below.)

For the experiment, 98 undergraduate students, who had previously completed the Raven's Advanced Progressive Matrices (RPM) intelligence test, were asked to spend 30 minutes "to contribute to an ostensibly real wiki-style encyclopedia being created by the Department of Communication. Participants were told that the wiki would serve as a repository of information for incoming first-year students and that it would contain entries related to campus life, culture, and academics [...] The wiki resembled Wikipedia and contained a collection of preliminary articles." 38 of the participants were told their contributions would remain anonymous, whereas another 40 "were photographed and told that their photo would be placed next to their contribution", and their names were included with their contribution. (Curiously, the paper doesn't specify the treatment of the remaining 20 participants.) "The quality of all participants' contributions was rated by four undergraduate research assistants who were blind to hypotheses and experimental conditions. [...] The research assistants also judged the contributors' intelligence relative to other participants using a 7-point Likert-type scale (1 Much dumber than average, 7 Much smarter than average)".

The researchers "found that as individuals' scores on Ravens Progressive Matrices (RPM) increased, participants were judged to have written better quality articles, but only when identifiable and not when anonymous. Further, the effect of RPM scores on inferred intelligence was mediated by article quality, but only when signalers were identifiable." They note that their results leave several "important questions" still open, e.g. that "it remains unclear what benefits are gained by signalers who contribute to information pools." Citing previous research, they "doubt a direct relationship to reproductive success for altruism in signaling g in information pools. Technical abilities are not particularly sexually attractive (Kaufman et al., 2014), so it is likely that g mediates indirect fitness benefits in such contexts." It might be worth noting that the study's convenience sample likely differs in its demographics from those of Wikipedia editors, e.g. only 28 of the 98 participating students were male, whereas males are well known to form the vast majority of Wikipedia contributors.

The article is an important contribution to the existing body of literature on Wikipedia editors' motivations to contribute, even if it appears to be curiously unaware of it (none of the cited references contain "Wikipedia" or "wiki" in their title).

Briefly

See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Tilman Bayer

6.7% of Wikipedia articles cite at least one academic journal article with DOI

From the abstract:^[3]

"we release Wikipedia Citations, a comprehensive dataset of citations extracted from Wikipedia. A total of 29.3M citations were extracted from 6.1M English Wikipedia articles as of May 2020, and classified as being to books, journal articles or Web contents. We were thus able to extract 4.0M citations to scholarly publications with known identifiers -- including DOI, PMC, PMID, and ISBN -- and further labeled an extra 261K citations with DOIs from Crossref. As a result, we find that 6.7% of Wikipedia articles cite at least one journal article with an associated DOI. Scientific articles cited from Wikipedia correspond to 3.5% of all articles with a DOI currently indexed in the Web of Science."

"Science through Wikipedia: A novel representation of open knowledge through co-citation networks"

From the abstract:^[4]

"... the sample was reduced to 847 512 references made by 193 802 Wikipedia articles to 598 746 scientific articles belonging to 14 149 journals indexed in Scopus. As highlighted results we found a significative presence of 'Medicine' and 'Biochemistry, Genetics and Molecular Biology' papers and that the most important journals are multidisciplinary in nature, suggesting also that high-impact factor journals were more likely to be cited. Furthermore, only 13.44% of Wikipedia citations are to Open Access journals."

See also an earlier paper by some of the same authors: "Mapping the backbone of the Humanities through the eyes of Wikipedia"

"Quantifying Engagement with Citations on Wikipedia"

From the abstract:^[5]

"... we built client-side instrumentation for logging all interactions with links leading from English Wikipedia articles to cited references during one month, and conducted the first analysis of readers’ interactions with citations. We find that overall engagement with citations is low: about one in 300 page views results in a reference click (0.29% overall; 0.56% on desktop; 0.13% on mobile). [...] clicks occur more frequently on shorter pages and on pages of lower quality, suggesting that references are consulted more commonly when Wikipedia itself does not contain the information sought by the user. Moreover, we observe that recent content, open access sources, and references about life events (births, deaths, marriages, etc.) are particularly popular."

See also the research project page on Meta-Wiki, and a video recording and slides of a presentation in the June 2020 Wikimedia Research Showcase.
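
As a quick sanity check on the figures quoted above, the click-through percentages can be converted into "one in N page views" form. This is only illustrative arithmetic on the three percentages given in the abstract; the helper name `one_in_n` is made up for this sketch and does not come from the paper.

```python
# Click-through rates quoted in the abstract, in percent of page views.
rates_percent = {"overall": 0.29, "desktop": 0.56, "mobile": 0.13}


def one_in_n(percent: float) -> float:
    """Convert a percentage into the N of 'one in N' (hypothetical helper)."""
    return 100.0 / percent


for name, pct in rates_percent.items():
    print(f"{name}: {pct}% is roughly one in {one_in_n(pct):.0f} page views")
```

The overall rate works out to roughly one in 345 page views, which the abstract rounds to "about one in 300"; the desktop and mobile rates correspond to roughly one in 179 and one in 769 respectively.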

"Individual Factors that Influence Effort and Contributions on Wikipedia"

From the abstract and paper:^[6]

"... [We] surveyed [Portuguese Wikipedia] community members and collected secondary data. After excluding outliers, we obtained a final sample with 212 participants. We applied exploratory factor analysis and structural equation modeling, which resulted in a model with satisfactory fit indices. The results indicate that effort influences active contributions, and attitude, altruism by reputation, and altruism by identification influence effort. None of the proposed factors are directly related to active contributions. Experience directly influences self-efficacy while it positively moderates the relation between effort and active contributions. [...] To reach [editors registered on Portuguese Wikipedia], we sent questionnaires to Wikimedia Brasil’s e-mail lists, made an announcement in Wikipedia’s notice section, and sent private messages to members through the platform itself."

"Approaches to Understanding Indigenous Content Production on Wikipedia"

From the abstract:^[7]

"We examine pages with geotagged content in English Wikipedia in four categories, places with Indigenous majorities (of any size), Rural places, Urban Clusters, and Urban areas. We find significant differences in quality and editor attention for articles about places with Native American majorities, as compared to other places."

"Tabouid: a Wikipedia-based word guessing game"

This article describes the automatic generation of a Taboo-like game (where players have to describe a word while avoiding a given set of other words), also released as a free mobile app for Android and iOS. From the abstract:^[8]

"We present Tabouid, a word-guessing game automatically generated from Wikipedia. Tabouid contains 10,000 (virtual) cards in English, and as many in French, covering not only words and linguistic expressions but also a variety of topics including artists, historical events or scientific concepts. Each card corresponds to a Wikipedia article, and conversely, any article could be turned into a card. A range of relatively simple NLP and machine-learning techniques are effectively integrated into a two-stage process."

"Vandalism Detection in Crowdsourced Knowledge Bases"

From the abstract:^[9]

"In this thesis, we [...] develop novel machine learning-based vandalism detectors to reduce the manual reviewing effort [on Wikidata]. To this end, we carefully develop large-scale vandalism corpora, vandalism detectors with high predictive performance, and vandalism detectors with low bias against certain groups of editors. We extensively evaluate our vandalism detectors in a number of settings, and we compare them to the state of the art represented by the Wikidata Abuse Filter and the Objective Revision Evaluation Service by the Wikimedia Foundation. Our best vandalism detector achieves an area under the curve of the receiver operating characteristics of 0.991, significantly outperforming the state of the art; our fairest vandalism detector achieves a bias ratio of only 5.6 compared to values of up to 310.7 of previous vandalism detectors. Overall, our vandalism detectors enable a conscious trade-off between predictive performance and bias and they might play an important role towards a more accurate and welcoming web in times of fake news and biased AI systems."

"SchemaTree: Maximum-Likelihood Property Recommendation for Wikidata"

From the abstract:^[10]

"We introduce a trie-based method that can efficiently learn and represent property set probabilities in RDF graphs. [...] We investigate how the captured structure can be employed for property recommendation, analogously to the Wikidata PropertySuggester. We evaluate our approach on the full Wikidata dataset and compare its performance to the state-of-the-art Wikidata PropertySuggester, outperforming it in all evaluated metrics. Notably we could reduce the average rank of the first relevant recommendation by 71%."
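
To give a rough feel for what a trie over property sets might look like, here is a minimal sketch. It is not the authors' implementation: it assumes that each item's properties are inserted as one path in a fixed global order (most frequent first), and it reduces "recommendation" to plain co-occurrence counting. The names `Node`, `build_trie` and `recommend` are invented for this example, and the property sets are toy data.

```python
from collections import Counter


class Node:
    def __init__(self):
        self.children = {}  # property ID -> child Node
        self.terminal = 0   # number of items whose property set ends here


def build_trie(items):
    """Insert each item's property set as one path, in a fixed global order."""
    freq = Counter(p for props in items for p in set(props))

    def order(p):
        # Most frequent property first; ties broken by name.
        return (-freq[p], p)

    root = Node()
    for props in items:
        node = root
        for p in sorted(set(props), key=order):
            node = node.children.setdefault(p, Node())
        node.terminal += 1
    return root


def recommend(root, query):
    """Rank properties by how often they co-occur with all of `query`."""
    query = set(query)
    counts = Counter()
    stack = [(root, frozenset())]
    while stack:
        node, path = stack.pop()
        # A terminal node stands for `node.terminal` items with property set `path`.
        if node.terminal and query <= path:
            for p in path - query:
                counts[p] += node.terminal
        for p, child in node.children.items():
            stack.append((child, path | {p}))
    return [p for p, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


# Toy property sets (illustrative only, not taken from the paper).
items = [
    {"P31", "P569", "P19"},
    {"P31", "P569"},
    {"P31", "P17"},
    {"P31", "P569", "P19"},
]
root = build_trie(items)
print(recommend(root, {"P31"}))
print(recommend(root, {"P31", "P569"}))
```

Because items that share a prefix of properties share trie nodes, identical property sets are stored once with a count rather than once per item. The paper's actual probability model and traversal are more elaborate than this counting scheme.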

NPOV prevails in Hindi, Urdu, and English Wikipedia articles about the Jammu and Kashmir conflict

From the abstract:^[11]

"This article asks to what degree Wikipedia articles in three languages --- Hindi, Urdu, and English --- achieve Wikipedia's mission of making neutrally-presented, reliable information on a polarizing, controversial topic available to people around the globe. We chose the topic of the recent revocation of Article 370 of the Constitution of India, which, along with other recent events in and concerning the region of Jammu and Kashmir, has drawn attention to related articles on Wikipedia. This work focuses on the English Wikipedia, being the preeminent language edition of the project, as well as the Hindi and Urdu editions. [...] We analyzed page view and revision data for three Wikipedia articles [on the English Wikipedia, these were Kashmir conflict, Article 370 of the Constitution of India, and Insurgency in Jammu and Kashmir]. Additionally, we interviewed editors from all three Wikipedias to learn differences in editing processes and motivations. [...] In Hindi and Urdu, as well as English, editors predominantly adhere to the principle of neutral point of view (NPOV), and these editors quash attempts by other editors to push political agendas."

See also the authors' conference poster

References

^ Spezzano, Francesca; Gurunathan, Indhumathi (2020). "Protecting the Web from Misinformation". In Mohammad A. Tayebi; Uwe Glässer; David B. Skillicorn (eds.). Open Source Intelligence and Cyber Crime: Social Media Analytics. Lecture Notes in Social Networks. Cham: Springer International Publishing. pp. 1–27. ISBN 9783030412517.
^ Yoder, Christian N.; Reid, Scott A. (2019-10-01). "The quality of online knowledge sharing signals general intelligence". Personality and Individual Differences. 148: 90–94. doi:10.1016/j.paid.2019.05.013. ISSN 0191-8869.
^ Singh, Harshdeep; West, Robert; Colavizza, Giovanni (2020-07-14). "Wikipedia Citations: A comprehensive dataset of citations with identifiers extracted from English Wikipedia". arXiv:2007.07022 [cs]. Dataset
^ Arroyo-Machado, Wenceslao; Torres-Salinas, Daniel; Herrera-Viedma, Enrique; Romero-Frías, Esteban (2020-02-10). "Science through Wikipedia: A novel representation of open knowledge through co-citation networks". PLOS ONE. 15 (2): e0228713. doi:10.1371/journal.pone.0228713. ISSN 1932-6203.
^ Piccardi, Tiziano; Redi, Miriam; Colavizza, Giovanni; West, Robert (2020-04-20). "Quantifying Engagement with Citations on Wikipedia". Proceedings of The Web Conference 2020. WWW '20. New York, NY, USA: Association for Computing Machinery. pp. 2365–2376. doi:10.1145/3366423.3380300. ISBN 9781450370233. Author's copy
^ Pinto, Luiz F.; Santos, Carlos Denner dos; Onoyama, Silvia (2020-07-14). "Individual Factors that Influence Effort and Contributions on Wikipedia". arXiv:2007.07333 [cs].
^ Sethuraman, Manasvini; Grinter, Rebecca E.; Zegura, Ellen (2020-06-15). "Approaches to Understanding Indigenous Content Production on Wikipedia". Proceedings of the 3rd ACM SIGCAS Conference on Computing and Sustainable Societies. COMPASS '20. Ecuador: Association for Computing Machinery. pp. 327–328. doi:10.1145/3378393.3402249. ISBN 9781450371292.
^ Bernard, Timothée (July 2020). "Tabouid: a Wikipedia-based word guessing game". Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Online: Association for Computational Linguistics. pp. 24–29. doi:10.18653/v1/2020.acl-demos.4.
^ Heindorf, Stefan (2019). Vandalism Detection in Crowdsourced Knowledge Bases (Thesis). Paderborn, Germany: Paderborn University. S2CID 209517598. (dissertation)
^ Gleim, Lars C.; Schimassek, Rafael; Hüser, Dominik; Peters, Maximilian; Krämer, Christoph; Cochez, Michael; Decker, Stefan (2020). "SchemaTree: Maximum-Likelihood Property Recommendation for Wikidata". In Andreas Harth; Sabrina Kirrane; Axel-Cyrille Ngonga Ngomo; Heiko Paulheim; Anisa Rula; Anna Lisa Gentile; Peter Haase; Michael Cochez (eds.). The Semantic Web. Lecture Notes in Computer Science. Cham: Springer International Publishing. pp. 179–195. doi:10.1007/978-3-030-49461-2_11. ISBN 9783030494612.
^ Hickman, Molly G.; Pasad, Viral; Sanghavi, Harsh; Thebault-Spieker, Jacob; Lee, Sang Won (2020-06-17). "Wiki HUEs: Understanding Wikipedia practices through Hindi, Urdu, and English takes on evolving regional conflict". Proceedings of the 2020 International Conference on Information and Communication Technologies and Development. ICTD2020. Guayaquil, Ecuador: Association for Computing Machinery. pp. 1–5. doi:10.1145/3392561.3397586. ISBN 9781450387620.

2020-08-30

A slow couple of months

By Bri

Arbitration requests

Amendment requests

Amendment requests adjusting one editor's editing restrictions are not discussed here.

Pseudoscience – outcome pending as of publishing deadline

ArbCom member DGG included this statement in his decision: "[This case has] helped me settle my position on the more general question of DS (discretionary sanctions): I would abolish them, and then there would be no more such questions. among other merits of terminating the procedure, is that it leads to inappropriate requests for us to involve ourself in deciding content. What is within the scope of arb com is to end the concept of DS, and the only reason I do not now propose it by motion is that I do not think it would have a majority yet."

Palestine-Israel articles – outcome: new editors may not request a page move
Brahma Kumaris – outcome: article probation terminated
Genetically modified organisms – outcome: remedy 2 changed:

2) Editors are prohibited from making more than one revert per page per day on any page relating to genetically modified organisms, ~~agricultural biotechnology, and agricultural chemicals,~~ commercially produced agricultural chemicals and the companies that produce them, broadly construed and subject to the usual exemptions.

Declined/withdrawn

Case request by Danielklein declined 13 July
Case request by AranyaPathak declined 17 August
Case request by Jenhawk777 withdrawn 21 August

Unban

Lightbreather unban unsuccessfully appealed

As part of Wikipedia:Arbitration/Requests/Case/Lightbreather, Lightbreather (talk · contribs) was site banned and subject to several restrictions. Following an appeal to ArbCom by email, a motion to unblock Lightbreather and lift the restrictions was posted for discussion on-wiki. The request was closed on 18 July after ArbCom decided not to reverse the ban.

Other matters

Arbitration Committee noticeboard: changes to functionary team

2020-08-30

Wikipedia for promotional purposes?

By Ral315

This article was first published 15 years ago on August 22, 2005, eight months after The Signpost was founded. It may be the first Signpost article about paid editing, but it certainly hasn't been the last. An earlier article, "Outside groups targeting Wikipedia spur fears about bias", published February 7, 2005, a month after The Signpost's first issue, raised similar questions about conflict-of-interest editing and canvassing. –S

Twice recently, television organizations have been accused of attempting to use Wikipedia for promotional purposes. The BBC recently added articles on Jamie Kane and Boy*d Upp, a fictional character and band existing in a BBC alternate-reality game. In another incident, G4's Attack of the Show program, to commemorate an appearance by Jimbo Wales, created User:Attackoftheshow, a user page which was used primarily as a sandbox for interested viewers to edit, raising questions over whether the usage was permissible.

Jamie Kane

On August 12, a new user created an article about Jamie Kane, asserting that the fictional star of a boy band was real. The article was quickly tagged for speedy deletion, then taken to VfD. Uncle G and other editors changed the article, expanding it and making note that the band was fictional. The VfD subsequently failed, though a series of unsigned and unregistered users attempted to vote.

Later, an article on the fictional band, Boy*d Upp, was created from an IP address inside the BBC, assumed to belong to a BBC employee. This article was also tagged for VfD, and was deleted, then redirected to Jamie Kane. The BBC confirmed that an employee had written the article, but denied that it was meant to promote the game:

"The first posting was simply a case of a fan of the game getting into the spirit of alternative reality a little too much. The follow up posting was made by a fan of the game who happens to work in the BBC (where we've been beta-testing for the last month). This was unauthorized and made without the knowledge of anyone in the Jamie Kane Team or BBC Marketing. To confirm: the BBC would never use Wikipedia as a marketing tool."

Attack of the Show

On August 16, G4 aired an interview with Wikipedia founder Jimbo Wales. They created a user page for the show, where viewers could edit as they pleased. Vandalism ensued; just a day after the episode aired, and more than 1,200 edits after the page was created, the page was protected. As of press time, the page is still protected to deal with vandalism.

Tony Sidaway protected the page immediately after it was created, but Jimbo unprotected it and instructed administrators to leave it open, because he had already talked with G4 and authorized the move.

Issues with using Wikipedia for marketing

From Wikipedia's point of view:

if it successfully draws people's attention to the product, then it's highly likely that editors will notice it; once the editors get there they can begin to deal with it
if the article is accurate, then it's possibly a legitimate article
if it's not wiki-worthy, then the editing process will make it so, or delete it

From the marketers' point of view, Wikipedia is a difficult choice:

if the article is biased, then Wikipedia's editors will balance it (it seems reasonable not to expect the marketers to enjoy that balancing much)
in any case, once they've placed it in Wikipedia, the marketers will have lost control of it, and from their point of view it is a loose cannon. Again, they probably won't like that much.

Possibility of marketing spam in the future?

This raises the legitimate question of whether marketing spam may be a problem in the future. While this is a common occurrence on Special:Newpages patrol, a more confusing type of spamming such as the Jamie Kane articles may occur, where many users may be confused over whether the article's content is real, fake, or even vanity. Perhaps what is most reassuring is that all three pages were quickly found and taken care of. Nevertheless, this is a problem that may occur again in the near future.

2020-08-30

Marcus Sherman, Jerome West, and Pauline van Till

By Wikipedia editors

Marcus Sherman (Marcus334)

Marcus Sherman (August 5, 1947 – April 25, 2020) from Cape Cod joined Wikipedia on 14 January 2007 and was keenly interested in improving content related to the protected areas in southern India on the English Wikipedia.^[1]^[2]^[3]

References

Jerome West (Jcw69)

Jerome died on 19 July 2020. He was a South African contributor and administrator on the English Wikipedia. He made 9,265 edits. Jerome's death, from consequences of COVID-19, was announced by his widow on Facebook.

Pauline van Till (Pvt pauline)

Last month, the Dutch Wikipedia lost a long-time member of the editing community, Pauline van Till. She volunteered at the Museum Sophiahof, which used her title "Barones" in its obituary.^[1] The Dutch wiki Wikisage reports that she was the first female caddy on the PGA European Tour, where she got the nickname "Dutchess".

Van Till wrote articles about golf and the international world of golfers in the Dutch and English Wikipedias, and also contributed images from all over the world to Commons. One of her best-known pictures, widely used across the projects, is of Johan Cruijff as a golfer in 2009: File:Johan Cruijff golfer cropped.jpg. She was also known under her other accounts, Pvt pauline~commonswiki and Pvt pauline~enwiki.

References