The Swiss Wikimedia chapter was founded on May 14, 2006, almost exactly ten years ago. It has about 250 paid-up members and is one of only two chapters allowed to process income from fundraising banners directly. Recent discussions on the French Wikipedia have drawn attention to the involvement of some of the chapter's current board members in a paid-editing firm. The Signpost investigated this issue.
On the French Wikipedia, discussions began on April 6, 2016, about the paid-editing activities of the Swiss firm Racosch Sàrl, whose website states:
Wikipedia by Wikipedians
Racosch is a Swiss boutique consulting firm specialised in editing Wikipedia articles.
Our clients are companies as much as high-profile individuals, as well as other Public Relations specialists who want to update or add factual information, correct inaccuracies or address the presence of unsightly banners at the top of articles.
In the course of the discussions, outgoing Wikimedia Switzerland (WMCH) board member Gabriel Thullen (GastelEtzwane) wrote that it is common knowledge – at least within WMCH – that two of the company's principals have been long-standing board members of the chapter, while a third is married to a WMCH employee. The company's three principals are listed on Swiss company registration websites as Stéphane Coillet-Matillon, Frédéric Schütz, and Nicolas Ray. Coillet-Matillon (Wikipedia user Popo le Chien) and Schütz (Wikipedia user Schutz) are current WMCH board members; Schütz is the chapter's vice-president and French-speaking press contact on the WMCH website.
The involvement of chapter board members in paid PR work has previously led to significant adverse publicity, as evidenced by the 2012 Gibraltarpedia controversy. We contacted WMCH requesting further information and received prompt replies from Frédéric Schütz.
Our questions and Schütz's answers are below.
I am personally involved. Stéphane is also involved – but he did not stand for reelection at the recent general assembly and his term ends June 1st. The third associate is the husband of WMCH's administrative assistant. No WM CH staff is involved.
FR and EN at the moment.
Not on-wiki. More specifically: the name "Racosch" is never associated with the name WMCH, to avoid giving the wrong impression that Racosch is in any way endorsed by the Chapter.
But this is being discussed openly, e.g. within the Swiss community (see below). Stéphane recently attended the Berlin WM conference and was also very transparent about it; he will likewise attend Wikimania and we're discussing making a Beutler/Lih type of presentation at the upcoming French Wikicon.
The chapter has a policy on conflicts of interest, which requires disclosing all potential interests in writing – which was done.
In case of a request to Wikimedia CH, the policy is to reply that the chapter cannot provide advice on this topic and in particular cannot recommend anyone. This being said, one of us remembers that during past discussions someone had informally mentioned Beutler Ink, which was the only one we knew of that does proper paid editing.
Note that in any case such contacts are handled by our 3 community liaisons, not by board members (nor by the administrative assistant indicated above).
The paid editing matter was spontaneously disclosed by both Stéphane and I while introducing ourselves, and was of course discussed during the general assembly (which typically attracts around 30+ participants). In the end, Stéphane did not recandidate (but he would likely have had no problem being reelected), while I received 27 votes/32 (second best score) – indicating that we approached the matter rather correctly.
We'll likely make it publicly available, yes. In the meantime, see attached a PDF version of the version currently available on our members wiki.
The general assembly minutes the Signpost received from Schütz contain two references to paid editing:
The 10 candidates introduce themselves. Stéphane Coillet-Matillon announces that he retracts his candidature as a member as he wants to concentrate on his new company.
The assembly asks questions to the candidates, in particular about potential conflicts of interest and paid editing.
A member suggests that the association should revise its bylaws and discuss the topic of paid editing; this is not discussed further, due to lack of time. Nevertheless, the new board will take this topic into consideration.
The WMCH conflict-of-interest policy Schütz refers to states, in part,
Since conflicts of interest cannot be avoided, they should be handled professionally. ...
- Each member of the Board or of the Executive Management team should arrange his personal and business affairs so as to avoid, as far as possible, conflicts of interest with the association.
- Should a conflict of interest arise, the member of the Board or Executive Management concerned should inform the President of the Board. The President, or Vice-President, should request a decision by the Board which reflects the seriousness of the conflict of interest. The Board shall decide without participation of the person concerned, and the conflict of interest and the board decision will be recorded in the minutes.
- ... Anyone having a permanent conflict of interest should not be a member of the Board or the Executive Management.
On the English Wikipedia, three user accounts presently mention an association with Racosch on their user pages, along with the articles they have made paid contributions to:
All three are also active under the same names on the French Wikipedia, where similar disclosures are made. Schutz's user page on the French Wikipedia has declared Wicodric as a secondary account for paid contributions since April 8, 2016.
The Signpost looks forward to further community discussion, and thanks Frédéric Schütz for his candid and timely replies to our questions.
Ed had worked in radio, first as a disc jockey and later with broadcast automation systems. He co-founded FenCon (a literary science-fiction event) and WhoFest (a convention dedicated to the iconic BBC series Doctor Who), and was well-known in the science fiction and fantasy communities. He was an Eagle Scout and a graduate of the United States Space Camp. He was born in Huntsville, Alabama, and at the time of his death, he lived in Dallas. His full obituary is here.
In a follow-up to his story on Wikipedia Zero-based piracy in Angola (see previous Signpost coverage), Motherboard's Jason Koebler reports (April 27) on very similar problems with piracy in Bangladesh, arguing that "Wikipedia's piracy police are ruining the developing world's Internet experience":
Wikipedia Zero users in Bangladesh are now being monitored, banned, and threatened by Wikipedia editors who are engaged in a continuous game of whack-a-mole against piracy on the site.
Last month, I wrote several articles about the creative (if illegal) ways that people in Angola are using the free Wikipedia Zero and Facebook Free Basics services to share copyrighted files with each other. Both of these services zero rate data uploaded and downloaded from those sites, meaning users don’t have to pay for that data, which would normally be very expensive. Users upload files to the Wikimedia Commons database, link to them in closed Facebook groups, and, bam – free ad-hoc filesharing network.
Koebler says the "arms race" between the pirates and the Wikimedians trying to stop them is "significantly more advanced" than it is in Angola:
A task force of editors in the developed world are desperately trying to get Bangladeshis play by Wikipedia's existing rules by closely monitoring and banning people who upload pirated content. They're invading Facebook groups to monitor and determine how and where people are uploading files. They're keeping a running tally of the number and names of accounts that have uploaded content. They've blocked entire IP ranges from uploading files, and have created filters that monitor all uploads that come from Wikipedia Zero accounts and from new accounts in general.
Meanwhile,
the Bangladeshi operations that I've seen appear to be much more sophisticated than the Angolan ones – they have posted specific guides to converting videos to smaller and harder-to-detect file types, have started using Wikipedia test sites, and have started using free sites online that automatically upload YouTube videos to Wikimedia Commons.
Wikimedia Bangladesh has become involved, pleading with users to stop the uploads, telling them they are contributing to an "increasingly negative perception of Bangladesh in many different sectors" by treating Wikimedia sites as a sort of free YouTube. But, Koebler argues,
Commons is YouTube for Wikipedia Zero users out of necessity, not choice. Because they can't afford access to YouTube and the rest of the internet, Wikipedia has become the internet for lots of Bangladeshis. What's crazy, then, is that a bunch of more-or-less random editors who happen to want to be the piracy police are dictating the means of access for an entire population of people ... there's no simple way out of this situation. When you create two entirely different tiers of internet, those in the second tier will rightly aspire to get into the first tier.
Gizmodo reports (April 25) on a new study by Bradi Heaberlin and Simon DeDeo arguing that Wikipedia has become a corporate bureaucracy, "akin to bureaucratic systems that predate the information age."
Wikipedia is a voluntary organization dedicated to the noble goal of decentralized knowledge creation. But as the community has evolved over time, it has wandered further and further from its early egalitarian ideals, according to a new paper published in the journal Future Internet. In fact, such systems usually end up looking a lot like 20th century bureaucracies.
Even in the brave new world of online communities, the Who had it right: "Meet the new boss, same as the old boss."
One of the study's most striking findings, Gizmodo reports, is that
even on Wikipedia, the so-called "Iron Law of Oligarchy" – a.k.a. rule by an elite few – holds sway. ... "You start with a decentralized democratic system, but over time you get the emergence of a leadership class with privileged access to information and social networks," DeDeo explained. "Their interests begin to diverge from the rest of the group. They no longer have the same needs and goals. So not only do they come to gain the most power within the system, but they may use it in ways that conflict with the needs of everybody else."
DeDeo and Heaberlin note Wikipedia's conservative nature: over 89 per cent of its core norms, created by a small pool of around 100 users, have remained unchanged; they have achieved a "myth-like status" even as they inevitably conflict with each other. Resolution of such conflicts is made more difficult by the fact that editors form central "neighbourhoods" organised around "article quality, content policy, collaboration, and administrators" that are "increasingly separate and interact with each other less and less", leading to the emergence of tribalism.
DeDeo and Heaberlin performed a purely mathematical analysis of broad trends in the Wikipedia data, connecting this hyper-quantitative approach with sociology and political science. The next step is to collaborate with cultural anthropologists to undertake a close reading of all those inter-linked individual pages.
"We need to understand how these systems work if we're going to understand how the economy of the future will run. They don't have laws, they have traditions and norms," said DeDeo when asked why this kind of research matters. "I think what we're doing is investing research into a problem that, 200 years from now, could be the biggest problem in the world – if we don't destroy ourselves first."
In its article, Gizmodo references a study published earlier this year in Physical Review E by Jinhyuk Yun (윤진혁), Sang Hoon Lee (이상훈), and Hawoong Jeong (정하웅) from the Korea Advanced Institute of Science and Technology, which came to similar conclusions about Wikipedia. The Korean study received a German-language write-up in taz this week (April 28).
DeDeo and Heaberlin's study was also covered by The Washington Post, as well as by Sciencealert.com (April 28).
The Washington Post, along with many other media outlets, reports that according to a new study by Jon Penney, "Snowden's disclosures about NSA spying had a scary effect on free speech":
Internet traffic to Wikipedia pages summarizing knowledge about terror groups and their tools plunged nearly 30 percent after revelations of widespread Web monitoring by the U.S. National Security Agency, suggesting that concerns about government snooping are hurting the ordinary pursuit of information.
The study, titled "Chilling Effects: Online Surveillance and Wikipedia Use", is
focused on Wikipedia pages related to sensitive topics specifically flagged by the Department of Homeland Security. In a document provided to its analysts in 2011, the DHS listed 48 terrorism terms that they should use when "monitoring social media sites." Penney collected traffic data on the English Wikipedia pages most closely related to those terms.
The collected data showed that pageviews dropped immediately after the June 2013 news stories about Snowden and never recovered to previous levels.
"You want to have informed citizens," Penney said. "If people are spooked or deterred from learning about important policy matters like terrorism and national security, this is a real threat to proper democratic debate."
The New Statesman covers (Apr. 17) a project kickstarted by Bee Wilson, chair of the Oxford Symposium on Food and Cookery, to bring more women editors to Wikipedia in order to improve its articles on food. The article's writer, Felicity Cloake, visited a related group editing session at the British Library.
[Wikipedia's] "egregious gender imbalance" is especially notable in matters relating to food, because, as Polly Russell, the library's curator of food studies, explains, "we're such a new area of serious study". Most food throughout history has been cooked by women, "but if you can’t name them, they get forgotten."
Commenting on the under-representation of notable women on Wikipedia,
Wilson ... cites the example of Philippa Glanville, a former chief curator of the metalwork, silver and jewellery department at the Victoria and Albert Museum and a world expert on historical dining practices, whose achievements were recognised by the Queen before the online encyclopaedia ("Presumably getting on Wikipedia should be easier than getting an OBE").
Facilitating this process is the goal of Wiki-Food, which groups academics, students, experts and enthusiastic amateurs with the aim of improving and expanding Wikipedia's coverage of food-related topics, especially but not exclusively those relevant to women, with support from Wikipedia.
VentureBeat reports (Apr. 28) on a collaboration between Wikimedia and Stanford University to help point translators to significant content gaps in other language versions of Wikipedia:
finding out which topics or articles are in particular shortage in specific tongues is a challenge, which is why Wikimedia is partnering with Stanford University researchers to design a new recommendation system. This will rank Wikipedia articles in order of priority across languages. The ranking is based on a number of factors, including editor interests (using contribution history data), language proficiency, and anticipated popularity if an article was translated. For example, a native Swahili speaker is unlikely to care about the history of a U.K. baking business, but they may care about WrestleMania 32.
University news site Futurity also has an article (Apr. 15) on the project; a Wikimedia blog post (Apr. 27) is available here.
The unexpected death of Prince on April 21 leads the chart with the highest view count in its history, breaking the record set just this January by the death of David Bowie. Outside the Top 10, six additional slots are taken up by Prince-related topics, and death dominated the Top 10 generally, with wrestler Chyna at #2, British comedian Victoria Wood at #7, American actress Doris Roberts at #8, and the ever-popular Deaths in 2016 rising to #5 this week.
For the full top-25 lists (and our archives back to January 2013), see WP:TOP25. See this section for an explanation of any exclusions. For a list of the most edited articles every week, see here.
For the week of April 17 to 23, 2016, the ten most popular articles on Wikipedia, as determined from the report of the most viewed pages (WP:5000), were:
Rank | Article | Views | Notes |
---|---|---|---|
1 | Prince (musician) | 13,064,933 | Well, although the rational side of our human brains knows that the recent spate of deaths of musical icons in 2016 must be coincidental, it sure doesn't feel that way to the emotional side looking to make sense of things. The news of the completely unexpected death of Prince at age 57, a highly successful artist who first became famous in the 1980s and whose talent was widely acknowledged, spread across the internet like wildfire. It was only in January of this year that the death of David Bowie yielded the first entry in this chart's history to hit eight figures, with 11.7 million views. Prince's death has now exceeded that record, racking up over 13 million views, and in only three days, as he died on Thursday. Bowie died on a Sunday, so his 11.7 million views were obtained over a full seven days. Does this mean that Prince was more beloved than Bowie? How does one judge? |
2 | Chyna | 2,121,679 | The lead sentence of our article on Chyna says she "was an American professional wrestler, actress, glamour model, bodybuilder, English teacher and pornographic film actress." She rose to fame through wrestling, though. She was found dead in her California home on April 20, at the age of 46. |
3 | Harriet Tubman | 1,358,526 | Last week it was announced that one of the most famous women in American history would replace President Andrew Jackson (#16) on the United States twenty-dollar bill. The new bill is expected to be unveiled in 2020. When the idea of putting a woman on a U.S. bill first arose last year, the ten-dollar bill, which features Alexander Hamilton, was floated as the target. However, in one of those odd turns of history that will certainly generate many Reddit "Today I Learned" threads in the future, the success of the musical Hamilton was credited for the change in plans. |
4 | 420 (cannabis culture) | 1,021,596 | This curious "holiday", which falls on April 20 (for obvious reasons), refers to the mysterious number 420 and its long association with marijuana use. While it may not quite be to cannabis what Oktoberfest is to beer, it no doubt aspires to be. It returns to the top 5, as it has in previous years. We also note that the article remains, every year, far too laid back to improve beyond Start-class. |
5 | Deaths in 2016 | 953,110 | A big jump this week due to #1. With Prince's death only the latest in a streak of high-profile celebrity deaths, we are now seeing many articles asking "why" there have been so many celebrity deaths in 2016. Setting aside the coincidental spikes that can always occur, the most likely answer comes from BBC obituary editor Nick Serpell. He argues that there have been more famous people since the 1960s, and these people are now in their 60s and 70s and naturally starting to die. Extrapolating from that, one could argue that social media has boosted the number of famous people once again in the past ten years. Does that mean that in 50 years this chart will be inundated with the deaths of people like the Numa Numa guy, David After Dentist and Damn Daniel? Stay tuned to find out. |
6 | The Jungle Book (2016 film) | 939,876 | Down from #2, but only 100,000 views down from last week. This American film based on Rudyard Kipling's The Jungle Book, previously adapted for the screen as a 1967 animated film, had its world premiere on April 4. It was released in 15 countries on April 8, and debuted in the US on April 15 to a stellar $103 million weekend and rapturous reviews (the film currently has a 94% RT rating). Despite being described as a "live-action reboot", the film is really more of a CGI cartoon, with nearly everything onscreen except the lead child actor Neel Sethi composed of computer graphics. |
7 | Victoria Wood | 927,781 | This English comedienne and five-time BAFTA-winning actress died of cancer on April 20, 2016. Much of her humour was grounded in everyday life and included references to popular British media and brand names of quintessentially British products, which made her fame relatively exclusive to Britain. And while I am embarrassed to admit it, this made me google whether Morrissey liked her. And indeed he did, along with many, many others. |
8 | Doris Roberts | 897,883 | This American actress, best known for her role as Marie Barone in the American sitcom Everybody Loves Raymond, died on April 17. (Morrissey had nothing to say about her death, though she received many fine tributes from others.) |
9 | Fan (film) | 892,651 | On the chart for another week, with a jump of over 150,000 views over last week. This Bollywood hybrid of The Fan and Single White Female, in which a Bollywood star and an obsessed lookalike (both played by Shah Rukh Khan) gradually become entangled in a game of revenge, was made on a relatively hefty budget of ₹850 million ($13 million). It has now earned more than ₹1.72 billion ($26 million). |
10 | William Shakespeare | 881,813 | Yes, it is yet ANOTHER celebrity death. Zounds! However, this one occurred four hundred years ago this week, and was celebrated by a Google Doodle, among many other mentions in the press. |
Just missing the WP:TOP25: Apollonia Kotero (#26, Prince-related); Prince albums discography (#27); Vanity (singer) (#28, Prince-related); List of The Flash (2014 TV series) episodes (#29); List of Bollywood films of 2016 (#30)
Seven featured articles were promoted during this period.
Six featured lists were promoted during this period.
Four featured pictures were promoted during this period.
On 19 April, the arbitration committee unbanned two editors, Ottava Rima and Prof. Carl Hewitt. Both remain subject to various editing restrictions, but are permitted to contribute within certain parameters. Welcome back.
The evidence submission phase in the Wikicology case ended on 25 April. The case has now moved to the Workshop phase, which is due to close on 2 May. Proposals made to date range from a site ban, through an indefinite ban from article space (which would allow Wikicology to continue contributing to draft space, user space and talk pages), to a topic ban from biomedical, public health and policy topics.
The Gamaliel and others case is in its evidence phase, which is due to end 6 May. Developments to date include a temporary injunction prohibiting DHeyward and Gamaliel from interacting, passed on 19 April 2016, and the addition of DHeyward, Arkon and JzG as involved parties, which proved controversial on the case's talk page.
Independently of the ongoing case, on 30 April the committee made an announcement on the ArbCom noticeboard indefinitely restricting Gamaliel, "per his request", "from taking any action to enforce any arbitration decision within the GamerGate topic, broadly construed".
In an amendment to the Infoboxes arbitration case announced on 21 April, the arbitration committee rescinded three remedies applied to Pigsonthewing, who is "cautioned that the topic of infoboxes remains contentious under some circumstances and that he should edit carefully in this area."
A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
Who did what: editor role identification in Wikipedia is the title of an upcoming paper to be presented at the International Conference on Web and Social Media (ICWSM) in Cologne, Germany.[1] The work presented in the paper is a collaboration between researchers from Carnegie Mellon University and the Wikimedia Foundation. The authors' goal was to analyze edits from the English Wikipedia to identify roles played by editors and to examine how those roles affected the quality of articles.
Identifying the roles of participants in online communities helps researchers and practitioners better understand the social dynamics that lead to healthy, thriving communities. This line of research started in the 2000s with a focus on Usenet groups, before expanding to wiki communities like Wikipedia.[supp 1]
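The paper covers three stages of work: classifying edits, deriving editor roles from those classes, and relating the roles to changes in article quality.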
For the first stage, the authors built on previous publications that aimed at classifying Wikipedia edits, in particular the work by Daxenberger et al.[supp 2] Classifying edits usually starts by separating them by namespace. A more granular approach considers not just the namespace, but the content of the change. This was the method chosen here for edits in the main namespace, with the possibility of assigning a revision to multiple categories: for example, a single revision can entail both "grammar" and "template insertion" changes. Those categories were operationalized using an ensemble method classifier based on the content and metadata of the edit.
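The multi-label idea can be illustrated in a few lines. This is a minimal sketch, not the paper's classifier: the `Edit` record, the category rules and the voting scheme are hypothetical stand-ins for the learned ensemble model, and "ensemble" is reduced here to a simple majority vote. What it does show is how keeping one ensemble per category (binary relevance) lets a single revision receive several labels at once.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Edit:
    """Hypothetical minimal revision record: text before/after plus the edit summary."""
    before: str
    after: str
    comment: str = ""


# A weak voter answers "does this edit belong to the category?"
Voter = Callable[[Edit], bool]


def majority_vote(voters: List[Voter], edit: Edit) -> bool:
    """Ensemble decision: the label applies if more than half of the voters agree."""
    votes = sum(1 for voter in voters if voter(edit))
    return 2 * votes > len(voters)


# One small ensemble per category (binary relevance). The rules are illustrative only.
ENSEMBLES: Dict[str, List[Voter]] = {
    "template insertion": [
        lambda e: e.after.count("{{") > e.before.count("{{"),
        lambda e: "template" in e.comment.lower(),
        lambda e: "}}" in e.after and "}}" not in e.before,
    ],
    "grammar": [
        lambda e: any(w in e.comment.lower() for w in ("grammar", "typo", "copyedit")),
        # an existing word was altered in place
        lambda e: any(a != b for a, b in zip(e.before.split(), e.after.split())),
        # only a few words were added overall
        lambda e: len(e.after.split()) - len(e.before.split()) <= 2,
    ],
}


def classify(edit: Edit) -> Set[str]:
    """Return every category whose ensemble votes yes (possibly more than one)."""
    return {label for label, voters in ENSEMBLES.items() if majority_vote(voters, edit)}


edit = Edit("He go to school.", "He goes to school. {{citation needed}}", "grammar fix, add template")
print(sorted(classify(edit)))  # ['grammar', 'template insertion']
```

In a real system each voter would be a trained model over content and metadata features; the structure of one decision per category is what allows overlapping labels.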
Then, the authors derived roles based on patterns that emerged from the classes of edits, using the latent Dirichlet allocation method (LDA). This method is traditionally used in natural language processing to identify topics making up a document. Here, the authors used the method to identify roles making up a user, positing that a user is a mixture of roles in the same way that a document is a mixture of topics. In addition to edits, they trained the LDA model using other information such as reverts, and edits in other namespaces.
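To make the document/topic analogy concrete, here is a minimal collapsed Gibbs sampler for LDA in plain Python, with each editor treated as a "document" and each edit category as a "word". Collapsed Gibbs sampling is one standard way to fit LDA; the paper may have used a different inference method or library, and the activity labels, the hyperparameters `alpha` and `beta`, and the iteration count are illustrative assumptions. `theta` holds each editor's mixture over roles and `phi` each role's distribution over activities.

```python
import random
from typing import Dict, List, Tuple


def lda_gibbs(
    docs: List[List[str]],
    n_roles: int,
    alpha: float = 0.1,
    beta: float = 0.01,
    iters: int = 200,
    seed: int = 0,
) -> Tuple[List[List[float]], List[Dict[str, float]]]:
    """Collapsed Gibbs sampling for LDA; each doc is one editor's list of activity labels."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_roles

    n_dk = [[0] * K for _ in docs]      # editor-role counts
    n_kw = [[0] * V for _ in range(K)]  # role-activity counts
    n_k = [0] * K                       # tokens assigned to each role
    z: List[List[int]] = []             # current role of every token

    # Random initial role for every activity token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(role = j | all other assignments).
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][v] + beta) / (n_k[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        {w: (n_kw[k][index[w]] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(K)
    ]
    return theta, phi


# Hypothetical activity logs: three "patroller-like" and three "copy-editor-like" editors.
editors = [["revert", "warn_user"] * 10 for _ in range(3)] + [
    ["fix_typo", "copyedit"] * 10 for _ in range(3)
]
theta, phi = lda_gibbs(editors, n_roles=2)
```

Because each editor gets a full probability vector in `theta` rather than a single label, the "one to three roles per editor" finding corresponds to rows with one to three non-negligible entries.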
They ended up with eight roles: social networker, fact checker, substantive expert, copy editor, wiki gnome, vandal fighter, fact updater, and Wikipedian. They found that most editors play between one and three of those roles. To validate the roles, they attempted to predict edit categories based on the editors' roles, with mixed results.
Last, the authors examined whether the roles of editors were correlated with the evolution of the quality of a set of articles. They measured article quality twice, six months apart, using an existing model[supp 3] that assigns a score in Wikipedia's qualitative assessment scale based on the article's measurable characteristics.
They found some correlation between the difference in quality and the roles involved, taking into account control variables like the starting quality score. Their results suggest that "the activities of different types of editors are needed at different stages of article development". For example, "as articles increase in quality, the substantive content added by substantive experts is needed less" but "the cleanup activities by Wiki Gnomes become more important".
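An analysis "taking into account control variables" is commonly an ordinary least-squares regression of the quality change on role activity plus the starting score. The sketch below assumes that general form; it is not the paper's model. The variable names, the role × starting-quality interaction terms, and the coefficients used to generate the synthetic data are all invented, purely to show how "needed less as quality increases" would surface as a negative interaction coefficient.

```python
from itertools import product
from typing import List


def ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Least squares via the normal equations (X^T X) b = X^T y, solved by Gauss-Jordan elimination."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    A = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def design_row(start: float, expert: float, gnome: float) -> List[float]:
    """Intercept, control (starting quality), role shares, and role x starting-quality interactions."""
    return [1.0, start, expert, gnome, expert * start, gnome * start]


# Synthetic, noise-free data from made-up coefficients (NOT results from the paper).
true_b = [0.1, -0.05, 0.8, 0.2, -0.15, 0.1]
X, y = [], []
for start, expert, gnome in product([1, 2, 3, 4], [0.0, 0.5, 1.0], [0.0, 0.3, 0.6]):
    row = design_row(start, expert, gnome)
    X.append(row)
    y.append(sum(b * x for b, x in zip(true_b, row)))

print([round(b, 3) for b in ols(X, y)])  # [0.1, -0.05, 0.8, 0.2, -0.15, 0.1]
```

Here the negative fifth coefficient means the marginal effect of expert activity shrinks as starting quality rises, while the positive sixth means gnome activity matters more; with real data one would also need noise-aware inference (standard errors), which this sketch omits.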
One limitation acknowledged by the authors is that their detailed edit classification was only performed on edits made in the main namespace (Wikipedia articles). For other edits, they only considered the namespace itself. Namespaces like Wikipedia: host very varied activities, and applying the same level of detail to them would presumably yield a richer, and possibly more accurate, taxonomy of roles.
Some choices in the role nomenclature are a little surprising. For example, it seems odd to have one role simply called "Wikipedians", or for "reference modification" to be a behavior representative of "social networkers". Translating patterns of data (structural signatures) into words (roles) is a difficult exercise, and often a weak link in such analyses.
In conclusion, the article is a welcome contribution to the field of Wikipedia research, in particular to the study of editor roles on Wikipedia. Many previous role identification efforts used a simplified approach in which editors were reduced to their main role. Here, in contrast, the authors went further and considered editors as a mixture of roles, which is expected to provide a more accurate representation of human behavior.
Since the authors mention task recommendation as a possible application of their work, it would be particularly interesting to examine how the role composition of a user evolves over time. There may be patterns in the evolution of users' roles during their life cycle as editors. Uncovering such patterns could lead to more relevant task recommendations, and help guide editors along their contribution journey.
This paper, published in the journal Information Sciences, was co-authored by researchers from several Polish universities.[2] The paper's central research question is "are the popular assumptions about the social interpretations of networks created from the edit history valid?" The paper evaluates four different methods for constructing complex networks from Wikipedia data and compares these constructs with survey results about Polish Wikipedians' self-reported relationships. While there is a strong correspondence between all the different network types, networks derived from Wikipedians' talk pages map most clearly onto Wikipedians' feelings of acquaintanceship.
The paper examines four kinds of relationships: co-edits to article and user talk pages (acquaintanceship), co-edits in the vicinity of other users' text (trust), reverts of editors' revisions (conflict), and co-edits to articles in the same category (shared interest). Crucially, the paper extends prior research using these network constructs by conducting a respondent-driven survey of Wikipedians to ask them to name other Wikipedians they consider to be acquaintances, trusted, conflict-prone, or having the same interest. The survey respondents tended to be more experienced than typical users and so responses were re-weighted based on population frequency.
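Two of the four constructions can be sketched directly from an edit log. This is a minimal illustration under stated assumptions: edits are `(user, page)` pairs, reverts are `(reverter, reverted)` pairs, and which page prefixes count as "talk pages" (`TALK_PREFIXES`) is our guess; the paper's actual filters and edge weights may differ. Acquaintanceship is modeled as undirected co-editing of talk pages, and conflict as directed revert counts.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

TALK_PREFIXES = ("Talk:", "User talk:")  # assumption: which pages count as "discussion"


def coedit_network(edits: Iterable[Tuple[str, str]], talk_only: bool = False) -> Counter:
    """Undirected weighted network: edge weight = number of pages both users edited."""
    editors_by_page: Dict[str, Set[str]] = {}
    for user, page in edits:
        if talk_only and not page.startswith(TALK_PREFIXES):
            continue
        editors_by_page.setdefault(page, set()).add(user)
    weights: Counter = Counter()
    for users in editors_by_page.values():
        for a, b in combinations(sorted(users), 2):
            weights[frozenset((a, b))] += 1
    return weights


def conflict_network(reverts: Iterable[Tuple[str, str]]) -> Counter:
    """Directed network: weight of (a, b) = number of times a reverted b (self-reverts ignored)."""
    return Counter((a, b) for a, b in reverts if a != b)


edits = [
    ("Anna", "Talk:Zurich"), ("Beat", "Talk:Zurich"),
    ("Anna", "User talk:Chloé"), ("Chloé", "User talk:Chloé"),
    ("Beat", "Zurich"), ("Chloé", "Zurich"),
]
acquaintance = coedit_network(edits, talk_only=True)
conflict = conflict_network([("Beat", "Chloé"), ("Beat", "Chloé"), ("Anna", "Anna")])
print(acquaintance[frozenset({"Anna", "Beat"})], conflict[("Beat", "Chloé")])  # 1 2
```

Note that Beat and Chloé, who only co-edited the article, are linked in the plain co-edit network but not in the talk-page one; the paper's finding is precisely that such behavioral edges only partly match what editors report themselves.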
The paper goes on to use a variety of machine learning methods to evaluate the strength of the relationship between different behavioral features and the self-reported relationships. First, the authors find that naive constructions of these networks from behavioral data end up predicting only one kind of relationship (discussion/acquaintanceship). Using more complex sets of temporal features, such as days since last edit, and category similarity to account for biases in self-reporting yielded only marginal improvements in model performance. The authors conclude by suggesting that the correspondence between relationships imputed from observed Wikipedia data and the relationships reported by Wikipedians themselves is weak.
The survey methods employed in this paper to generate the ground-truth networks can be criticized for the lack of randomness in the sampled population and for their limited generalizability to other wiki communities. Similarly, there are well-known limits on informant accuracy, compounded by the often impersonal nature of the editing interface and process. Nevertheless, this research suggests that researchers combining behavioral data with social network methods may be making faulty assumptions about how strongly the observed relationships are actually perceived by Wikipedians themselves.
This study[3] from researchers at the University of Helsinki examines cross-correlations between Wikipedia pageviews, news media mentions, and company stock prices. It extends prior work, which developed a trading strategy based on Wikipedia pageviews to assess stock market moves,[4][5] by extracting entities about companies, products, and dates from news media mentions and matching them to Wikipedia entries. An exploratory case study demonstrates that there are some correlations across these three indices and that the strongest cross-correlations are observed without a time lag and for the same company. However, in a subsequent case study involving 11 large companies, the strongest cross-correlations were for The Home Depot and Netflix. That correlations exist among news mentions, Wikipedia pageviews, and stock performance is neither theoretically nor empirically surprising, but the paper's work on identifying entities and mapping them to Wikipedia articles could have some potential. Research like this, comparing correlations across dozens of entities and time series, is subject to the multiple comparisons problem, and there is likewise a large body of methods in mathematical finance that could be used to extend these findings further.
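Lagged cross-correlation, the basic quantity in this kind of study, can be sketched as below. This is a minimal illustration assuming Pearson correlation on raw, equal-length daily series; the paper's exact preprocessing is not described here, and with trending data one would usually correlate differences or returns instead to avoid spurious results. The toy series are invented: `prices` simply repeats `pageviews` two days later, so the peak should appear at lag 2.

```python
import math
from typing import Dict, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cross_correlation(x: Sequence[float], y: Sequence[float], max_lag: int) -> Dict[int, float]:
    """r(lag) = corr(x[t], y[t + lag]); a positive lag means x leads y."""
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: n + lag]
        result[lag] = pearson(a, b)
    return result


# Toy series: "prices" repeat "pageviews" two days later (illustrative only).
pageviews = [1, 3, 2, 5, 4, 6, 8, 7, 9, 12, 10, 11]
prices = [0, 0] + pageviews[:-2]
cc = cross_correlation(pageviews, prices, max_lag=3)
best = max(cc, key=cc.get)
print(best, round(cc[best], 6))  # 2 1.0
```

Each call tests `2 * max_lag + 1` lags, so scanning many companies and news series multiplies the number of comparisons; this is where a correction such as Bonferroni's becomes relevant.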
A calendar of events (mostly research conferences) relevant to Wikimedia-related research has recently been set up on Meta-wiki. Notable entries for this month include CHI 2016 and ICWSM-16.
This conference paper[6] presents a method to automatically detect promotional content in Wikipedia. It appears to aim at articles, but the actual method focuses on user pages.
The authors highlight the fact that their method is purely text-based, whereas "[c]urrently most researches about spamming in Wikipedia are focusing on editing behavior and making use of user’s edit history to do feature-based judging." (See, however, our earlier coverage of a related paper that reported success using stylometric, i.e. text-based features: "Legendary, acclaimed, world-class text analysis method finds you promotional Wikipedia articles really easily")
The researchers explain that a "traditional bag-of-words document vector representation" (counting only word frequencies) is insufficient. Instead, they "employ a deep learning method to obtain a word vector for each word and then apply a sliding window on each document to gradually gain the document vector." The classifier was trained on a dataset of user pages speedily deleted under criterion "G11. Unambiguous advertising or promotion", compared to user pages of administrators which were assumed to be advertising-free. In tests (which apart from Wikipedia user pages also included a dataset of web page ads drawn from other sites) it "produced better performance than the bag-of-words model in both precision and recall measurements."
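The sliding-window step can be sketched as follows. The review does not say how the window vectors are combined, so this is an assumption-laden illustration: word vectors are averaged inside each window of `window` consecutive in-vocabulary tokens, and the document vector is the element-wise maximum across windows (max-pooling). The 2-dimensional `embeddings` are hypothetical toy values standing in for vectors a deep learning model would learn.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def document_vector(tokens: Sequence[str], embeddings: Dict[str, Vector], window: int = 3) -> Vector:
    """Average word vectors inside each sliding window, then max-pool across windows."""
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]  # skip out-of-vocabulary tokens
    if not known:
        return [0.0] * dim
    if len(known) <= window:
        return mean_vector(known)
    window_vectors = [mean_vector(known[i : i + window]) for i in range(len(known) - window + 1)]
    return [max(column) for column in zip(*window_vectors)]


# Toy 2-d "word vectors" (hypothetical; a real system would use trained embeddings).
embeddings = {"best": [1.0, 0.0], "hotel": [0.0, 1.0], "in": [0.0, 0.0], "town": [0.5, 0.5]}
print(document_vector(["best", "hotel", "in", "town", "!"], embeddings))  # [0.3333333333333333, 0.5]
```

Unlike a bag-of-words count, each window vector reflects which words occur near each other, which is one plausible reading of why the authors found the windowed representation more informative; the resulting vectors would then be fed to a classifier trained on the G11-deleted versus administrator user pages.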
A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.