The US National Archives and Records Administration (NARA) has committed to engaging with Wikimedia projects in its newest Open Government Plan. The biennial effort is a roadmap for how the agency will accomplish its goals in the digital age. In the first plan, issued in 2010, Archivist of the United States David Ferriero wrote "the cornerstone of the work that we do every day is the belief that citizens have the right to see, examine, and learn from the records that document the actions of their Government. But in this digital age, we have the opportunity to work and communicate more efficiently, effectively, and in completely new ways."
These "new ways" included reaching out to Wikipedia, starting in 2011 with the hiring of Dominic McDevitt-Parks as a Wikipedian in residence. The position began as a student internship, but McDevitt-Parks has since moved to being a digital content specialist with a specialty in the Wikimedia sites. Ferriero has spoken at multiple Wikimedia events, including the Wikipedia in Higher Education summit in 2011 (see Signpost coverage) and Wikimania 2012 (video; transcript; Signpost coverage). He has been frequently quoted saying varying forms of "if Wikipedia is good enough for the Archivist of the United States, maybe it should be good enough for you."
How has the Wikimedia movement benefited from NARA and McDevitt-Parks' placement? There are three organized projects dedicated to NARA. On Wikisource, NARA has an ongoing initiative that is transcribing US government documents. On Commons, NARA has uploaded over 100,000 images, the most recent of which came a month ago. On the English Wikipedia, editors have worked on several articles related to images from NARA, such as Desegregation in the United States Marine Corps. The site has also benefited from images uploaded for specific users, such as those of living Medal of Honor recipients like Charles H. Coolidge, and the lead images for three US battleship articles: Pennsylvania-class battleship, USS Arizona (BB-39), and South Carolina-class battleship (Editor's note: the author of this article has made significant contributions to the last three pages).
All of that is in the past, though. The Open Government Plan lays out what NARA wants to accomplish in the next two years; but as a general plan it suffers from a lack of specifics. The Signpost contacted McDevitt-Parks to learn what the inclusion of Wikipedia in this plan will mean for the site.
He told us that there is no quantitative target for a total number of image uploads, because NARA plans to upload all of its holdings to Commons. "The records we have uploaded so far contain some of the most high-value holdings (e.g. Ansel Adams, Mathew Brady, war posters)", he said. "However, we are not limiting ourselves to particular collections. Our approach has always been simply to upload as much as possible ... to make them as widely accessible to the public as possible."
To accomplish this, volunteers are working with NARA on a new upload script to port images to Commons; the work in progress is posted on GitHub. At NARA itself, an API is in development that will make it easier to extract the metadata of the images. Given these efforts, McDevitt-Parks says that they will "allow us to more easily upload all of our existing digitized holdings to Wikimedia Commons and similar third-party platforms, and also that in the future upload to platforms like Commons will be the end of all digitization. Looking at it this way, I would say that in a way all of our digitization efforts are also for upload to Wikimedia Commons."
In the meantime, the special requests process—the first pilot launched by NARA when McDevitt-Parks began his tenure—is still available for Wikipedia editors. In the future, they hope that this ad hoc arrangement can be supplemented with a volunteer citizen scanning program that will be able to "generate greater Wikipedian-initiated digitization."
The Vietnamese and Philippines-based Waray-Waray Wikipedias have crossed the one million article Rubicon, the tenth and eleventh to do so. Just like the Swedish Wikipedia, the sites have attained this symbolic milestone with the help of bots, a process that has divided opinions among Wikimedians from several languages. For example, for a previous Signpost article on the topic, German Wikipedian Achim Raschka pointed us to an entry Denis Diderot wrote for the Encyclopédie, titled "Aguaxima". Diderot lamented that all they knew about the Aguaxima was that it was a plant in Brazil, yet he still had to describe it: "If all the same I mention this plant here, along with several others that are described just as poorly, then it is out of consideration for certain readers who prefer to find nothing in a dictionary article or even to find something stupid than to find no article at all."
In an email to the Wikimedia-l mailing list, Vietnamese Wikipedian Minh Nguyen wrote that some editors on the site shared similar concerns and were "alarmed" at the sharp uptick in bot-created articles. Yet at the same time, crossing the one million article mark with a high proportion of auto-articles led the community to look at its small size (its roughly 1250 active editors are fewer than those of the Catalan Wikipedia, whose language has almost 60 million fewer speakers), and they are taking steps to ease the learning curves of new editors.
The question of active users is even more pertinent for fellow millionaire Waray-Waray, which has just 71 active users. The related Cebuano Wikipedia, which has also embraced bot-created articles and will soon join the million article club, has even fewer.
Meanwhile, the Swedish Wikipedia's article-creation bot has started editing again. The bot's operator, Lsj, told the Signpost that the source code has been rewritten to use the most recent references, though it is currently mostly operating on the Waray-Waray and Cebuano Wikipedias, which will soon also have one million articles. Other Wikipedias, such as Farsi (mostly spoken in Iran), have also expressed an interest in the bot's operation. Why have other Wikipedias not adopted similar processes, aside from those (like the English and German) that have philosophical objections? Lsj believes "it is mostly a matter of whether there is somebody who knows both bots and the target language well enough, and is prepared to devote the time required. Small language versions likely do not have such a person."
Despite the interest generated by its season finale, Game of Thrones couldn't top the World Cup, which still dominated interest, as evidenced by the fact that this top 10 is virtually identical to last week's, just with a different dead celebrity.
For the full top 25 list, see WP:TOP25. See this section for an explanation for any exclusions.
As prepared by Serendipodous, for the week of 15–21 June, the ten most popular articles on Wikipedia, as determined from the report of the 5,000 most viewed pages, were:
Rank | Article | Views | Notes |
---|---|---|---|
1 | 2014 FIFA World Cup | 2,506,641 | While it is cold comfort to those (like me) whose home teams are already spanked and turfed, this exceptionally goal-heavy World Cup has produced some very entertaining football. Historically, this tournament has boiled down to a contest between Europe and South America, with each continent claiming 10 titles. Now, with England sent home, Spain crashing out and Italy and Portugal teetering, it seems the first South American World Cup in 36 years is favouring the home sides, with Chile, Colombia and Uruguay all storming through their first matches. But with Costa Rica coming out of nowhere to the shock and awe of everyone, could this be North America's turn? |
2 | FIFA World Cup | 1,337,592 | The broader article on the history of the competition may have been accessed by people looking for the long view, but in truth it was probably more to do with people looking for the more specific article above. |
3 | Amazon.com | 843,747 | This article suddenly reappeared in the top 25 a few months ago after a long absence; it's always difficult to determine the reasons for the popularity of website articles (how many are simply misaimed clicks on the Google search list?) but there are at least two possibilities: first, it released its digital media player, Amazon Fire TV on April 2, and second, it is currently embroiled in a dispute with publisher Hachette that could decide whether book publishers even need to exist in the post-digital world. |
4 | Game of Thrones | 770,438 | Well, that was the season finale. I half-expected it to beat the World Cup, but our users don't seem to have the crazed thirst for this show they displayed last year. |
5 | 2010 FIFA World Cup | 694,266 | The current World Cup has buoyed interest in the last one, with people doubtless looking for parallels, clues for upcoming matches, or omens. |
6 | Game of Thrones (season 4) | 577,004 | This is the page with the plot synopses for each episode. |
7 | Casey Kasem | 566,000 | The legendary radio personality died this week at the age of 82. There is no American of my generation or older who would not recognize Casey Kasem's voice. He hosted the national show American Top 40 for a total of 25 years, but will probably be best known outside the US as the English voice of Shaggy Rogers, the owner of Scooby-Doo – a role he played for nearly 40 years. (That isn't his normal voice though; if you want to hear what he usually sounded like, check out his voice cameo in Ghostbusters.) |
8 | List of Game of Thrones episodes | 493,961 | The episode list is probably used to look up air dates. |
9 | 2014 in film | 482,811 | A new entry for the list, probably in preparation for the Hollywood summer movie season. |
10 | 2014 FIFA World Cup squads | 441,589 | This is most likely the result of residents of competing countries checking out their opponents. |
In a Forbes video uploaded to YouTube three years ago entitled Why SugarCRM hired a CIO, Lila Tretikov was asked: "When you actually approached SugarCRM for the job, you weren't really looking for a CIO [chief information officer] title, were you?" She responded, "I don't think I wanted a title—I was curious about the job ... I am the kind of person who's led by curiosity, so I like jobs that will challenge me. So, title is not relevant to me; if I can solve a big problem, you can call me anything you want!"
Since May this year, Forbes has called Tretikov "executive director of the Wikimedia Foundation", naming her on its list of the world's 100 most powerful women. "Top of her agenda", says Forbes, "is to lead the community's struggle to increase diversity: 87% of Wikipedia contributors are men".
Last month, the Moscow-born technologist agreed to give her first interview as WMF executive director to the Signpost. The interview covers three key challenges for the movement: grantmaking, the global south, and gender. A second interview later this year will deal with engineering and products.
Tony1: Lila Tretikov, congratulations on your appointment!
Last year, your predecessor said that with such a high proportion of funding going to chapter staff and bricks and mortar offices, we need to ask whether the benefits are turning out to be worth the cost. Where do you stand on that?
To be more specific, you've recently stressed the fundamental importance of measuring impact on our end users, the readers. It's early days yet, but in terms of likely reader impact, are you keen to determine whether engineering our products deserves a higher proportion of donors' funds at the expense of grantmaking?
To go to specific grantmaking activities then, one person wrote that the Berlin Wikimedia conference hasn't resulted in a single long-term editor, and did nothing to create content or improve our infrastructure and software. You yourself said in Zurich that editathons are one of the more difficult and expensive things you can do in terms of attracting new editors. Should these types of outreach be considered a lower priority than they have been?
Of course, learning how to measure that across the cultural veil is a challenge, isn't it?
To turn now to the global south, the amount of global south funding is still running at only about 20% of Foundation grant money; this is for three-quarters of the world's population. So according to Asaf Bartov in the grantmaking department, it has actually been hard to find fundable projects that align with the Wikimedia global mission. Bartov has said that a key to success in global south programs is having a core of self-motivating active editors, even if it's only four or five people. He says we don't yet have an answer as to how you grow such a core, where it currently doesn't exist in the global south. What's your plan?
That's in numbers, of course, which includes a lot of travel grants and scholarships.
The foundation's core values concerning openness, transparency, and conflict of interest, are most familiar to progressive movements in the global north. Wikimedia Bangladesh has just made a big deal about how they achieved incorporation without paying the customary “speed money” to government officials. Do you favor a zero-tolerance policy towards practices that our core values might label as “corrupt” but in parts of the Global South are regarded as just the cost of doing business?
So let me get this right: the global south affiliate might pay small tips or bribes, or whatever we want to call them, to poorly paid civil servants to get things moving. Are these morally and—in terms of the attitude of WMF Legal—in the same category as the embezzlement scandals involving chapter board members in Kenya and Spain in 2012, or the fact that we still don't have the financial statement from Hong Kong's Wikimania last year, in which orders of magnitude more money is still at stake?
Okay so, it'll always be a bit of push and pull between an NGO's Global North headquarters and basic ethics, and its sprawling affiliates right around the world in very different social, political, and economic contexts.
In the time we still have, can we turn now to the gender gap then? What's your advice for a female editor, on say, the English Wikipedia, who feels uncomfortable even revealing her gender on wiki?
I've had contact with more than one female editor who has revealed her gender to me privately, only after some time, and utterly refuses to reveal it onwiki for a bunch of fears, whether well founded or not. Would you encourage that person to get to a point where she can reveal that she's a woman?
In Zurich, you said that the gender gap might be related to two hurdles, which I found very interesting. First, attracting women to make their first edit, and second, retaining them after they've made that first click.
Let's deal with them one at a time. How is your thinking evolving around what might encourage women to join the editing community in the first place? Specifically, what kind of data do we need?
Are you saying there's a tipping point—there might be a tipping point in the future?
Let's talk about the community experience itself of being a female editor. You also said in Zurich, I quote, "Unfortunately the internet makes it really easy not to emphasize the person on the other side. You don't see their face, you don't hear their voice, and you don't feel like there's another human being there with thoughts and feeling and emotions." If we engineered easy ways for editors to interact more personally in real time on the sites—maybe through instant messages, even audio—would this create an environment that's less of a turn-off for women, or would it be seen by too many women as threatening?
Just a final question then. You've prospered in corporate IT and engineering, which is a professional world heavily dominated by male culture. If you could give women a few take-home messages now on how to overcome gender bias in the workplace, what would they be?
So that's a well-honed view through many years dealing with mostly male cultures in the corporate sector.
Well, it was more a question. Did you start that way?
When you say mentors and friends, that stands out. We don't do that well on the Wikipedias, do we, mentoring and fostering specific friendships that are likely to, again, serve the readers best.
Lila Tretikov, thank you very much for taking the time to answer our questions.
This is mostly a list of non-article page requests for comment believed to be active on 26 June 2014 linked from subpages of Wikipedia:RfC, recent watchlist notices and SiteNotices. The last two are in bold. Items that are new to this report are in italics even if they are not new discussions. If an item can be listed under more than one category it is usually listed once only in this report. Clarifications and corrections are appreciated; please leave them in this article's comment box at the bottom of the page.
Ten featured articles were promoted this week.
Eleven featured pictures were promoted this week.
This week, the Signpost visited the land of Disney, blockbusters, explosions, dream sequences, and cultural masterpieces: film. WikiProject Film was first created in September 2003, though the project's homepage wasn't filled out until the following year. With around 500 members, it is one of the largest WikiProjects on the site. It boasts over 225 pieces of featured content, 626 good articles, and still has three A-class articles from a long-shuttered review process. We talked with Erik, Favre1fan93, Corvoe, NinjaRobotPirate, and Lugnuts.
A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
Author | Pnina Fichman and Noriko Hara |
---|---|
Language | English |
Subject | Wikipedia |
Publisher | Rowman & Littlefield |
Publication date | 2014 |
Publication place | United States |
Pages | 178 |
ISBN | 978-0810891012 |
An edited volume[1] by Pnina Fichman and Noriko Hara from Indiana University, Bloomington was released on May 23, 2014, subtitled "International and Cross-cultural Issues in Online Collaboration". The book description states that "dozens of books about Wikipedia are available, but they all focus on the English Wikipedia and assume an Anglo-Saxon perspective, while disregarding cultural and language variability or multi-cultural collaborative efforts". The description claims that this is "the first book to address this gap by focusing attention on the global, multilingual, and multicultural aspects of Wikipedia." The book contains nine chapters authored by 16 Wikipedia researchers (including a chapter authored by the volume editors). Among the topics covered are international and cross-cultural conflict and collaboration, case studies in the Chinese, Finnish, French, and Greek Wikipedias, and Wikipedia gender gaps in different language sites.
Review by Maximilianklein (talk)
This research by Eom et al.[2] is an exploratory data analysis of figures (roughly, "people") from a mining of date and place of birth and gender in biography articles. Presenting novel ideas based on the infamous Google PageRank algorithm, this paper is a sort of computational history. The methods used are standard – if a bit dated – compared with more contemporary research using Wikidata. This is a shame because newer techniques would have allowed the claims of a quantified cultural influence factor to rest on firmer grounds.
For each of their 24 Wikipedia languages (approximately the 24 largest ones), they construct a network in which nodes are biography articles and links are intrawiki links. Then they rank each node by both PageRank and 2DRank. PageRank says your importance is a recursive function of your incoming links, weighted by the PageRank of each incoming linker; CheiRank is the same as PageRank, but using outgoing links instead. 2DRank is a mixture of PageRank and CheiRank. Some of the authors have coauthored earlier papers that similarly examined PageRank and CheiRank for biographical and other Wikipedia articles (see our previous coverage: "How Wikipedia's Google matrix differs for politicians and artists" and "Multilingual ranking analysis: Napoleon and Michael Jackson as Wikipedia's 'global heroes'").
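To make the difference between the two rankings concrete, the following is a minimal sketch, not the authors' code. It assumes a toy link graph of four hypothetical biography articles (`A` to `D`), a conventional damping factor of 0.85, and plain power iteration; CheiRank is obtained here simply by running the same PageRank routine on the graph with every link reversed. 2DRank, which combines the two orderings, is left out.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {article: [articles it links to]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every article receives the same small "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Pass this article's rank on to the articles it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # An article with no outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


def reverse_links(links):
    """Flip the direction of every link in the graph."""
    reversed_links = {node: [] for node in links}
    for source, targets in links.items():
        for target in targets:
            reversed_links.setdefault(target, []).append(source)
    return reversed_links


def cheirank(links, damping=0.85, iterations=100):
    """CheiRank: PageRank computed on the link-reversed graph."""
    return pagerank(reverse_links(links), damping, iterations)


# Hypothetical toy graph: D links out to everyone, B is linked to by everyone.
toy_links = {"A": ["B"], "B": ["C"], "C": ["B"], "D": ["A", "B", "C"]}
pr = pagerank(toy_links)
chei = cheirank(toy_links)
top_by_pagerank = max(pr, key=pr.get)
top_by_cheirank = max(chei, key=chei.get)
```

In this toy graph the heavily linked-to article `B` comes out on top by PageRank, while `D`, which links out the most, comes out on top by CheiRank.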
However, the input to these algorithms is the weak part. The base set consists of all of the articles that are in a subcategory of Biographies of Living People, Births by Year, or Deaths by Year. Obtaining 1.1 million biography articles, they acknowledge that this isn't a full set because it is based off English Wikipedia, but then make an anecdotal claim that it's only 2% off. However, with the latest Wikidata information we know of at least 2.08 million "people" with Wikipedia articles[3].
The rest of their method consists of finding the top 100 articles in each of the 24 languages using both PageRank and 2DRank. Then they get birth place, birthdate and gender from DBpedia if available, and if not they look up this information manually. They pigeonhole each article into one of the 24 target cultures based on birth place, and use a "World" category if none applies. Simplifying assumptions are also made during these processes: modern borders are used, and each country is assumed to speak only a single language. So Kant is Russian and all Belgians speak Dutch in this research.
There is an exploratory analysis of these top 100 by geography, time, and gender. The results confirm a long-told story: the biographies that the English Wikipedia knows about are heavily skewed towards being Western/European, modern, and male. They make points of showing local favour, e.g. Hindi has many in their top 100 who are born in India. With regard to history, the authors note that the Arabic Wikipedia is more interested in history than what world growth would suppose. Another measure is defined to look at the localness factor by decade – that is, what percentage of top figures in this decade were born in this language-place? Of course it's Greeks early on, and the US dominating later.
On gender, their results indicate that, averaged across the top 100s, 5.1% (by PageRank) or 10.1% (by 2DRank) of the figures are female. The authors mention that maleness decreases over time as well. This reported figure is more severe than the overlap with any single language, so the authors show some "wisdom of the crowds" effect.
The final analysis tries to quantify cultural influence. A "network of cultures" is made, where nodes are each of the 24 languages-cum-cultures, and the directed, weighted edges are the number of foreigners in their top 100. For instance, in the English Wikipedia's top 100, five people were born in France; so English connects to France with a weight of 5. With this "network of cultures" in hand, they apply the PageRank and 2DRank algorithms to rank each culture. This is a novel approach to making statistical what we all often guess at. Despite the fact that Jesus is considered Arabic through their simplifications, PageRank turns up English and German as top and runner-up, respectively. Using 2DRank, Greek, French and Russian get more due.
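The construction of such a network can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the culture codes and the tiny "top lists" are invented (only the English-to-French weight of 5 echoes the example in the review), the function names are hypothetical, and a standard weighted PageRank with damping 0.85 stands in for whatever variant the authors used.

```python
from collections import Counter


def culture_network(top_figures):
    """Build {culture: {other_culture: weight}} from top lists.

    top_figures maps each language edition's culture to the birth culture
    of every figure in its top list; only foreign-born figures add weight.
    """
    network = {}
    for culture, birth_cultures in top_figures.items():
        foreign = Counter(b for b in birth_cultures if b != culture)
        network[culture] = dict(foreign)
    return network


def weighted_pagerank(network, damping=0.85, iterations=100):
    """PageRank where rank flows along edges in proportion to their weight."""
    nodes = set(network) | {t for edges in network.values() for t in edges}
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        new_rank = dict.fromkeys(nodes, (1.0 - damping) / n)
        for node in nodes:
            edges = network.get(node, {})
            total = sum(edges.values())
            if total:
                for target, weight in edges.items():
                    new_rank[target] += damping * rank[node] * weight / total
            else:
                # A culture citing no foreigners spreads its rank evenly.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Invented top-10 lists for three hypothetical cultures.
toy_top_figures = {
    "EN": ["EN"] * 3 + ["FR"] * 5 + ["DE"] * 2,
    "FR": ["FR"] * 6 + ["EN"] * 2 + ["DE"] * 2,
    "DE": ["DE"] * 5 + ["EN"] * 4 + ["FR"] * 1,
}
network = culture_network(toy_top_figures)
culture_rank = weighted_pagerank(network)
ranking = sorted(culture_rank, key=culture_rank.get, reverse=True)
```

A culture ranks highly here when many other editions place its natives in their top lists, which is the sense of "influence" the review describes.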
In summary, although this cultural research suffers from biased data, some clever ideas are implemented – particularly the "network of cultures". The implication is that statistical history somewhat corroborates the opinions of manually conducted history.
This article[4] describes IntelWiki, a set of MediaWiki tools designed to facilitate new editors' engagement by making research easier. The tool "automatically generates resource recommendations, ranks the references based on the occurrence of salient keywords, and allows users to interact with the recommended references within the Wikipedia editor." The researchers find that volunteers using this tool were more productive, contributing more high-quality text. The studied group was composed of 16 editors with no Wikipedia editing experience, who completed two editing tasks in a sandbox wiki, one using a mockup Wikipedia editing interface and the Google search engine, and one using the IntelWiki interface and reference search engine. The author's reference suggestion tool seems valuable; unfortunately, this reviewer was unable to locate any proof that the developer engaged the Wikipedia community, or made his code or the tool publicly available for further testing. The research and the thesis do not discuss the differences between their MediaWiki clone and Wikipedia in any significant detail. Based on the limited description, the study's overall conclusions may not be reliable, since the mockup Wikipedia interface used for the comparison seems to be a default MediaWiki clone, lacking many Wikipedia-specific tools; therefore the theme of comparing IntelWiki to Wikipedia is a bit misleading.
While the study is interesting, it is disappointing that the main purpose appears to be completing a thesis,[5] with little thought to actually improving Wikipedia (by developing public tools and/or releasing open code). (See also: related webpage, YouTube video)
This paper[6] (accepted for presentation at OpenSym 2014, and subtitled "A case study of information engagement") explores the use of the Chinese Wikipedia and Baidu Baike encyclopedia by Chinese microblog (Twitter, Sina Weibo) users through qualitative and quantitative analyses of Chinese microblog postings. Both encyclopedias are often cited by microblog users, and are very popular in China to the extent that the words "wiki" and "baidu" have become verbs meaning to look up content on the respective websites, analogous to "to google" in English.
One of the study's major focuses is the impact of Internet censorship in China, particularly since Wikipedia itself is not censored, but access to it, and discussion of it on most Chinese websites, may be. Baidu Baike is both censored and more likely to host copyright-violating content. Despite Baidu Baike's copyright-violating content, many users still prefer the uncensored and more reliable Chinese Wikipedia, though they can become frustrated by not being able to access it due to censorship. Whether some Wikipedia content is censored or not is seen by some as a measure of the topic's political sensitivity. The author suggests that a distinguishing characteristic can be observed between groups that prefer one encyclopedia over the other, but does not discuss this in detail, suggesting a very interesting research avenue.
Review by Kimaus
In a recent paper[7], Jacob Solomon and Rick Wash investigate the question of sustainability in online communities by analysing trends in the growth of WikiProjects. Solomon and Wash track revisions and membership in over one thousand WikiProjects over a period of five years to examine how the concept of a critical mass can influence a community’s development. The key question, as the title of the paper states, is: “Critical mass of what?” Is it achieving a certain number of contributions or a certain number of members that will ensure the future sustainability of an online group?
Using critical mass theory, which describes groups as having an accelerating, linear or decelerating production function, the authors modelled a growth curve for each community. They found that the majority of WikiProjects showed accelerating growth in the number of revisions but decelerating growth in accruing members, which suggests that existing editors are increasing their individual contributions to the projects. In further examining this trend, Solomon and Wash focus on the early years of projects’ existence to determine whether amassing content or editors in this formative period influences future production functions.
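The three curve shapes can be illustrated with a deliberately crude heuristic. This is not the model used by Solomon and Wash, whose fitting procedure is not described here; the function name, the 5% tolerance, and the sample series are all invented. The sketch compares the average per-period gain in the first half of a cumulative series with the average gain in the second half.

```python
def classify_growth(cumulative, tolerance=0.05):
    """Label a cumulative series as accelerating, linear or decelerating.

    Heuristic only: compares the mean per-period increase in the early
    periods with the mean increase in the late periods.
    """
    increments = [later - earlier
                  for earlier, later in zip(cumulative, cumulative[1:])]
    if len(increments) < 2:
        raise ValueError("need at least three cumulative observations")
    half = len(increments) // 2
    early = sum(increments[:half]) / half
    late = sum(increments[-half:]) / half
    # Normalise so the tolerance is a relative, not absolute, threshold.
    scale = max(abs(early), abs(late), 1e-12)
    change = (late - early) / scale
    if change > tolerance:
        return "accelerating"
    if change < -tolerance:
        return "decelerating"
    return "linear"


# Invented yearly totals for one hypothetical WikiProject.
revisions = [0, 10, 25, 45, 70, 100]   # gains grow each year
members = [0, 20, 35, 45, 50, 52]      # gains shrink each year
revision_shape = classify_growth(revisions)
member_shape = classify_growth(members)
```

With these made-up numbers the project shows the pattern the review reports for most WikiProjects: accelerating revisions alongside decelerating membership.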
Their modelling shows that a greater number and diversity of editors within a project positively affects the number of revisions accumulated after five years (where diversity is calculated through membership in other WikiProjects). Interestingly, the modelling showed contributions by infrequent participants helped a project grow, but this can be offset by "overparticipation from a project’s power users." They attribute this to members' feeling that they can make a difference to projects that have diverse and sparse contributions. They do note, however, that increased contributions from power users may simply be an attempt to keep a project afloat, and that this effort is ultimately futile in certain cases. In sum, the authors find that it is a critical mass of people (who hold a variety of skills and knowledge) contributing small amounts in the early stages that positively affects a project’s growth and future sustainability.
In a paper[8] presented at the ChASM Workshop of WebSci'14, Bloomington, Indiana, this month, de Silva and Compton have generalised a method, previously introduced by Mestyán, Yasseri, and Kertész (see the newsletter review), to predict the box office revenues of movies based on Wikipedia edits and page-view counts. Of these two metrics, the new paper considers only the page-view statistics of articles about the movies, but extends the sample of movies to include non-American movies as well. Samples of movies in the US, Japan, Australia, the UK, and Germany are studied. The authors concluded: "although the method proposed by Mestyán et al. predicts films’ opening weekend box office revenues in the United States and Australia with reasonable accuracy, its performance drops significantly when applied to various foreign markets. ... we used the model to predict the opening weekend box office revenues generated by films in British, Japanese, and German theatres, [and] found its accuracy to be far from satisfactory."
A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.