Wikipedia talk:Wikipedia Signpost/2015-12-09/Op-ed

Discuss this story

  • So Wikidata is explicitly intended to accommodate fringe views side-by-side with mainstream views, with no differentiation of which of these views are reliable other than by the reader's own ability to distinguish the quality of sources? There goes any hope of generating Wikipedia content such as infoboxes automatically from Wikidata, for any but the most dry and uncontroversial of topics. Is this biography about a physicist or a crank pseudoscientist? Can't say, sources differ. Is this herbal treatment efficacious in treating certain diseases? Can't say, sources differ. Is this city's English name spelled Kiev or Kyiv? Can't say, sources differ. Is the Riemann hypothesis an open problem in mathematics or already solved? Can't say, sources differ. Was Much Ado About Nothing written by Shakespeare, or by an entirely different person coincidentally named Shakespeare? Can't say, sources differ. So, other than replacing interwiki links, what is all the data in Wikidata actually good for? --David Eppstein (talk) 20:48, 12 December 2015 (UTC)
It's inadequate for a scientific or political debate - for that, you should always look at the sources. The three ranks are designed to allow for a simple selection of the rough level of certainty you want in a given context. "Preferred" is what you would want to see in an infobox, or what you want to include when compiling a list (of the largest cities or whatever). The "normal" rank would be used for historical values (population in 1927) or minority views, and can be used for more in-depth queries or more detailed display. "Deprecated" is for "known wrong" statements - known fallacies, popular misconceptions, etc. Statements with the "deprecated" rank are rarely used, and mainly serve as a safeguard against such statements being introduced as valid.
Chelsea Manning's gender is a good example: "female" and "trans woman" are both marked as "preferred" - both views are popular and well founded, and should be presented side by side (they don't contradict each other either, but that's not the point here, they might as well). "male" is left with the "normal" rank, since it used to be true, but no longer is. This is further specified with the "end date" qualifier, telling us that Manning used to be male until August 22, 2013. For queries and infoboxes, this should be enough information. For scientific or political analysis, you'll of course have to dig deeper.
Wikidata is designed to be flexible and useful; it's founded on the idea that knowledge is intrinsically imprecise, incomplete, and context-dependent. Wikidata doesn't claim to represent "the truth" accurately - it just tries to represent other people's statements about the world in a useful and neutral way. Just like Wikipedia. -- Daniel Kinzler (WMDE) (talk) 13:28, 13 December 2015 (UTC)
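The rank selection described in the comment above can be sketched in a few lines of Python. This is only an illustration, not Wikidata's actual data model or API: the `Statement` class, the `best_statements` helper and the plain-string property names are all hypothetical, and the fallback from "preferred" to "normal" when no preferred statement exists is an assumption about how a consumer such as an infobox might behave.

```python
from dataclasses import dataclass, field

# Rank names as described in the comment above.
PREFERRED, NORMAL, DEPRECATED = "preferred", "normal", "deprecated"


@dataclass
class Statement:
    """Hypothetical, simplified stand-in for a Wikidata statement."""
    prop: str
    value: str
    rank: str = NORMAL
    qualifiers: dict = field(default_factory=dict)


def best_statements(statements, prop):
    """Return the statements an infobox would show for one property.

    Preferred statements win; if there are none, fall back to normal
    ones (an assumption). Deprecated statements are never returned.
    """
    candidates = [s for s in statements if s.prop == prop]
    preferred = [s for s in candidates if s.rank == PREFERRED]
    if preferred:
        return preferred
    return [s for s in candidates if s.rank == NORMAL]


# Illustrative data modelled on the example in the comment above.
manning = [
    Statement("gender", "female", PREFERRED),
    Statement("gender", "trans woman", PREFERRED),
    Statement("gender", "male", NORMAL, {"end date": "2013-08-22"}),
]

infobox_values = [s.value for s in best_statements(manning, "gender")]
print(infobox_values)
```

A more detailed query could read the "normal" statements as well and use the "end date" qualifier to show historical values, while still skipping anything marked "deprecated".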
"Preferred, normal and deprecated." Boy, can I imagine some edit wars around that. "Jerusalem = Capital of Israel"? "Preferred." "Jerusalem = Capital of Palestine"? "Deprecated." (This is not a description of the current (protected) Wikidata entry on Jerusalem, in which both statements presently have "normal" ranking.) It would be better to list the best-quality sources for each of several conflicting statements, make sure that re-users display those sources, and allow readers to decide for themselves which sources they want to trust.
Incidentally, the Wikidata statement "Jerusalem = Capital of Israel" is sourced to the Wikidata item for Israel, where the CIA Factbook is given as a reference. However, the CIA Factbook says, under the heading "Capital": "Jerusalem: note - Israel proclaimed Jerusalem as its capital in 1950, but the US, like all other countries, maintains its embassy in Tel Aviv". Similarly, the German Foreign Office says, "Capital (not recognised internationally): Jerusalem." That nuance, i.e. the lack of international recognition, does not make it across to Wikidata. Maybe you need a "Proclaimed capital" statement in Wikidata, followed by a list of sources who do or do not recognise it as such. (The second list will be very, very much longer than the first.) Andreas JN466 15:35, 13 December 2015 (UTC)
@Jayen466: This is not inevitable. We have created a qualifier for this kind of use case, "claim disputed by" (see d:Property:P1310), and we can create more if needed to add nuance to claims. This can be used to describe, for example, self-proclaimed states by saying that the UN does not recognize them as states. TomT0m (talk) 13:05, 14 December 2015 (UTC)
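As a rough sketch of how a re-user might surface such a qualifier, the snippet below attaches a "disputed by" note to a claim and renders it alongside the value. The dictionary layout, the `render` helper and the placeholder values are hypothetical and are not the real Wikidata JSON format; only the idea of a "disputed by" qualifier comes from the comment above.

```python
# Hypothetical, simplified statement: a plain dict, not the real Wikidata format.
example_claim = {
    "property": "instance of",
    "value": "state",
    # Placeholder qualifier following the self-proclaimed-state example above.
    "qualifiers": {"disputed by": ["United Nations"]},
}


def render(statement):
    """Format a claim for display, appending any 'disputed by' qualifier."""
    text = f'{statement["property"]}: {statement["value"]}'
    disputers = statement.get("qualifiers", {}).get("disputed by", [])
    if disputers:
        text += f' (disputed by: {", ".join(disputers)})'
    return text


rendered = render(example_claim)
print(rendered)
```

A claim without the qualifier is rendered unchanged, so the dispute only becomes visible where an editor has recorded one.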
That's good, TomT0m. I have a request: could I ask you to go into the data item and do the necessary adjustment? The protection level doesn't allow me access at present. In my previous post above I mentioned two sources you could cite (CIA Factbook and German Foreign Office); a more authoritative source not tied to any individual state might be this United Nations Department of Public Information publication: [1] (see the chapter: The status of Jerusalem). Andreas JN466 15:51, 14 December 2015 (UTC)
So far the tools we have implemented seem to be working rather well. As far as I can tell, we do not have bad edit wars, because tools like ranks and qualifiers are rather powerful. --Lydia Pintscher (WMDE) (talk) 17:00, 14 December 2015 (UTC)
  • Like many Wikipedians, I suspect, I have a lot to learn about Wikidata, so thank you for this clear, readable explanation. My main concern is with the data's reliability. Regarding, "We have already seen [the number of referenced statements] increase massively from 12.7% to 20.9% over the past year because of these measures as well as a change in attitude," I'm curious about the change in attitude; can you elaborate on that, perhaps pointing to public discussions exemplifying the evolving attitude toward citations, please? --Anthonyhcole (talk · contribs · email) 21:24, 12 December 2015 (UTC)
    • Great to hear you found it helpful! I have a hard time pointing to specific discussions. It is more something I am seeing in many places, and how it has changed since the beginning of Wikidata. We started out with an empty database. Then a lot of boot-strapping happened, in large part with the help of data already in Wikipedia. The need for this boot-strapping is going away now. Instead we're seeing a shift towards working more with outside sources, for data imports for example. There have been several collaborations with GLAMs as part of the WikiProject Sum Of All Paintings, and collaborations with research institutions as part of Wikidata for Research. I also mentioned the reworking of the process for highlighting quality content. It is a long process, but I think now that Wikidata is finding its feet firmly on the ground we're on the right track. Hope that answers your question at least in part. --Lydia Pintscher (WMDE) (talk) 17:06, 14 December 2015 (UTC)
  • It would be wise to split up Wikidata into different language versions, like all other Wikimedia projects. Wikidata cannot be compared to Commons because pictures etc. are obviously different from the data we speak about here. The world is one, but it also falls into many cultural spheres, and experience from almost fifteen years of Wikipedia is that there is not one Wikipedia, but there are almost 200 of them. So, get real, please, and draw the line.--Aschmidt (talk) 19:34, 13 December 2015 (UTC)
    • That would severely undermine one of the main reasons Wikidata was created in the first place: to help small Wikipedias in order to give more people more access to more knowledge. --Lydia Pintscher (WMDE) (talk) 17:08, 14 December 2015 (UTC)
      • As I said, it would be wise to accept that this plan has failed, because you will not solve cultural issues with technology, as most nerds are apt to do. All attempts like this have failed in the past. Wikidata might be the project that taught us so in terms of all things Wikimedia.--Aschmidt (talk) 23:42, 14 December 2015 (UTC)