Wikipedia talk:Wikipedia Signpost/2017-09-25/Recent research

Discuss this story

... "History of New York City" ... describes a topic very closely related to “New York City” and could at the same time easily be merged into the original article. This way of splitting up lengthy articles into several smaller ones ("summary style", more specifically "article size") may improve readability for human users, but seriously impairs many studies based on the “article-as-concept” assumption.

Does it? The split isn't solely based off of prose size but is proportional to coverage in secondary sources. I could write a history of a small town too, but that wouldn't warrant a split if the sourcing is all primary (e.g., if no secondary sources have conceptually addressed the history of the town), so we'd pare the history section of the town's article down to due weight. The history of NYC, though, has many reams of books written on it. (I'm actively parsing several books on the history of NYC's schools specifically in the 1960s...) Perhaps this is better explained in the talk itself, but as for the summary, splits such as the "history of NYC" should be seen as separate concepts from NYC itself, not only content forks. And besides, embedded in the idea of a split is the practical concern that the amount that can be reliably written on the topic extends past what a general audience would want to read in the context of the given article. czar 02:29, 25 September 2017 (UTC)

Absolutely. Blithely assuming that a "History of X" article is the same as "X" is entirely unsafe. A topic of sufficient size can often have subsidiary articles on topics such as history, methods, cultural connections, and so on depending on its type. In the case of a major city with a large history, the primary article may contain a summarized history, with an article giving much more detail. Lists of books or films featuring the city would also rightly be subsidiary articles, not at all desirable in the primary article, even though they would unquestionably be "about" the city.

On a different point, if there is an unlinked article with "New York City" in its title, it cannot be difficult for a script to detect and propose a likely connection as a subsidiary article. "Vampires of New York City" (if it existed) for instance would presumably feature that city as an involved participant. Chiswick Chap (talk) 08:07, 27 September 2017 (UTC)
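A minimal sketch of the kind of script suggested above, assuming the list of article titles is already available locally. The function name `propose_subarticles` and the sample titles are hypothetical, and whole-phrase title matching is only one possible heuristic, not a method described in the research under discussion.

```python
import re


def propose_subarticles(parent, titles):
    """Return titles that contain `parent` as a whole phrase, excluding the parent itself."""
    # Word boundaries keep "New York City" from matching inside a longer word.
    pattern = re.compile(r"\b" + re.escape(parent) + r"\b", re.IGNORECASE)
    return [t for t in titles if t != parent and pattern.search(t)]


# Hypothetical sample of article titles.
titles = [
    "New York City",
    "History of New York City",
    "Vampires of New York City",  # does not exist; used as in the comment above
    "New York (state)",
    "Tinytown, Ohio",
]

candidates = propose_subarticles("New York City", titles)
print(candidates)
```

Matches from a check like this are only proposals: a title can mention the city without being a subsidiary article, so a human or a further filter would still need to confirm each one.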

Blithely assuming that a "History of X" article is the same as "X" is entirely unsafe.

Where did I blithely assume this? My point was that "History of NYC" should be judged by sourcing specific to the topic intersection (on its own merits) and based on the reams of books specific to NYC's history, the split becomes appropriate. It's having the ability to determine when a subtopic is itself the subject of significant coverage, and not simply the result of a size split. czar 13:52, 27 September 2017 (UTC)

Erm, I wasn't addressing you, and I agree with your comments both above and below mine. Chiswick Chap (talk) 14:01, 27 September 2017 (UTC)

They aren't criticizing our decision to split. The issue is that some science/research projects use Wikipedia as a massive, useful database of information. They feed it into software for automatic analysis. Their simplistic initial assumption was basically "every city has an article, and all information about the city is in that article". That works perfectly for the article "Tinytown, Ohio". The history of Tinytown is in that article, and they want it included. However, they are now noticing that they haven't been pulling together all information about New York City. They are surprised and disappointed that their software is failing to include "History of New York City" in with the other New York City information. From their point of view, their New York City results are incomplete or biased. They understand and accept that any machine analysis is going to have flaws. From their point of view, gathering subarticles in with the parent articles generally gives better results, even if the software occasionally screws up and incorporates an incorrect article. Alsee (talk) 10:51, 4 October 2017 (UTC)
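To illustrate the "gathering subarticles in with the parent articles" step, here is a rough sketch, assuming a subarticle-to-parent mapping has already been produced by some other means. The names `merge_by_concept` and `parent_of` and the sample texts are hypothetical and do not come from the research being discussed.

```python
from collections import defaultdict


def merge_by_concept(articles, parent_of):
    """Group article texts under their parent concept.

    articles:  dict mapping article title -> article text
    parent_of: dict mapping subarticle title -> parent title (assumed given)
    """
    merged = defaultdict(list)
    for title, text in articles.items():
        # An article with no known parent counts as its own concept.
        concept = parent_of.get(title, title)
        merged[concept].append(text)
    return {concept: "\n".join(texts) for concept, texts in merged.items()}


# Hypothetical sample data.
articles = {
    "New York City": "Overview.",
    "History of New York City": "Detailed history.",
    "Tinytown, Ohio": "Overview and history.",
}
parent_of = {"History of New York City": "New York City"}

concepts = merge_by_concept(articles, parent_of)
print(concepts["New York City"])
```

Under the "article-as-concept" assumption, `parent_of` would be empty and the detailed history would be left out of the New York City concept, which is the incompleteness described above. An incorrect entry in `parent_of` would pull an unrelated article into the concept, which is the occasional error the researchers accept.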