User:Monkbot/Task 6: CS1 language support

Monkbot task 6 was created to modify CS1 citations whose |title= parameters contain non-Latin script so that they use the new CS1 parameter |script-title=.

A recent change to Module:Citation/CS1 (the engine underlying the Citation Style 1 templates) created a new parameter |script-title=. The new parameter is intended to be used when a citation's title is written in a script that is not a Latin-based alphabet. Usually these scripts should not be italicized (Chinese, Japanese, etc.) and/or may be written right-to-left (Hebrew, Persian, etc.). |script-title= is supported by all citation templates that use Module:Citation/CS1 except {{cite encyclopedia}}. As of revision b, task 6 does not modify {{cite encyclopedia}} templates.

The purpose of the {{xx icon}} templates is to identify for readers that certain links are to sources that are not English language sources. Each of these {{xx icon}} templates adds the page to the appropriate subcategory of Category:Articles with non-English-language external links. Prior to the 11 October 2014 update to Module:Citation/CS1, CS1 templates with |language= parameters also added pages to the individual subcategories in Category:Articles with non-English-language external links. Because CS1 citations do not always provide links to external sources, citations that used |language= to identify the language in which the source is written were improperly categorizing the article. Module:Citation/CS1 now uses Category:CS1 foreign language sources. Task 6 locates CS1 citation templates that are adjacent to {{xx icon}} templates, adds a |language= parameter with the language code from the {{xx icon}} template to the CS1 citation and then deletes the {{xx icon}} template.

Task 6 was initially created to work on pages listed in certain subcategories of Category:Articles with non-English-language external links. The criteria are: subcategories that contain 1,000 or more articles; or subcategories for languages that have an ISO639-1 two-character language code and that are listed at right-to-left. The first criterion was an arbitrary cutoff; the second was not.

Task 6 begins by changing {{xx icon}} redirects to the standard form. For example, {{Da}}, {{Da li}}, {{Da-icon}}, and {{Dk icon}} are all redirects to {{da icon}} and so are changed to {{da icon}}. The purpose of the standardization is to simplify later rules in the script.
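The standardization step can be sketched in Python as follows. This is only an illustration, not the bot's actual rule set: the redirect table holds just the Danish examples named above, and the names ICON_REDIRECTS and standardize_icons are hypothetical.

```python
import re

# Hypothetical redirect table; only the Danish redirects named in the text.
ICON_REDIRECTS = {
    "da": "da icon",
    "da li": "da icon",
    "da-icon": "da icon",
    "dk icon": "da icon",
}

# A template with no parameters: {{name}}
TEMPLATE_RE = re.compile(r"\{\{\s*([^{}|]+?)\s*\}\}")


def standardize_icons(wikitext: str) -> str:
    """Rewrite known icon redirects to the standard {{xx icon}} form."""

    def repl(match: re.Match) -> str:
        name = match.group(1)
        key = name.lower()
        if key in ICON_REDIRECTS:
            return "{{" + ICON_REDIRECTS[key] + "}}"
        # Assumption: casing variants such as {{Fr icon}} are lowercased.
        if re.fullmatch(r"[A-Za-z]{2,3} icon", name):
            return "{{" + key + "}}"
        return match.group(0)  # not an icon template; leave untouched

    return TEMPLATE_RE.sub(repl, wikitext)
```

With every icon in one spelling, later rules need to match only a single pattern.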

After {{xx icon}} standardization, task 6:

  1. protects certain {{xx icon}} templates from further edits;
  2. moves {{xx icon}} templates that are inside a CS1 citation template to a position ahead of the CS1 template for processing by later rules;
  3. removes empty |language= parameters from CS1 citations so that the citation doesn't end up with duplicate |language= parameters at the end of the task;
  4. removes wikilink markup from |language= parameter values so that Module:Citation/CS1 can properly categorize the citation;
  5. removes |language=English, |language=British English, |language=en, or |language=en-GB from CS1 citations that use them (discontinued at task 6n);
  6. from task 6n: modifies |language=English language and |language=British English to |language=English, and modifies |language=en-GB to |language=en.

Some citations have |language= parameters that contain RFC1766-style language codes (code-subcode, where code is an ISO639-1 language code and subcode is an ISO3166 country code). CS1 does not support this style of language parameter. Task 6 truncates these codes to just the ISO639-1 portion. Chinese is written in both simplified and traditional forms. Where |language=simplified Chinese or |language=traditional Chinese parameters occur, task 6 removes the qualifier. Where |language= contains a language name followed by the word language (|language=German language), task 6 removes the qualifier.
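The value cleanups described above (wikilink removal, code truncation, qualifier removal, and the task 6n English normalization) might look like this in Python. It is a minimal sketch that operates on the parameter value only; the function name clean_language_value and the exact regular expressions are assumptions, not the bot's own rules.

```python
import re


def clean_language_value(value: str) -> str:
    """Normalize the value of a |language= parameter."""
    v = value.strip()
    # [[German language|German]] -> German ; [[German]] -> German
    v = re.sub(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]", r"\1", v)
    # code-subcode (for example pt-BR) -> code
    v = re.sub(r"^([A-Za-z]{2})-[A-Za-z]{2}$", r"\1", v)
    # simplified Chinese / traditional Chinese -> Chinese
    v = re.sub(r"^(?:simplified|traditional)\s+(Chinese)$", r"\1", v,
               flags=re.IGNORECASE)
    # German language -> German
    v = re.sub(r"\s+language$", "", v, flags=re.IGNORECASE)
    # From task 6n: British English -> English
    if v.lower() == "british english":
        v = "English"
    return v
```

A value that is empty after cleaning would correspond to the empty |language= parameter that task 6 removes outright.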

In a CS1 citation, |language= may either precede or follow |title=, with or without intervening parameters. To properly evaluate each citation then requires a rule for each case. Alternatively, multiple rules are not needed if each citation is modified to a standard format. In this case, editors generally place |language= somewhere after |title=. Task 6 modifies those citation templates where |language= precedes |title= by moving |language= to the end of the citation (the same place it puts |language= parameters that are created from {{xx icon}} templates).
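The reordering can be sketched as below. The sketch assumes a citation with no nested templates or piped wikilinks (so that splitting on | is safe), and the helper name move_language_to_end is hypothetical.

```python
def move_language_to_end(citation: str) -> str:
    """Move |language= to the end when it precedes |title=.

    Assumes `citation` is one complete template, e.g.
    '{{cite web |language=de |title=X}}', without nested templates.
    """
    if not (citation.startswith("{{") and citation.endswith("}}")):
        return citation
    parts = citation[2:-2].split("|")
    name, params = parts[0], parts[1:]
    names = [p.split("=", 1)[0].strip() for p in params]
    if "language" not in names or "title" not in names:
        return citation
    lang_index = names.index("language")
    if lang_index > names.index("title"):
        return citation  # already in the standard position
    language = params.pop(lang_index).strip()
    # Keep one space before the appended parameter.
    params[-1] = params[-1].rstrip() + " "
    params.append(language)
    return "{{" + "|".join([name] + params) + "}}"
```

After this step every rule that inspects |title= and |language= together can assume a single ordering.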

Certain citations shouldn't be edited. Task 6 employs a multilevel protection scheme. Edits to protected elements are prevented by the insertion of a special text string that makes the template unrecognizable to subsequent rules. Elements that include either of the special text strings __PROTECTED__ or __PROTECTED2__ are never edited by task 6 except to remove the protection string at the task's completion. Reasons for this level of protection are:

  1. a citation with leading or trailing {{xx icon}} templates contains |language=<value> where the {{xx icon}} code (xx) or the code's equivalent language name does not match the language name or code in |language=; where there is a match, {{xx icon}} is removed;
  2. the citation includes another template; especially templates like {{nihongo}} which can confuse the later rules;
  3. groups of two or more {{xx icon}} or {{xxx icon}} templates, the first and last are protected to prevent later rules from taking one of them as a value for a citation's |language= parameter;
  4. {{en icon}} when amongst other {{xx icon}} or {{xxx icon}} templates; it is presumed that such use indicates a multilingual source.
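The protection mechanism can be illustrated with the icon-group case from the list above. In this sketch the marker is inserted directly after the opening braces; where the bot actually places the string is not stated, so that placement and the helper names are assumptions.

```python
import re

MARKERS = ("__PROTECTED__", "__PROTECTED1__", "__PROTECTED2__")

ICON_RE = re.compile(r"\{\{([a-z]{2,3}) icon\}\}")
# Two or more adjacent icon templates, separated only by whitespace.
GROUP_RE = re.compile(r"(?:\{\{[a-z]{2,3} icon\}\}\s*){2,}")


def protect(template: str, marker: str = "__PROTECTED__") -> str:
    """'{{fr icon}}' -> '{{__PROTECTED__fr icon}}' (assumed placement)."""
    return "{{" + marker + template[2:]


def protect_icon_groups(wikitext: str) -> str:
    """Protect the first and last icon in each group of two or more."""

    def repl(match: re.Match) -> str:
        group = match.group(0)
        icons = list(ICON_RE.finditer(group))
        # Edit the last icon first so the earlier offsets stay valid.
        for icon in (icons[-1], icons[0]):
            group = (group[:icon.start()] + protect(icon.group(0))
                     + group[icon.end():])
        return group

    return GROUP_RE.sub(repl, wikitext)


def unprotect_all(wikitext: str) -> str:
    """Final step: strip every protection string."""
    for marker in MARKERS:
        wikitext = wikitext.replace(marker, "")
    return wikitext
```

Because a protected template no longer matches ICON_RE, rules that run between protect_icon_groups and unprotect_all simply never see it.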

The second level of protection is applied only after the first level protection rules have been applied. This level identifies CS1 citations that have |title= values containing one or more Latin characters. The script is not smart enough to know if these characters are part of the original writing system, are a transliteration, or are a translation. Under certain circumstances described later, task 6 may edit those citations marked with __PROTECTED1__.

Unprotected {{en icon}} templates are then deleted.

For each of the rtl languages, the CJK languages, and other non-Latin scripts (Greek, Hebrew, Cyrillic), and in keeping with MOS:Foreign terms, special rules require that the content of |title= must match the language identified in {{xx icon}} or |language=. For example, the rule for Arabic requires an {{ar icon}} or |language=ar or |language=Arabic and that |title= contain only punctuation, digits (0–9), and Arabic script. When these conditions are met, task 6 replaces |title=... with |script-title=ar:..., adds |language=ar (if appropriate) and deletes the adjacent {{ar icon}} template (if present).
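The Arabic rule might be sketched as follows. The Unicode ranges, the treatment of whitespace as allowed, the requirement of at least one Arabic character, and the function names are all assumptions made for illustration; the sketch handles only the |language= case, not the adjacent {{ar icon}} case.

```python
import re
import unicodedata

# Assumed set of Arabic-script Unicode blocks.
ARABIC_RANGES = ((0x0600, 0x06FF), (0x0750, 0x077F), (0x08A0, 0x08FF),
                 (0xFB50, 0xFDFF), (0xFE70, 0xFEFF))

TITLE_RE = re.compile(r"\|(\s*)title(\s*)=([^|{}]*)")
LANG_AR_RE = re.compile(r"\|\s*language\s*=\s*(?:ar|Arabic)\s*(?=[|}])")


def title_is_arabic_only(title: str) -> bool:
    """True if title holds only Arabic script, digits 0-9, punctuation, spaces."""
    has_arabic = False
    for ch in title:
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in ARABIC_RANGES):
            has_arabic = True
        elif (ch in "0123456789" or ch.isspace()
              or unicodedata.category(ch).startswith("P")):
            continue
        else:
            return False
    return has_arabic


def to_script_title(citation: str, code: str = "ar") -> str:
    """Replace |title=... with |script-title=ar:... when the rule is met."""
    match = TITLE_RE.search(citation)
    if not match or not LANG_AR_RE.search(citation):
        return citation
    value = match.group(3)
    if not title_is_arabic_only(value.strip()):
        return citation
    new_param = (f"|{match.group(1)}script-title{match.group(2)}="
                 f"{code}:{value.lstrip()}")
    return citation[:match.start()] + new_param + citation[match.end():]
```

A title that mixes Latin letters with Arabic fails the character test and is left alone, which matches the caution described for the second protection level.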

Languages for which task 6 supports |script-title= are:

when |language=divehi, |language=dhivehi, |language=maldivian, |language=dv; when citation has adjacent {{dv icon}}, |language= parameter must be |language=Maldivian or |language=dv;


For those languages that use Latin or Latin-variant alphabets, task 6 simply adds |language=xx and deletes the adjacent {{xx icon}} template.
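For the simple Latin-alphabet case, the edit amounts to folding a trailing icon into the citation. The following sketch assumes the icon directly follows the citation and compares codes only, not language names; the pattern and the name absorb_icon are hypothetical.

```python
import re

# A cs1|2 template (no nested templates) followed by one icon template.
PAIR_RE = re.compile(
    r"(\{\{\s*(?:[Cc]ite \w+|[Cc]itation)\b[^{}]*?)\s*\}\}"
    r"\s*\{\{([a-z]{2,3}) icon\}\}"
)
LANG_RE = re.compile(r"\|\s*language\s*=\s*([^|{}]*)")


def absorb_icon(wikitext: str) -> str:
    """Add |language=xx from a trailing {{xx icon}} and delete the icon."""

    def repl(match: re.Match) -> str:
        cite, code = match.group(1), match.group(2)
        existing = LANG_RE.search(cite)
        if existing:
            if existing.group(1).strip() == code:
                return cite + "}}"  # codes match: just drop the icon
            return match.group(0)   # mismatch: leave both untouched
        return cite + " |language=" + code + "}}"

    return PAIR_RE.sub(repl, wikitext)
```

For example, `{{cite news |title=Le Monde}} {{fr icon}}` becomes `{{cite news |title=Le Monde |language=fr}}`.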

For those CS1 citations that have Latin characters in |title= and that now contain __PROTECTED1__, task 6 deletes the icon and adds |language=xx to the citation.

As a final step, wherever task 6 added __PROTECTED__, __PROTECTED1__, and __PROTECTED2__, that text is removed.

From 18 April 2015, Module:Citation/CS1 supports a comma-delimited list of language names. From Rev. o, task 6 will locate cs1|2 templates followed by two to five {{xx icon}} templates and add the codes from those templates to a |language= parameter.
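The multi-icon rule could be sketched like this. The regular expression, the comma-plus-space separator, and the choice to skip citations that already have |language= are assumptions; merge_icons is a hypothetical name.

```python
import re

ICON = r"\{\{[a-z]{2,3} icon\}\}"
# A cs1|2 template followed by two to five icons, and no sixth icon.
MULTI_RE = re.compile(
    r"(\{\{\s*(?:[Cc]ite \w+|[Cc]itation)\b[^{}]*?)\s*\}\}"
    r"((?:\s*" + ICON + r"){2,5})(?!\s*" + ICON + r")"
)


def merge_icons(wikitext: str) -> str:
    """Fold two to five trailing icons into one |language= parameter."""

    def repl(match: re.Match) -> str:
        cite, icons = match.group(1), match.group(2)
        if re.search(r"\|\s*language\s*=", cite):
            return match.group(0)  # assumption: never add a duplicate
        codes = re.findall(r"\{\{([a-z]{2,3}) icon\}\}", icons)
        return cite + " |language=" + ", ".join(codes) + "}}"

    return MULTI_RE.sub(repl, wikitext)
```

The negative lookahead keeps a run of six or more icons from being partially consumed.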

Hidden under the hood at Module:Citation/CS1 is the process that takes |title=transcription, |script-title=xx:original writing system title, and |trans-title=translated title and puts them all together with <bdi lang="xx">...</bdi> which both isolates the content for rtl languages and helps the browser to correctly display the script.

If, at the end of all of this, only casing has been changed ({{XX icon}} to {{xx icon}}) then the change is not saved.

Article pages that contain {{bots|Monkbot 6}} or that do not contain Module:Citation/CS1-supported templates will not be edited by this task.
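The two save guards above (skip casing-only changes, skip excluded pages) can be combined into one check. This is a rough sketch: the exclusion test is a literal substring match on the string given above, the CS1 detection pattern is an assumption, and the casing test compares the whole page rather than icon templates only.

```python
import re

# Assumed pattern for "contains a Module:Citation/CS1-supported template".
CS1_RE = re.compile(r"\{\{\s*(?:[Cc]ite \w+|[Cc]itation)\b")


def should_save(before: str, after: str) -> bool:
    """Decide whether the edited page text should be saved."""
    if "{{bots|Monkbot 6}}" in before:
        return False  # page opts out of this task
    if not CS1_RE.search(before):
        return False  # nothing for task 6 to work on
    if before == after:
        return False  # no change at all
    if before.lower() == after.lower():
        return False  # only casing changed, e.g. {{XX icon}} -> {{xx icon}}
    return True
```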