Developer(s) | RegExTypoFix Team (see below for RETF Developers) |
---|---|
Written in | Regular expressions |
Operating system | Platform independent |
Available in | English |
Type | Spell checkers |
License | GFDL/CC BY-SA |
Top contributors (200+ edits, as of February 2024)
Also notable (100+ kb, as of February 2024) Original creator |
These are the typo regular expressions for RegExTypoFix (Regular Expression Typographical error Fixer, or RETF). Development has been open to the public since 2006.
Please add to or improve these regular expressions!
These regular expressions find and fix common misspellings and grammatical errors. The primary advantage of RegExTypoFix over other spellchecking engines and approaches is accuracy and the return of only one possible replacement. The rules below are designed to give as few false positives as possible. Errors should be encountered only in extremely rare usages or when parsing other languages (though even then, if there are too many false positives, the expression will be modified). On everyday English, accuracy should hit 100%.
On the English Wikipedia, RegExTypoFix is applied to diverse text drawn from many languages. RegExTypoFix is also used on other MediaWiki-based wikis, and derivatives can be used in other software. This leads to a massively tested, well-vetted set of automatic corrections. Even so, due to the great variability of text, RegExTypoFix is not accurate enough to be run without a human checking every proposed correction when running against an encyclopedia such as Wikipedia.
Syntax of the expressions is described in full on the MSDN website, though for the purposes of this page the Well House summary is likely easier to use.
Everyone using RegExTypoFix should use it responsibly. Check every edit before you make it. If in doubt, SKIP. This typo list is used by the in-browser editor and multiple Wikipedia tools.
AutoWikiBrowser (AWB) purposely avoids fixing typos in certain areas of the wiki-text. Typo fixing is prevented within: image names, template names and parameters, wikilink targets, text in quotations and italics, and any text that follows a colon or asterisk. If a typo rule matches a wikilink target, that rule is ignored on the whole page.
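The last point (a rule that matches any wikilink target is dropped for the whole page) can be illustrated with a minimal Python sketch. This is not AWB's code: the function name, the link-target pattern, and the sample rule are all hypothetical, and Python's `re` module differs in places from the .NET regex engine that AWB uses.

```python
import re

# Hypothetical, simplified pattern: capture the target part of [[target|label]].
LINK_TARGET = re.compile(r"\[\[([^\]|]+)")

def rule_allowed_on_page(find, page_text):
    """Return False if the rule's regex matches any wikilink target on the page."""
    targets = LINK_TARGET.findall(page_text)
    return not any(re.search(find, target) for target in targets)

# Hypothetical rule, for illustration only.
find = r"\b[Tt]eh\b"
print(rule_allowed_on_page(find, "See [[Teh Band|the band]] for teh details."))  # False
print(rule_allowed_on_page(find, "See [[The Band]] for teh details."))           # True
```

Skipping the rule for the whole page, rather than only inside the link, is the conservative choice: a spelling that appears in a link target may well be intentional elsewhere in the article.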
When using AWB, you can refresh the typo list by selecting "File → Refresh status/typos" (CTRL-R). This is useful when you are modifying the typo list on Wikipedia while using AWB to test/process the modification (but basic testing should first be done offline—e.g. by using AWB's Regex Tester or "Find and replace").
The JavaScript Wiki Browser (JWB) uses the same rules for ignoring typo fixing as the downloadable AWB does. Additionally, JWB ignores any typo that occurs on the same line of text as {{sic, in order to avoid fixing intentional or transcribed typos. Other than that, typo rules are not applied to image names, template names and parameters, quotes, or any text following a colon or asterisk, and any rule that also matches a wikilink target on that page is skipped. Because some browsers do not support lookbehinds, any replacement rules containing lookbehinds (?<= and ?<!) are ignored on those browsers. Any browser that does support lookbehinds applies these rules as normal.
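The {{sic line rule described above can be sketched in Python as follows. This is an illustration under assumptions, not JWB's implementation (JWB is written in JavaScript): the function name is hypothetical, and treating the {{sic check as case-insensitive is a guess.

```python
import re

def fix_lines_skipping_sic(find, replacement, text):
    """Apply one typo rule line by line, leaving lines that contain {{sic untouched."""
    fixed = []
    for line in text.split("\n"):
        if "{{sic" in line.lower():   # assumption: the check ignores case
            fixed.append(line)        # intentional or transcribed typo: do not touch
        else:
            fixed.append(re.sub(find, replacement, line))
    return "\n".join(fixed)

# Hypothetical rule, for illustration only.
sample = "He wrote teh {{sic}} letter.\nShe read teh reply."
print(fix_lines_skipping_sic(r"\bteh\b", "the", sample))
# He wrote teh {{sic}} letter.
# She read the reply.
```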
To refresh the typo list, click the refresh button next to the checkbox that enables typo fixing.
WPCleaner also purposely avoids fixing typos in certain areas of the wiki-text. Since Java supports lookbehinds a bit differently than C#, any replacement rules containing lookbehinds (?<= and ?<!) will be rejected if the lookbehind expression doesn't have an obvious maximum length (for example, if the lookbehind expression is using quantifiers like * or +, it will probably be rejected).
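The same kind of engine difference can be seen in Python, whose `re` module is stricter still: it requires every lookbehind to have a fixed width, whereas Java only needs a bounded one. The sketch below is an analogy rather than WPCleaner's check; the function name is hypothetical, and it relies on the wording of Python's error message.

```python
import re

def python_accepts_lookbehinds(pattern):
    """Return False if Python's re rejects the pattern because of a variable-width lookbehind."""
    try:
        re.compile(pattern)
    except re.error as exc:
        if "look-behind" in str(exc):
            return False
        raise  # some other syntax problem: let the caller see it
    return True

print(python_accepts_lookbehinds(r"(?<=ab)c"))  # True
print(python_accepts_lookbehinds(r"(?<=a+)b"))  # False
```

A rule that needs an unbounded lookbehind can often be rewritten to capture the preceding text in a group and repeat it in the replacement.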
Rules starting with \{\{ are only applied at the beginning of templates; rules starting with \[\[ are only applied at the beginning of internal links. For other rules, typo fixing is prevented within: [[link|description]], [[xx:link|description]], [http://xxxx/ description], and <gallery>...</gallery>, <math>...</math>, <code>...</code> or <timeline>...</timeline> tags.
When using WPCleaner, you can refresh the typo list by clicking on the button in the main window.
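One way to picture the protected areas listed above is to split the text into protected and unprotected spans and apply a rule only to the latter. The Python sketch below does that under loud assumptions: the `PROTECTED` pattern is a rough approximation of the list, the function name is hypothetical, and WPCleaner itself is a Java program whose actual logic may differ.

```python
import re

# Rough approximation of the protected spans: internal/interwiki links,
# external links, and gallery/math/code/timeline tags.
PROTECTED = re.compile(
    r"\[\[.*?\]\]"
    r"|\[https?://[^\]]*\]"
    r"|<(gallery|math|code|timeline)\b[^>]*>.*?</\1>",
    re.DOTALL,
)

def fix_outside_protected(find, replacement, text):
    """Apply one rule only to the text between protected spans."""
    pieces = []
    pos = 0
    for match in PROTECTED.finditer(text):
        pieces.append(re.sub(find, replacement, text[pos:match.start()]))
        pieces.append(match.group(0))  # protected span copied verbatim
        pos = match.end()
    pieces.append(re.sub(find, replacement, text[pos:]))
    return "".join(pieces)

# Hypothetical rule, for illustration only.
sample = "teh [[teh link|teh desc]] <code>teh</code> teh"
print(fix_outside_protected(r"\bteh\b", "the", sample))
# the [[teh link|teh desc]] <code>teh</code> the
```

Because each unprotected slice is processed separately, a rule whose match would straddle a protected span simply never fires, which errs on the side of leaving text alone.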
In the Wikipedia gadget wikEd, the rules are applied everywhere.
The syntax for each rule is the following (according to the AWB and wikEd source code):
<Typo word="Optional name for this rule" find="Regex code to detect the error" replace="Replacement for the error"/>
The "word" parameter is optional and any additional spaces between the parameters are ignored.
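To make the rule format concrete, here is a minimal Python sketch that reads one rule line and applies it to some text. It assumes the attributes appear in the order word, find, replace; the sample rule is hypothetical and not taken from the list; and because the real tools use other regex engines (.NET for AWB), the `$1`-style group references are converted to Python's `\g<1>` form.

```python
import re

# Assumed attribute order: optional word, then find, then replace.
TYPO_LINE = re.compile(
    r'<Typo(?:\s+word="(?P<word>[^"]*)")?'
    r'\s+find="(?P<find>[^"]*)"'
    r'\s+replace="(?P<replace>[^"]*)"\s*/>'
)

def parse_rule(line):
    """Return (word, find, replace) for a rule line, or None if it does not match."""
    match = TYPO_LINE.search(line)
    if match is None:
        return None
    return match.group("word"), match.group("find"), match.group("replace")

def apply_rule(find, replace, text):
    """Apply one rule, converting $1-style references to Python's \\g<1> form."""
    python_replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return re.sub(find, python_replace, text)

# Hypothetical rule, for illustration only.
SAMPLE = r'<Typo word="the" find="\b(T|t)eh\b" replace="$1he"/>'
word, find, replace = parse_rule(SAMPLE)
print(apply_rule(find, replace, "Teh cat and teh dog"))  # The cat and the dog
```

The line is parsed with a regular expression rather than an XML parser because a find value may contain characters such as < (in lookbehinds) that are not valid inside XML attribute values.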
Add a word boundary (\b) to both ends of the regex unless you are matching errors in parts of words or multiple words.
Avoid using * and + with anything but a single character. Avoid them entirely if possible, as they put extra strain on the CPU and are apt to do something other than what you expect.
Lookbehinds (?<= and ?<!) are not supported by wikEd, or by JavaScript Wiki Browser in some web browsers (notably Firefox & Safari as of October 2019), and could cause these rules to be skipped.
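The guidelines above lend themselves to a quick automated check before a rule is added. The following Python sketch is a crude, hypothetical linter, not part of AWB or any other tool named here; its heuristics (plain string and regex tests on the rule's find value) will miss some cases and flag some harmless ones.

```python
import re

def lint_rule(find):
    """Return a list of guideline warnings for a rule's find regex (rough heuristics)."""
    warnings = []
    # Guideline: word boundaries at both ends.
    if not (find.startswith(r"\b") and find.endswith(r"\b")):
        warnings.append("missing-word-boundary")
    # Guideline: * and + only on single characters; flag an unescaped ")" followed by one.
    if re.search(r"(?<!\\)\)[*+]", find):
        warnings.append("quantified-group")
    # Guideline: lookbehinds may cause the rule to be skipped by some tools.
    if "(?<=" in find or "(?<!" in find:
        warnings.append("lookbehind")
    return warnings

# Hypothetical find values, for illustration only.
print(lint_rule(r"\b(T|t)eh\b"))    # []
print(lint_rule(r"(ab)+c"))         # ['missing-word-boundary', 'quantified-group']
print(lint_rule(r"\b(?<=x)foo\b"))  # ['lookbehind']
```

A warning is only a prompt to look again: rules that deliberately match parts of words, for example, are expected to trip the first check.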
All changes to this list are live. AWB loads directly from this list whenever someone invokes the RETF option.