Code page 932 (Microsoft Windows)

Windows Code page 932
MIME / IANA: Windows-31J
Alias(es): CP943C
Language(s): Japanese
Standard: WHATWG Encoding Standard (as "Shift_JIS")[1]
Classification: Extended ASCII,[a] variable-width encoding, CJK encoding
Extends: Shift_JIS
  a. ^ Not in the strictest sense of the term, as ASCII bytes can appear as trail bytes.

Microsoft Windows code page 932 (abbreviated MS932,[2][3] Windows-932[3] or ambiguously CP932[4]), also called Windows-31J amongst other names (see § Terminology below), is the Microsoft Windows code page for the Japanese language. It is an extended variant of the Shift JIS Japanese character encoding. It retains the standard 7-bit ASCII codes, and Japanese characters are indicated by a first byte whose high bit is set to 1. Some of these code points require a second byte, so each character is encoded in either 8 or 16 bits.
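The one-byte and two-byte cases described above can be illustrated with a minimal sketch. It assumes Python's built-in "cp932" codec is an acceptable stand-in for code page 932; the sample characters and the helper name `encoded_width` are chosen here purely for illustration.

```python
# Sketch: how many bytes a character occupies under Python's "cp932" codec.
# The samples below are illustrative choices, not taken from the article.
samples = [
    "A",        # ASCII letter: one byte, high bit clear
    "\uff71",   # half-width katakana A: one byte, high bit set
    "\u3042",   # hiragana A: lead byte plus trail byte
    "\u8868",   # kanji whose trail byte falls in the ASCII range
]


def encoded_width(ch: str) -> int:
    """Return the number of bytes `ch` occupies when encoded as cp932."""
    return len(ch.encode("cp932"))


for ch in samples:
    raw = ch.encode("cp932")
    hex_bytes = " ".join(f"{b:02X}" for b in raw)
    print(f"U+{ord(ch):04X} -> {hex_bytes} ({encoded_width(ch)} byte(s))")
```

The last sample encodes to the bytes 95 5C, and 5C is also the ASCII code for the backslash. This is why the encoding is "extended ASCII" only in a loose sense: a byte in the ASCII range is not necessarily an ASCII character.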

IBM offer the same extended double-byte codes in their code page 943 (IBM-943 or CP943),[5] which is a combination of the single-byte Code page 897 and the double-byte Code page 941.[6]

Windows-31J is the most used non-UTF-8/Unicode Japanese encoding on the web. However, many people and software packages, including Microsoft libraries,[7] declare the Shift JIS encoding for Windows-31J data, even though Windows-31J includes some additional characters and maps some of the existing characters to Unicode differently. This has led the WHATWG HTML standard to treat the encoding labels shift_jis and windows-31j interchangeably, and to use the Windows variant for its "Shift_JIS" encoder and decoder.[1]
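Both kinds of difference mentioned above, additional characters and differing Unicode mappings, can be observed with a short sketch. It assumes Python's built-in "shift_jis" and "cp932" codecs; Python's "shift_jis" follows the JIS X 0208 mapping and does not implement the WHATWG behaviour. The two byte sequences and the helper `decode_or_none` are illustrative choices.

```python
# Sketch: the same bytes decoded as plain Shift JIS and as code page 932.
# Uses Python's "shift_jis" and "cp932" codecs as stand-ins (an assumption).
WAVE = b"\x81\x60"         # mapped to different Unicode characters
CIRCLED_ONE = b"\x87\x40"  # defined only in the Windows extension


def decode_or_none(raw: bytes, codec: str):
    """Decode `raw` with `codec`, returning None if the bytes are invalid."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


for raw in (WAVE, CIRCLED_ONE):
    for codec in ("shift_jis", "cp932"):
        text = decode_or_none(raw, codec)
        shown = "undecodable" if text is None else f"U+{ord(text):04X}"
        print(f"{raw.hex()} as {codec}: {shown}")
```

Under these codecs, the first sequence decodes to U+301C (WAVE DASH) as Shift JIS but to U+FF5E (FULLWIDTH TILDE) as code page 932, while the second decodes only under code page 932. Data labelled shift_jis but written on Windows can therefore fail or change characters when decoded strictly.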

  1. ^ a b Mozilla Foundation. "Notable Differences from IANA Naming". Crate encoding_rs. docs.rs.
  2. ^ Sivonen, Henri. "Bug 27851 - Add MS932 as a label of Shift_JIS". w3.org Bug Tracker.
  3. ^ a b "Converter Explorer: ibm-943_P15A-2003 (alias windows-31j)". International Components for Unicode: ICU Demonstration.
  4. ^ Aoki, Osamu. "Chapter 11. Data conversion". Debian Reference. Debian.
  5. ^ "IBM-943 and IBM-932". IBM Knowledge Center. IBM.
  6. ^ "Coded character set identifiers - CCSID 943". IBM Globalization. IBM. Archived from the original on 2016-03-15.
  7. ^ Cite error: the named reference "msdnlabels" was invoked but never defined.