UTF-7

Language(s): International
Standard: RFC 2152
Classification: Unicode Transformation Format, ASCII armor, variable-width encoding, stateful encoding
Transforms / Encodes: ISO/IEC 10646 (Unicode)
Preceded by: HZ-GB-2312
Succeeded by: UTF-8 over 8BITMIME

UTF-7 (7-bit Unicode Transformation Format) is an obsolete variable-length character encoding that represents Unicode text as a stream of ASCII characters. It was originally intended as a way to encode Unicode text in Internet email messages more efficiently than the combination of UTF-8 with quoted-printable.
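
The efficiency claim can be checked on a small sample. The following is a minimal sketch that relies on Python's built-in "utf-7" codec and the standard quopri module; the sample string is an arbitrary assumption chosen for illustration, and the size difference will vary with the text being encoded.

```python
import quopri

# Arbitrary non-ASCII sample text (an assumption for illustration only).
sample = "日本語"

# UTF-7: the text is carried as base64-encoded UTF-16 between '+' and '-'.
utf7_bytes = sample.encode("utf-7")

# UTF-8 with quoted-printable: every non-ASCII byte becomes a 3-character "=XX" escape.
qp_bytes = quopri.encodestring(sample.encode("utf-8"))

print(len(utf7_bytes), utf7_bytes)
print(len(qp_bytes), qp_bytes)
```

For text that is mostly ASCII the two approaches are much closer in size, since both leave most ASCII characters as they are.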

According to its RFC, UTF-7 is not strictly a "Unicode Transformation Format", because the definition can only encode code points in the BMP (the first 65,536 Unicode code points, which excludes most emoji and many other characters). However, a UTF-7 translator that converts to or from UTF-16 can (and probably does)[citation needed] encode each surrogate half as though it were a 16-bit code point, and can therefore encode all code points. It is unclear whether other UTF-7 software (such as translators to UTF-32 or UTF-8) supports this.
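
The surrogate-half approach can be sketched as follows. This is an illustration under stated assumptions, not a full UTF-7 encoder: the helper utf7_shifted is hypothetical, it encodes the whole (non-empty) input as one shifted block, and it makes no attempt to leave ASCII characters unencoded. It converts the text to big-endian UTF-16, so a character outside the BMP becomes two 16-bit surrogate halves, then base64-encodes those bytes without padding and wraps the result in '+' and '-'.

```python
import base64

def utf7_shifted(text: str) -> str:
    """Hypothetical helper: encode non-empty text as one UTF-7 shifted block."""
    # Non-BMP characters become surrogate pairs here; each half is then
    # treated as an ordinary 16-bit unit.
    utf16 = text.encode("utf-16-be")
    # UTF-7 uses base64 without the trailing '=' padding.
    b64 = base64.b64encode(utf16).decode("ascii").rstrip("=")
    return "+" + b64 + "-"

emoji = "\U0001F600"  # U+1F600, outside the BMP
encoded = utf7_shifted(emoji)
print(encoded)  # +2D3eAA-

# CPython's built-in "utf-7" codec joins the two surrogate halves on decoding.
print(encoded.encode("ascii").decode("utf-7") == emoji)  # True
```

Here the code point U+1F600 becomes the surrogate pair D83D DE00, and the four bytes D8 3D DE 00 give the base64 text 2D3eAA.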

UTF-7 has never been an official standard of the Unicode Consortium. It is known to have security issues, which is why software has been changed to disable its use.[1] It is prohibited in HTML 5.[2][3]

  1. ^ Cite error: the named reference "dotnet5" was invoked but never defined.
  2. ^ "8.2.2.3. Character encodings". HTML 5.1 Standard. W3C.
  3. ^ "12.2.3.3 Character encodings". HTML Living Standard. WHATWG.