Standard | Unicode Standard |
---|---|
Classification | Unicode Transformation Format, extended ASCII, variable-length encoding |
Extends | ASCII |
Transforms / Encodes | ISO/IEC 10646 (Unicode) |
Preceded by | UTF-1 |
UTF-8 is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode Transformation Format – 8-bit.[1] Almost every webpage is stored in UTF-8.
UTF-8 is capable of encoding all 1,112,064[2] valid Unicode scalar values using a variable-width encoding of one to four one-byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. It was designed for backward compatibility with ASCII: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that a UTF-8-encoded file using only those characters is identical to an ASCII file. Most software designed for any extended ASCII can read and write UTF-8 (including on Microsoft Windows) and this results in fewer internationalization issues than any alternative text encoding.[3][4]
Microsoft GDK
was invoked but never defined (see the help page).whatwg
was invoked but never defined (see the help page).