Standard | Unicode Standard |
---|---|
Classification | Unicode Transformation Format, extended ASCII, variable-length encoding |
Extends | ASCII |
Transforms / Encodes | ISO/IEC 10646 (Unicode) |
Preceded by | UTF-1 |
UTF-8 is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode Transformation Format – 8-bit.[1] Almost every web page is stored in UTF-8.
UTF-8 can encode all 1,112,064[2] valid Unicode code points using a variable-width encoding of one to four one-byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. It was designed for backward compatibility with ASCII: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so a UTF-8-encoded file using only those characters is identical to an ASCII file, and most software designed for any extended ASCII can read and write UTF-8. Using UTF-8 results in fewer internationalization issues than any alternative text encoding.[3][4] Virtually all software can at least read and write UTF-8 text (including on Microsoft Windows), and it is the most-used method of storing text, accounting for 98.3% of all web pages, 99.1% of the top 100,000 pages, and up to 100% for many languages, as of 2024.[5]
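The properties described above can be checked with a short Python sketch. It is only an illustration using Python's built-in codecs: the sample characters and the variable names (`valid_code_points`, `byte_lengths`, `ascii_identical`) are assumptions chosen for this example, not part of any standard. The count of valid code points is derived here as the full range U+0000 to U+10FFFF minus the 2,048 surrogate code points, which cannot be encoded in UTF-8.

```python
# Valid code points: every value in U+0000..U+10FFFF except the
# surrogate range U+D800..U+DFFF.
total_code_points = 0x110000          # 1,114,112 values
surrogates = 0xE000 - 0xD800          # 2,048 values
valid_code_points = total_code_points - surrogates

# One sample character per encoded length (illustrative choices).
samples = [
    "A",           # U+0041, ASCII range
    "\u00e9",      # U+00E9, Latin small letter e with acute
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]

# Number of one-byte code units each character needs in UTF-8.
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]

# Text restricted to the first 128 characters encodes to the same
# bytes in UTF-8 as in ASCII.
ascii_text = "Hello, UTF-8"
ascii_identical = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

print(valid_code_points)   # 1112064
print(byte_lengths)        # [1, 2, 3, 4]
print(ascii_identical)     # True
```

Lower code points produce shorter sequences, and the final comparison shows why an ASCII-only file is byte-for-byte a valid UTF-8 file.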