UTF-8

Standard	Unicode Standard
Classification	Unicode Transformation Format, extended ASCII, variable-length encoding
Extends	ASCII
Transforms / Encodes	ISO/IEC 10646 (Unicode)
Preceded by	UTF-1

UTF-8 is a character encoding standard used for electronic communication. It is defined by the Unicode Standard, and its name is derived from Unicode Transformation Format – 8-bit.^[1] Almost every webpage is stored in UTF-8.

UTF-8 can encode all 1,112,064^[2] valid Unicode scalar values using a variable-width encoding of one to four one-byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. It was designed for backward compatibility with ASCII: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as in ASCII, so a UTF-8-encoded file that uses only those characters is identical to an ASCII file. Most software designed for any extended ASCII can read and write UTF-8 (including on Microsoft Windows), which results in fewer internationalization issues than with any alternative text encoding.^[3]^[4]
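The variable-width scheme and the ASCII compatibility described above can be illustrated with a short Python sketch. It assumes the standard UTF-8 bit layout defined in RFC 3629, which this excerpt does not spell out; the names `utf8_encode` and `SCALAR_VALUE_COUNT` are hypothetical, chosen for illustration only. Each result is cross-checked against Python's built-in `str.encode("utf-8")`.

```python
# Minimal sketch of UTF-8 encoding for a single code point, assuming the
# RFC 3629 bit layout. `utf8_encode` is a hypothetical helper, not a library API.

# 17 planes x 2**16 code points per plane, minus the 2**11 surrogates U+D800..U+DFFF
SCALAR_VALUE_COUNT = 17 * 2**16 - 2**11


def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode scalar value as 1 to 4 UTF-8 bytes."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"U+{code_point:04X} is not a Unicode scalar value")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx, the same value as in ASCII
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


if __name__ == "__main__":
    # Lower code points need fewer bytes; compare with Python's own codec.
    for ch in ("A", "é", "€", "😀"):
        encoded = utf8_encode(ord(ch))
        assert encoded == ch.encode("utf-8")
        print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} byte(s))")

    # ASCII compatibility: the first 128 code points encode to identical single bytes.
    assert all(utf8_encode(cp) == bytes([cp]) for cp in range(128))
    print("scalar values:", SCALAR_VALUE_COUNT)
```

Because only the leading byte carries the length prefix (`0`, `110`, `1110`, or `11110`) and every continuation byte starts with `10`, a pure ASCII byte (high bit clear) can never appear inside a multi-byte sequence, which is what makes an ASCII-only UTF-8 file byte-for-byte identical to ASCII.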

^ "Chapter 2. General Structure". The Unicode Standard (6.0 ed.). Mountain View, California, US: The Unicode Consortium. ISBN 978-1-936213-01-6.
^ "Conformance". The Unicode Standard (6.0 ed.). Mountain View, California, US: The Unicode Consortium. D76 Unicode scalar value. ISBN 978-1-936213-01-6. 17 planes times 2¹⁶ code points per plane, minus the 2¹¹ surrogate code points, which are not valid scalar values.
^ Microsoft GDK (named reference; citation details missing).
^ WHATWG (named reference; citation details missing).