Arabic script in Unicode

Many scripts in Unicode, such as Arabic, have special orthographic rules that require certain combinations of letterforms to be combined into special ligature forms. In English, the common ampersand (&) developed from a ligature in which the handwritten Latin letters e and t (spelling et, Latin for and) were combined.[1] The rules governing ligature formation in Arabic can be quite complex, requiring special script-shaping technologies such as the Arabic Calligraphic Engine by Thomas Milo's DecoType.[2]

As of Unicode 16.0, the Arabic script is contained in the following blocks:[3]

The basic Arabic range encodes the standard letters and diacritics, but does not encode contextual forms (U+0621–U+0652 being directly based on ISO 8859-6); and also includes the most common diacritics and Arabic-Indic digits. The Arabic Supplement range encodes letter variants mostly used for writing African (non-Arabic) languages. The Arabic Extended-B and Arabic Extended-A ranges encode additional Qur'anic annotations and letter variants used for various non-Arabic languages. The Arabic Presentation Forms-A range encodes contextual forms and ligatures of letter variants needed for Persian, Urdu, Sindhi and Central Asian languages. The Arabic Presentation Forms-B range encodes spacing forms of Arabic diacritics, and more contextual letter forms. The presentation forms are present only for compatibility with older standards, and are not currently needed for coding text.[4] The Arabic Mathematical Alphabetical Symbols block encodes characters used in Arabic mathematical expressions. The Indic Siyaq Numbers block contains a specialized subset of Arabic script that was used for accounting in India under the Mughal Empire by the 17th century through the middle of the 20th century.[5][6] The Ottoman Siyaq Numbers block contains a specialized subset of Arabic script, also known as Siyakat numbers, used for accounting in Ottoman Turkish documents.[6]

  1. ^ "What is the origin of the ampersand (&)?"
  2. ^ unicode.org Biography: Thomas Milo - DecoType
  3. ^ "UAX #24: Script data file". Unicode Character Database. The Unicode Consortium.
  4. ^ "Section 9.2: Arabic, Arabic Presentation Forms-B". The Unicode Standard. The Unicode Consortium. September 2024.
  5. ^ Pandey, Anshuman (2015-11-05). "L2/15-121R2: Proposal to Encode Indic Siyaq Numbers" (PDF).
  6. ^ a b "Chapter 22: Symbols". Unicode, Inc. September 2024.