General Category (Unicode Character Property)[a] | |||||
---|---|---|---|---|---|
Value | Category (major, minor) | Basic type[b] | Character assigned[b] | Count[c] (as of 16.0) | Remarks |
L, Letter; LC, Cased Letter (Lu, Ll, and Lt only)[d] | |||||
Lu | Letter, uppercase | Graphic | Character | 1,858 | |
Ll | Letter, lowercase | Graphic | Character | 2,258 | |
Lt | Letter, titlecase | Graphic | Character | 31 | Ligatures or digraphs containing an uppercase followed by a lowercase part (e.g., ǅ, ǈ, ǋ, and ǲ) |
Lm | Letter, modifier | Graphic | Character | 404 | A modifier letter |
Lo | Letter, other | Graphic | Character | 136,477 | An ideograph or a letter in a unicase alphabet |
M, Mark | |||||
Mn | Mark, nonspacing | Graphic | Character | 2,020 | |
Mc | Mark, spacing combining | Graphic | Character | 468 | |
Me | Mark, enclosing | Graphic | Character | 13 | |
N, Number | |||||
Nd | Number, decimal digit | Graphic | Character | 760 | All these, and only these, have Numeric Type = De[e] |
Nl | Number, letter | Graphic | Character | 236 | Numerals composed of letters or letterlike symbols (e.g., Roman numerals) |
No | Number, other | Graphic | Character | 915 | E.g., vulgar fractions, superscript and subscript digits, vigesimal digits |
P, Punctuation | |||||
Pc | Punctuation, connector | Graphic | Character | 10 | Includes spacing underscore characters such as "_", and other spacing tie characters. Unlike other punctuation characters, these may be classified as "word" characters by regular expression libraries.[f] |
Pd | Punctuation, dash | Graphic | Character | 27 | Includes several hyphen characters |
Ps | Punctuation, open | Graphic | Character | 79 | Opening bracket characters |
Pe | Punctuation, close | Graphic | Character | 77 | Closing bracket characters |
Pi | Punctuation, initial quote | Graphic | Character | 12 | Opening quotation mark. Does not include the ASCII "neutral" quotation mark. May behave like Ps or Pe depending on usage |
Pf | Punctuation, final quote | Graphic | Character | 10 | Closing quotation mark. May behave like Ps or Pe depending on usage |
Po | Punctuation, other | Graphic | Character | 640 | |
S, Symbol | |||||
Sm | Symbol, math | Graphic | Character | 950 | Mathematical symbols (e.g., +, −, =, ×, ÷, √, ∊, ≠). Does not include parentheses and brackets, which are in categories Ps and Pe. Also does not include !, *, -, or /, which despite frequent use as mathematical operators, are primarily considered to be "punctuation". |
Sc | Symbol, currency | Graphic | Character | 63 | Currency symbols |
Sk | Symbol, modifier | Graphic | Character | 125 | |
So | Symbol, other | Graphic | Character | 7,376 | |
Z, Separator | |||||
Zs | Separator, space | Graphic | Character | 17 | Includes the space, but not TAB, CR, or LF, which are Cc |
Zl | Separator, line | Format | Character | 1 | Only U+2028 LINE SEPARATOR (LSEP) |
Zp | Separator, paragraph | Format | Character | 1 | Only U+2029 PARAGRAPH SEPARATOR (PSEP) |
C, Other | |||||
Cc | Other, control | Control | Character | 65 (will never change)[e] | No name,[g] <control> |
Cf | Other, format | Format | Character | 170 | Includes the soft hyphen, joining control characters (ZWNJ and ZWJ), control characters to support bidirectional text, and language tag characters |
Cs | Other, surrogate | Surrogate | Not (only used in UTF-16) | 2,048 (will never change)[e] | No name,[g] <surrogate> |
Co | Other, private use | Private-use | Character (but no interpretation specified) | 137,468 total (will never change)[e] (6,400 in BMP, 131,068 in Planes 15–16) | No name,[g] <private-use> |
Cn | Other, not assigned | Noncharacter | Not | 66 (will not change unless the range of Unicode code points is expanded)[e] | No name,[g] <noncharacter> |
Cn | Other, not assigned | Reserved | Not | 819,467 | No name,[g] <reserved> |
General Category is a Unicode character property, defined in Chapter 4 of the Unicode Standard: "Character Properties".
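As an illustration, the property can be queried programmatically. The following is a minimal sketch, assuming Python's standard `unicodedata` module; the helper names (`general_category`, `major_class`, `is_cased_letter`) are hypothetical and chosen only for this example. The values returned reflect the Unicode version bundled with the interpreter, which may differ from 16.0, although the sample characters used here have long-stable categories.

```python
import re
import unicodedata


def general_category(ch: str) -> str:
    """Return the two-letter General Category value of one code point."""
    return unicodedata.category(ch)


def major_class(ch: str) -> str:
    """Return the one-letter major class (L, M, N, P, S, Z, or C)."""
    return unicodedata.category(ch)[0]


def is_cased_letter(ch: str) -> bool:
    """LC (Cased Letter) groups Lu, Ll, and Lt only."""
    return unicodedata.category(ch) in {"Lu", "Ll", "Lt"}


# (character, category listed in the table above) pairs, one per row sampled.
SAMPLES = [
    ("A", "Lu"),
    ("a", "Ll"),
    ("\u01C5", "Lt"),  # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
    ("\u4E2D", "Lo"),  # a CJK unified ideograph
    ("5", "Nd"),
    ("\u2167", "Nl"),  # ROMAN NUMERAL EIGHT
    ("\u00BD", "No"),  # VULGAR FRACTION ONE HALF
    ("_", "Pc"),
    ("-", "Pd"),       # HYPHEN-MINUS
    ("*", "Po"),       # punctuation, not Sm
    ("+", "Sm"),
    (" ", "Zs"),
    ("\t", "Cc"),      # TAB is a control character, not a space separator
    ("\u2028", "Zl"),  # LINE SEPARATOR
    ("\u00AD", "Cf"),  # SOFT HYPHEN
    ("\uE000", "Co"),  # first private-use code point in the BMP
    ("\uFFFF", "Cn"),  # a noncharacter
]

for ch, listed in SAMPLES:
    found = general_category(ch)
    status = "matches" if found == listed else "differs"
    print(f"U+{ord(ch):04X}  {found}  ({status} table value {listed})")

# Python's `re` module treats the Pc character "_" as a word character.
# This is specific to this library; other Pc characters are not
# necessarily matched by \w here.
underscore_is_word = re.fullmatch(r"\w", "_") is not None
```

The major class is taken as the first letter of the two-letter value, which mirrors the "major, minor" grouping of the table. Note that `unicodedata.category` reports `Cn` for both noncharacters and reserved code points, so it does not distinguish the two `Cn` rows.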