In corpus linguistics, a hapax legomenon (/ˈhæpəks lɪˈɡɒmɪnɒn/ also /ˈhæpæks/ or /ˈheɪpæks/;[1][2] pl. hapax legomena; sometimes abbreviated to hapax, plural hapaxes) is a word or an expression that occurs only once within a context: either in the written record of an entire language, in the works of an author, or in a single text. The term is sometimes incorrectly used to describe a word that occurs in just one of an author's works but more than once in that particular work. Hapax legomenon is a transliteration of Greek ἅπαξ λεγόμενον, meaning "said once".[3]
The related terms dis legomenon, tris legomenon, and tetrakis legomenon (/ˈdɪs/, /ˈtrɪs/, /ˈtɛtrəkɪs/) refer respectively to double, triple, and quadruple occurrences, but are far less commonly used.
Hapax legomena are quite common, as predicted by Zipf's law,[4] which states that the frequency of any word in a corpus is inversely proportional to its rank in the frequency table. For large corpora, about 40% to 60% of the distinct words are hapax legomena, and another 10% to 15% are dis legomena.[5] For example, in the Brown Corpus of American English, about half of the 50,000 distinct words are hapax legomena within that corpus.[6]
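As an illustration of how such proportions might be measured, the following is a minimal Python sketch that counts the words occurring exactly n times in a text and reports the share of distinct words that are hapax legomena. The function names (tokenize, legomena, hapax_ratio) and the simple lowercase, letters-only tokenization are assumptions made for this example; real corpus studies usually apply more careful tokenization and normalization, which changes the counts.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a word is a run of ASCII letters or apostrophes,
    # compared case-insensitively. Real corpora need richer rules.
    return re.findall(r"[a-z']+", text.lower())


def legomena(tokens, n=1):
    # Return, in sorted order, the distinct words occurring exactly n times:
    # n=1 gives hapax legomena, n=2 dis legomena, and so on.
    counts = Counter(tokens)
    return sorted(word for word, count in counts.items() if count == n)


def hapax_ratio(tokens):
    # Share of distinct words (types) that occur exactly once.
    counts = Counter(tokens)
    if not counts:
        return 0.0
    return sum(1 for count in counts.values() if count == 1) / len(counts)


sample = "the cat saw the dog and the dog saw a bird"
tokens = tokenize(sample)
print(legomena(tokens, 1))   # ['a', 'and', 'bird', 'cat']
print(legomena(tokens, 2))   # ['dog', 'saw']
print(hapax_ratio(tokens))   # 4 of the 7 distinct words occur once
```

The ratio here is taken over distinct words rather than over all running words, which matches the way the Brown Corpus figure above is stated.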
Hapax legomenon refers to the appearance of a word or an expression in a body of text, not to either its origin or its prevalence in speech. It thus differs from a nonce word, which may never be recorded at all, may gain currency and become widely recorded, or may appear several times in the work that coins it.