Letter frequency

Relative frequency of letters in the English language[1]

Letter   Texts     Dictionaries[citation needed]
A        8.2%      7.8%
B        1.5%      2.0%
C        2.8%      4.0%
D        4.3%      3.8%
E        12.7%     11.0%
F        2.2%      1.4%
G        2.0%      3.0%
H        6.1%      2.3%
I        7.0%      8.6%
J        0.15%     0.21%
K        0.77%     0.97%
L        4.0%      5.3%
M        2.4%      2.7%
N        6.7%      7.2%
O        7.5%      6.1%
P        1.9%      2.8%
Q        0.095%    0.19%
R        6.0%      7.3%
S        6.3%      8.7%
T        9.1%      6.7%
U        2.8%      3.3%
V        0.98%     1.0%
W        2.4%      0.91%
X        0.15%     0.27%
Y        2.0%      1.6%
Z        0.074%    0.44%

Letter frequency is the number of times letters of the alphabet appear on average in written language. Letter frequency analysis dates back to the Arab mathematician Al-Kindi (c. 801–873 AD), who formally developed the method to break ciphers. Letter frequency analysis gained importance in Europe with the development of movable type in 1450 AD, when the amount of type required for each letterform had to be estimated. Linguists use letter frequency analysis as a rudimentary technique for language identification, where it is particularly effective as an indication of whether an unknown writing system is alphabetic, syllabic, or ideographic.
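
As a rough illustration of the definition above, the following Python sketch counts how often each letter occurs in a piece of text and converts the counts to relative frequencies. The function name letter_frequencies is hypothetical, and the sketch assumes that only the 26 unaccented letters A to Z are counted and that case is ignored; other counting conventions would give somewhat different figures.

```python
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Return the relative frequency (0..1) of each letter A-Z found in text.

    Assumption: case is ignored and anything outside A-Z (digits,
    punctuation, accented letters) is skipped.
    """
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # most_common() orders the result from most to least frequent letter.
    return {letter: count / total for letter, count in counts.most_common()}
```

Applied to a large sample of English prose, the resulting ordering would be expected to resemble the "Texts" column of the table, although short or specialised samples can deviate considerably.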

The use of letter frequencies and frequency analysis plays a fundamental role in cryptograms and several word puzzle games, including hangman, Scrabble, Wordle[2] and the television game show Wheel of Fortune. One of the earliest descriptions in classical literature of applying the knowledge of English letter frequency to solving a cryptogram is found in Edgar Allan Poe's famous story "The Gold-Bug", where the method is successfully applied to decipher a message giving the location of a treasure hidden by Captain Kidd.[3][citation needed]
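
The idea behind frequency analysis can be sketched in Python on the simplest possible case, a Caesar (shift) cipher, rather than on the general substitution cipher solved in "The Gold-Bug". The sketch below tries all 26 shifts and keeps the one whose letter counts best match the "Texts" column of the table, using a chi-squared statistic as the measure of fit. The function names and the choice of chi-squared are assumptions made for illustration, not a method taken from the sources cited here.

```python
# Percentages from the "Texts" column of the table in this article.
ENGLISH_PERCENT = {
    "A": 8.2, "B": 1.5, "C": 2.8, "D": 4.3, "E": 12.7, "F": 2.2, "G": 2.0,
    "H": 6.1, "I": 7.0, "J": 0.15, "K": 0.77, "L": 4.0, "M": 2.4, "N": 6.7,
    "O": 7.5, "P": 1.9, "Q": 0.095, "R": 6.0, "S": 6.3, "T": 9.1, "U": 2.8,
    "V": 0.98, "W": 2.4, "X": 0.15, "Y": 2.0, "Z": 0.074,
}


def shift_text(text: str, shift: int) -> str:
    """Shift every letter A-Z / a-z by `shift` places, leaving other characters alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def chi_squared(text: str) -> float:
    """Measure how far the letter counts of `text` are from typical English (lower is closer)."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    total = len(letters)
    if total == 0:
        return float("inf")
    score = 0.0
    for letter, percent in ENGLISH_PERCENT.items():
        expected = total * percent / 100
        observed = letters.count(letter)
        score += (observed - expected) ** 2 / expected
    return score


def guess_caesar_shift(ciphertext: str) -> int:
    """Return the shift (0-25) whose decryption looks most like English."""
    return min(range(26), key=lambda s: chi_squared(shift_text(ciphertext, -s)))
```

A guess produced this way is only as good as the sample: with very short ciphertexts, or texts with unusual letter usage, the best-scoring shift may not be the correct one.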

Herbert S. Zim, in his classic introductory cryptography text Codes and Secret Writing, gives the English letter frequency sequence as "ETAON RISHD LFCMU GYPWB VKJXZQ", the most common letter pairs as "TH HE AN RE ER IN ON AT ND ST ES EN OF TE ED OR TI HI AS TO", and the most common doubled letters as "LL EE SS OO TT FF RR NN PP CC".[4] Different ways of counting can produce somewhat different orders.
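
Lists of common letter pairs and doubled letters such as these can be tallied with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and it assumes that pairs are counted inside words and never across spaces or punctuation, which is one of several possible counting conventions and need not be the one Zim used.

```python
import re
from collections import Counter


def bigram_counts(text: str) -> Counter:
    """Count adjacent letter pairs inside each word (case-insensitive).

    Assumption: a "word" is a maximal run of the letters A-Z, so pairs
    spanning a space, digit, or punctuation mark are not counted.
    """
    counts = Counter()
    for word in re.findall(r"[A-Z]+", text.upper()):
        for first, second in zip(word, word[1:]):
            counts[first + second] += 1
    return counts


def doubled_letter_counts(text: str) -> Counter:
    """Keep only the pairs made of the same letter twice, such as LL or EE."""
    return Counter(
        {pair: n for pair, n in bigram_counts(text).items() if pair[0] == pair[1]}
    )
```

Calling bigram_counts(sample).most_common(20) on a sample text gives a ranking that can be compared with the pair list above; counting across word boundaries instead would change the ranking, which is one reason published orders differ.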

Letter frequencies also have a strong effect on the design of some keyboard layouts. The most frequent letters are placed on the home row of the Blickensderfer typewriter, the Dvorak keyboard layout, Colemak and other optimized layouts.
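
One way to see the effect is to add up the frequencies of the letters that sit on each layout's home row. The Python sketch below does this with the "Texts" column of the table. The home-row letter sets are not taken from the sources cited in this article; they are the standard letter assignments for each layout, with punctuation keys ignored, and the function name is hypothetical.

```python
# Percentages from the "Texts" column of the table in this article.
ENGLISH_PERCENT = {
    "A": 8.2, "B": 1.5, "C": 2.8, "D": 4.3, "E": 12.7, "F": 2.2, "G": 2.0,
    "H": 6.1, "I": 7.0, "J": 0.15, "K": 0.77, "L": 4.0, "M": 2.4, "N": 6.7,
    "O": 7.5, "P": 1.9, "Q": 0.095, "R": 6.0, "S": 6.3, "T": 9.1, "U": 2.8,
    "V": 0.98, "W": 2.4, "X": 0.15, "Y": 2.0, "Z": 0.074,
}

# Letters on the home row of each layout (assumption: punctuation keys
# such as ";" on QWERTY are left out because the table covers letters only).
HOME_ROWS = {
    "QWERTY": "ASDFGHJKL",
    "Dvorak": "AOEUIDHTNS",
    "Colemak": "ARSTDHNEIO",
}


def home_row_share(layout: str) -> float:
    """Return the percentage of letters in typical English text typed on the home row."""
    return sum(ENGLISH_PERCENT[letter] for letter in HOME_ROWS[layout])


if __name__ == "__main__":
    for name in HOME_ROWS:
        print(f"{name}: {home_row_share(name):.1f}% of letters on the home row")
```

With these figures the Dvorak and Colemak home rows each cover roughly 70% of the letters in typical text, about twice the share covered by the QWERTY home row.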

  1. Mička, Pavel. "Letter frequency (English)". Algoritmy.net. Archived from the original on 4 March 2021. Retrieved 14 June 2022. Source is Lewand, Robert. Cryptological Mathematics. The Mathematical Association of America, 2000. 199 p. ISBN 0-88385-719-7.
  2. Guinness, Harry. "The Best Starting Words to Win at Wordle". Wired. ISSN 1059-1028. Retrieved 12 February 2022.
  3. Poe, Edgar Allan. "The works of Edgar Allan Poe in five volumes". Project Gutenberg.
  4. Zim, Herbert Spencer (1961). Codes & Secret Writing: Authorized Abridgement. Scholastic Book Services. OCLC 317853773.