CMU Pronouncing Dictionary

Developer(s): Carnegie Mellon University
Stable release: 0.7b / November 19, 2014
Available in: English
License: BSD
Website: www.speech.cs.cmu.edu/cgi-bin/cmudict

The CMU Pronouncing Dictionary (also known as CMUdict) is an open-source pronouncing dictionary originally created by the Speech Group at Carnegie Mellon University (CMU) for use in speech recognition research.

CMUdict maps the orthographic forms of English words to their phonetic transcriptions in North American pronunciation. It is commonly used to generate pronunciation representations for automatic speech recognition (ASR), for example in the CMU Sphinx system, and for text-to-speech synthesis (TTS), for example in the Festival system. CMUdict can also serve as a training corpus for statistical grapheme-to-phoneme (g2p) models[1] that generate pronunciations for words not yet included in the dictionary.
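To illustrate the word-to-phoneme mapping, the following minimal Python sketch parses a few entries written in the plain-text layout commonly used by CMUdict releases. It assumes one entry per line with the word first, followed by whitespace-separated phoneme symbols, alternate pronunciations marked by a suffix such as `(1)`, stress marked by a trailing digit on vowels, and comment lines beginning with `;;;`. The names `SAMPLE`, `parse_cmudict`, and `strip_stress` are hypothetical helpers introduced here for illustration and are not part of any CMU tool.

```python
import re
from collections import defaultdict

# A tiny illustrative sample in the assumed CMUdict text layout.
SAMPLE = """\
;;; comment lines are assumed to start with three semicolons
HELLO  HH AH0 L OW1
HELLO(1)  HH EH0 L OW1
WORLD  W ER1 L D
"""

# Matches the "(1)", "(2)", ... suffix that marks an alternate pronunciation.
_VARIANT = re.compile(r"\(\d+\)$")


def parse_cmudict(text):
    """Return a dict mapping each lowercase word to a list of phoneme lists."""
    entries = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        head, *phones = line.split()
        word = _VARIANT.sub("", head).lower()
        entries[word].append(phones)
    return dict(entries)


def strip_stress(phones):
    """Drop the trailing stress digit (0, 1 or 2) from each phoneme symbol."""
    return [p.rstrip("012") for p in phones]


pronunciations = parse_cmudict(SAMPLE)
print(pronunciations["hello"])
print(strip_stress(pronunciations["world"][0]))
```

Keeping a list of pronunciations per word preserves the alternates, which matters for recognition lexicons; a synthesis front end might instead pick only the first variant.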

The most recent release is 0.7b; it contains over 134,000 entries. An interactive lookup version is available.[2]

  1. "Sequitur G2P - A trainable Grapheme-to-Phoneme converter".
  2. "The CMU Pronouncing Dictionary". CMU Pronouncing Dictionary. 2015-07-16. Archived from the original on 2022-06-03. Retrieved 2022-06-04.