NETtalk (artificial neural network)

Figure: NETtalk structure.

NETtalk is an artificial neural network developed in the mid-1980s by Terrence Sejnowski and Charles Rosenberg. Their aim was to build a simplified connectionist model that could learn a task comparable to a human-level cognitive task, and in doing so shed light on the complexity of learning such tasks. The authors trained it in two ways, once with a Boltzmann machine and once with backpropagation.[1]

NETtalk is a program that learns to pronounce written English text. During training it is shown text as input together with the matching phonetic transcription, against which its output is compared.[2][3]
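The pairing of input text with a matching transcription can be illustrated with a minimal sketch. It assumes one phoneme label per letter and a seven-character context window centred on the letter being pronounced, as reported in the 1987 paper.[1] The function name make_training_pairs, the padding symbol "_", and the phoneme labels used in the example are hypothetical placeholders, not NETtalk's actual code or phoneme alphabet.

```python
from typing import List, Tuple

WINDOW = 7  # assumed context width (the centre letter plus three on each side)
PAD = "_"   # hypothetical filler for window positions that fall outside the text


def make_training_pairs(
    text: str, phonemes: List[str], window: int = WINDOW
) -> List[Tuple[str, str]]:
    """Pair each letter's context window with the phoneme label for that letter.

    Assumes the transcription is aligned one label per character.
    """
    if len(text) != len(phonemes):
        raise ValueError("expected exactly one phoneme label per character")
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a centre letter")

    half = window // 2
    # Pad both ends so the first and last letters can sit at the window centre.
    padded = PAD * half + text + PAD * half

    pairs = []
    for i, target in enumerate(phonemes):
        # padded[i : i + window] is centred on text[i].
        pairs.append((padded[i : i + window], target))
    return pairs


if __name__ == "__main__":
    # Illustrative labels only; not taken from the NETtalk training data.
    for context, label in make_training_pairs("cat", ["k", "ae", "t"]):
        print(context, "->", label)
```

Each window would then be encoded numerically (for example, one-hot per character) before being presented to the network, and the network's output for the centre letter would be compared with the paired label to drive learning.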

The network was trained on a large number of English words and their corresponding pronunciations, and it can generate pronunciations for unseen words with a high level of accuracy. Its success inspired further research in pronunciation generation and speech synthesis and demonstrated the potential of neural networks for solving complex natural language processing (NLP) problems. The output of the network was a stream of phonemes, which was fed into DECtalk to produce audible speech. NETtalk achieved popular success, appearing on the Today show.[4] The development process was described in a 1993 interview: it took three months to create the training dataset, but only a few days to train the network.[5]

  1. ^ Sejnowski, Terrence J.; Rosenberg, Charles R. (1987). "Parallel networks that learn to pronounce English text". Complex Systems. 1 (1): 145–168.
  2. ^ Thierry Dutoit (30 November 2001). An Introduction to Text-to-Speech Synthesis. Springer Science & Business Media. pp. 123–. ISBN 978-1-4020-0369-1.
  3. ^ Hinton, Geoffrey (1991). Connectionist Symbol Processing (First ed.). The MIT Press. pp. 161–163. ISBN 0-262-58106-X.
  4. ^ Sejnowski, Terrence J. (2018). The deep learning revolution. Cambridge, Massachusetts London, England: The MIT Press. ISBN 978-0-262-03803-4.
  5. ^ Talking Nets: An Oral History of Neural Networks. The MIT Press. 2000-02-28. ISBN 978-0-262-26715-1.