WaveNet

WaveNet is a deep neural network for generating raw audio. It was created by researchers at the London-based AI firm DeepMind. The technique, outlined in a paper published in September 2016,[1] generates relatively realistic-sounding human-like voices by directly modelling raw audio waveforms with a neural network trained on recordings of real speech. Tests with US English and Mandarin reportedly showed that the system outperformed Google's best existing text-to-speech (TTS) systems, although as of 2016 its text-to-speech synthesis was still less convincing than actual human speech.[2] Because WaveNet models raw waveforms directly, it can generate any kind of audio, including music.[3]
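The 2016 paper[1] describes the model as autoregressive: each audio sample is predicted from the samples that precede it, using a stack of dilated causal convolutions over quantized waveform values. The following is a minimal illustrative sketch of that idea in Python/PyTorch, not DeepMind's implementation; the class names (CausalConv1d, TinyWaveNet) and the particular layer sizes and dilation schedule are hypothetical choices made for the example.

# Illustrative sketch (not DeepMind's code): a stack of dilated causal
# 1-D convolutions that predicts each audio sample from the preceding
# samples, producing a categorical distribution over 256 quantized
# amplitude levels per timestep.
import torch
import torch.nn as nn

class CausalConv1d(nn.Conv1d):
    # 1-D convolution that only sees past samples, via left-only padding.
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        super().__init__(in_ch, out_ch, kernel_size, dilation=dilation)
        self.left_pad = (kernel_size - 1) * dilation

    def forward(self, x):
        x = nn.functional.pad(x, (self.left_pad, 0))  # pad the past side only
        return super().forward(x)

class TinyWaveNet(nn.Module):
    # Minimal autoregressive model over 8-bit (256-level) audio samples.
    def __init__(self, channels=32, levels=256, dilations=(1, 2, 4, 8, 16)):
        super().__init__()
        self.embed = nn.Embedding(levels, channels)
        self.layers = nn.ModuleList(
            CausalConv1d(channels, channels, kernel_size=2, dilation=d)
            for d in dilations
        )
        self.out = nn.Conv1d(channels, levels, kernel_size=1)

    def forward(self, samples):                    # samples: (batch, time) int64
        x = self.embed(samples).transpose(1, 2)    # -> (batch, channels, time)
        for conv in self.layers:
            x = x + torch.tanh(conv(x))            # residual, strictly causal
        return self.out(x)                         # logits: (batch, 256, time)

# Training pairs each timestep's logits with the next sample, so the network
# learns p(x_t | x_1..x_{t-1}); generation then draws one sample at a time
# and feeds it back in, which is what makes synthesis sample-by-sample.

Doubling the dilation at each layer is what gives this kind of stack a receptive field that grows exponentially with depth while keeping the layer count small, which is why the approach can condition on a usefully long window of past audio.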

  1. ^ van den Oord, Aaron; Dieleman, Sander; Zen, Heiga; Simonyan, Karen; Vinyals, Oriol; Graves, Alex; Kalchbrenner, Nal; Senior, Andrew; Kavukcuoglu, Koray (2016-09-12). "WaveNet: A Generative Model for Raw Audio". arXiv:1609.03499 [cs.SD].
  2. ^ Kahn, Jeremy (2016-09-09). "Google's DeepMind Achieves Speech-Generation Breakthrough". Bloomberg.com. Retrieved 2017-07-06.
  3. ^ Meyer, David (2016-09-09). "Google's DeepMind Claims Massive Progress in Synthesized Speech". Fortune. Retrieved 2017-07-06.