Whisper (speech recognition system)

Original author(s): OpenAI[1]
Initial release: September 21, 2022
Repository: https://github.com/openai/whisper
Written in: Python
License: MIT License

Whisper is a machine learning model for speech recognition and transcription, created by OpenAI and first released as open-source software in September 2022.[2]

It can transcribe speech in English and several other languages, and can also translate speech from several non-English languages into English.[1] OpenAI claims that the combination of different training data used in its development has made it more robust to accents, background noise and jargon than previous approaches.[3]
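The two tasks described above, transcription and translation into English, can be illustrated with a minimal Python sketch. It assumes the openai-whisper package from the repository listed above, whose README documents whisper.load_model and model.transcribe. The helper names run_whisper and demo, the file name speech.mp3, and the choice of the "base" checkpoint are hypothetical and used only for illustration.

```python
from typing import Any, Optional

# The two tasks the article describes: same-language transcription,
# and translation of non-English speech into English.
VALID_TASKS = ("transcribe", "translate")


def run_whisper(model: Any, audio_path: str, task: str = "transcribe",
                language: Optional[str] = None) -> str:
    """Return the recognized text for one audio file.

    model is expected to expose transcribe(path, **options) and return a
    dict with a "text" key, as models loaded via whisper.load_model do.
    language=None leaves language detection to the model.
    """
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    result = model.transcribe(audio_path, task=task, language=language)
    return result["text"].strip()


def demo() -> None:
    """Illustrative only: needs `pip install openai-whisper` and ffmpeg."""
    import whisper  # imported lazily so the helper above has no hard dependency

    model = whisper.load_model("base")  # checkpoint name is an arbitrary choice
    print(run_whisper(model, "speech.mp3"))                    # hypothetical file
    print(run_whisper(model, "speech.mp3", task="translate"))  # English output
```

Calling demo() downloads the chosen checkpoint on first use. Passing the model in as an argument keeps the helper independent of how the model was loaded.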

Whisper is a weakly supervised deep learning acoustic model built on an encoder-decoder transformer architecture.[1]

Whisper V2 was released on December 8, 2022.[4] Whisper V3 was released in November 2023, at OpenAI DevDay.[5]

  1. Radford, Alec; Kim, Jong Wook; Xu, Tao; Brockman, Greg; McLeavey, Christine; Sutskever, Ilya (2022-12-06). "Robust Speech Recognition via Large-Scale Weak Supervision". arXiv:2212.04356 [eess.AS].
  2. Golla, Ramsri Goutham (2023-03-06). "Here Are Six Practical Use Cases for the New Whisper API". Slator. Archived from the original on 2023-03-25. Retrieved 2023-08-12.
  3. Wiggers, Kyle (2022-09-21). "OpenAI open-sources Whisper, a multilingual speech recognition system". TechCrunch. Archived from the original on 2023-02-12. Retrieved 2023-02-12.
  4. "Announcing the large-v2 model · openai/whisper · Discussion #661". GitHub. Retrieved 2024-01-08.
  5. "OpenAI DevDay: Opening Keynote". 2023-11-06. Retrieved 2024-01-08.