Whisper (speech recognition system)

Original author(s)	OpenAI
Initial release	September 21, 2022
Repository	https://github.com/openai/whisper
Written in	Python
Type	Transcription software; Encoder-decoder transformer; Foundation model; Acoustic model
License	MIT License

Whisper is a machine learning model for speech recognition and transcription, created by OpenAI and first released as open-source software in September 2022.^[2]

It can transcribe speech in English and several other languages, and can translate speech from several non-English languages into English.^[1] OpenAI claims that the mix of training data used in its development has made it more robust to accents, background noise and jargon than previous approaches.^[3]
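
The two capabilities described above, transcription and translation into English, might be invoked roughly as follows. This is a minimal sketch, assuming the open-source openai-whisper Python package from the repository listed above is installed together with ffmpeg. The helper names build_decode_options and run_whisper, the default model size "base" and the audio file name are illustrative assumptions, not part of the project.

```python
from typing import Optional

# The two tasks the model supports: same-language transcription,
# or translation of non-English speech into English.
VALID_TASKS = {"transcribe", "translate"}


def build_decode_options(task: str = "transcribe",
                         language: Optional[str] = None) -> dict:
    """Build keyword arguments for a Whisper transcribe() call (hypothetical helper)."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}, got {task!r}")
    options = {"task": task}
    if language is not None:
        # Language code such as "fr"; when omitted, the model tries to detect it.
        options["language"] = language
    return options


def run_whisper(audio_path: str, model_name: str = "base", **options) -> str:
    """Run the model on one audio file and return the recognized text."""
    # Imported lazily so this module loads even where the package is absent.
    import whisper  # assumes `pip install openai-whisper`

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **options)
    return result["text"]


# Example call (file name is hypothetical):
#   opts = build_decode_options(task="translate", language="fr")
#   print(run_whisper("interview_fr.mp3", **opts))
```

Keeping the option building separate from the model call means the task and language settings can be checked without downloading model weights.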

Whisper is a weakly supervised deep learning acoustic model built on an encoder-decoder transformer architecture.^[1]

Whisper V2 (the large-v2 model) was released on December 8, 2022.^[4] Whisper V3 was released in November 2023, at OpenAI DevDay.^[5]

[1] Radford, Alec; Kim, Jong Wook; Xu, Tao; Brockman, Greg; McLeavey, Christine; Sutskever, Ilya (2022-12-06). "Robust Speech Recognition via Large-Scale Weak Supervision". arXiv:2212.04356 [eess.AS].
[2] Golla, Ramsri Goutham (2023-03-06). "Here Are Six Practical Use Cases for the New Whisper API". Slator. Archived from the original on 2023-03-25. Retrieved 2023-08-12.
[3] Wiggers, Kyle (2022-09-21). "OpenAI open-sources Whisper, a multilingual speech recognition system". TechCrunch. Archived from the original on 2023-02-12. Retrieved 2023-02-12.
[4] "Announcing the large-v2 model · openai/whisper · Discussion #661". GitHub. Retrieved 2024-01-08.
[5] "OpenAI DevDay: Opening Keynote". Retrieved 2024-01-08.