The OpenAI Whisper Models
In late 2022, OpenAI released the Whisper series of audio transcription models. They have since become the go-to open-source transcription models and are used in numerous deployed applications. A series of updates and follow-on work has also aimed to improve their speed and accuracy.
How do these new Whisper models compare to the originals? We decided to explore two variations:
- Distil-Whisper - A distilled version of Whisper that is reported to be 6x faster and smaller than the original Whisper model it was distilled from, with similar accuracy.
- Whisper Large v3 - An updated Whisper version trained on a larger corpus of data.
This report is a follow-up to our first transcription report, which looked at Whisper's performance across demographic groups. We use the same Speech Accent Archive dataset for this analysis. In this dataset, speakers from around the world read the same English passage, which was chosen to cover a wide range of English sounds. If you want to explore this data further, take a look at the Whisper Accents Project.