Whisper-large-v3-turbo is used to transcribe short-form audio files and is designed to be compatible with OpenAI's sequential long-form transcription algorithm. It is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on more than 5 million hours of labeled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need for fine-tuning. Whisper-large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3, with the number of decoding layers reduced from 32 to 4. As a result, the model transcribes significantly faster with minimal degradation in accuracy. See the [paper](https://arxiv.org/abs/2311.00430) for more information.
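For context, a minimal transcription sketch using the Hugging Face Transformers pipeline is shown below; the `openai/whisper-large-v3-turbo` checkpoint identifier and the example file name are assumptions for illustration and are not defined by this card.

```python
# Minimal sketch: short-form transcription with the Transformers ASR pipeline.
# The checkpoint id and audio file name below are assumptions for illustration.
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16 if device.startswith("cuda") else torch.float32,
    device=device,
)

# Transcribe a short clip (up to ~30 seconds of audio).
result = asr("sample_speech.wav")
print(result["text"])
```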
This model version is optimized to run with NVIDIA TensorRT-LLM.
This model is ready for commercial use.
This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the link to the Non-NVIDIA (Whisper-Large-v3-turbo) Model Card.
GOVERNING TERMS: Use of the model is governed by the NVIDIA Community Model License (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-community-models-license/). ADDITIONAL INFORMATION: MIT license.
Global
Developers or end users for speech transcription use cases.
03/07/2025
Whisper website
Whisper paper:
@misc{radford2022robust,
title={Robust Speech Recognition via Large-Scale Weak Supervision},
author={Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
year={2022},
eprint={2212.04356},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
Architecture Type: Transformer (Encoder-Decoder)
Network Architecture: Whisper
Input Type(s): Audio, Text-Prompt
Input Format(s): Linear PCM 16-bit 1 channel (Audio), String (Text Prompt)
Input Parameters: One-Dimensional (1D), One-Dimensional (1D)
Other Properties Related to Input: Audio duration: (0 to 30 sec), prompt tokens: (5 to 114 tokens)
Output Type(s): Text
Output Format: String
Output Parameters: One-Dimensional (1D)
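To match the input format above, incoming audio should be mono, 16-bit linear PCM, and no longer than 30 seconds. The sketch below shows one way to perform that conversion; it assumes Whisper's standard 16 kHz sample rate, which is not specified in this card.

```python
# Sketch: convert an audio file to mono 16-bit linear PCM, <= 30 s.
# The 16 kHz target sample rate is an assumption (Whisper's standard rate).
import numpy as np
import soundfile as sf
import librosa

def prepare_clip(path: str, target_sr: int = 16_000, max_seconds: float = 30.0) -> np.ndarray:
    """Load an audio file, downmix to mono, resample, and trim to 30 s."""
    audio, sr = sf.read(path, dtype="float32", always_2d=True)
    mono = audio.mean(axis=1)                      # downmix to a single channel
    if sr != target_sr:
        mono = librosa.resample(mono, orig_sr=sr, target_sr=target_sr)
    mono = mono[: int(max_seconds * target_sr)]    # keep at most 30 seconds
    mono = np.clip(mono, -1.0, 1.0)
    return (mono * 32767).astype(np.int16)         # 16-bit linear PCM samples

if __name__ == "__main__":
    samples = prepare_clip("speech.wav")           # "speech.wav" is a placeholder path
    sf.write("speech_16k_mono.wav", samples, 16_000, subtype="PCM_16")
```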
## Software Integration:
Runtime Engine: NVIDIA TensorRT-LLM
Supported Hardware Microarchitecture Compatibility:
Supported Operating System(s): Linux
Large-v3-turbo: Whisper large-v3-turbo has the same architecture as the large-v3 model, except for one minor difference: the number of decoding layers is reduced from 32 to 4.
For more details on model usage, evaluation, training dataset and implications, please refer to Whisper large-v3-turbo Model Card.
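The reduced decoder depth can be verified from the model configuration; the check below is illustrative and assumes the `openai/whisper-large-v3-turbo` and `openai/whisper-large-v3` checkpoints on the Hugging Face Hub.

```python
# Illustrative check of the pruned decoder: large-v3-turbo keeps the full
# encoder but uses only 4 decoder layers, versus 32 in large-v3.
from transformers import AutoConfig

turbo = AutoConfig.from_pretrained("openai/whisper-large-v3-turbo")
large = AutoConfig.from_pretrained("openai/whisper-large-v3")
print(turbo.encoder_layers, turbo.decoder_layers)  # expected: 32 4
print(large.encoder_layers, large.decoder_layers)  # expected: 32 32
```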
Data Collection Method by dataset: [Hybrid: Human, Automatic]
Labeling Method by dataset: [Automated]
Additional details on model evaluations can be found here.
Engine: TensorRT-LLM, Triton
Test Hardware:
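When the model is served through Triton Inference Server, a basic readiness check with the Triton Python client can look like the sketch below; the server URL and the model name "whisper" are placeholders for whatever your deployment actually uses.

```python
# Hedged sketch: verify a Triton Inference Server deployment is ready.
# The URL and model name are placeholders, not values defined by this card.
from tritonclient.http import InferenceServerClient

client = InferenceServerClient(url="localhost:8000")
if client.is_server_ready() and client.is_model_ready("whisper"):
    print("Triton is up and the ASR model is loaded.")
```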
Please review the Whisper-Large-v3-Turbo Model Card for more information regarding limitations. The publisher (OpenAI) has included cautions against certain uses under "Evaluated Use" and highlighted the model's limitations under "Performance and Limitations" and "Broader Implications."
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.