The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA's latest software release. For the most up-to-date performance measurements, go to NVIDIA Data Center Deep Learning Product Performance.
The following section shows how to run benchmarks measuring the model performance in training and inference modes.
To benchmark the training performance in a specific setting on the train-clean-100 subset of LibriSpeech, run:
BATCH_SIZE_SEQ=<BATCH_SIZES> NUM_GPUS_SEQ=<NUMS_OF_GPUS> bash scripts/train_benchmark.sh
By default, this script runs 2 epochs on the configuration configs/jasper10x5dr_speedp-online_train-benchmark.yaml, which applies gentle speed perturbation that does not change the length of the output, so training step times stabilize immediately in cuDNN benchmark mode. The script benchmarks batch size 32 on 1, 4, and 8 GPUs, and requires an 8x 32GB GPU machine.
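For example, to run the default sweep explicitly (assuming the *_SEQ variables take space-separated lists of values, as their names suggest):

```bash
# Sweep batch size 32 over 1, 4, and 8 GPUs (the script's defaults),
# passing the sequences as space-separated lists.
BATCH_SIZE_SEQ="32" NUM_GPUS_SEQ="1 4 8" bash scripts/train_benchmark.sh
```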
To benchmark the inference performance for a specific batch size and audio length, run:
BATCH_SIZE_SEQ=<BATCH_SIZES> MAX_DURATION_SEQ=<DURATIONS> bash scripts/inference_benchmark.sh
By default, the script runs on a single GPU and evaluates on the dataset limited to utterances shorter than MAX_DURATION. It uses the model configuration configs/jasper10x5dr_speedp-online_speca.yaml.
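For example, to cover the batch sizes and utterance lengths reported in the inference tables below (again assuming the *_SEQ variables take space-separated lists):

```bash
# Benchmark inference for batch sizes 1-16 and maximum utterance
# durations of 2, 7, and 16.7 seconds on a single GPU.
BATCH_SIZE_SEQ="1 2 4 8 16" MAX_DURATION_SEQ="2 7 16.7" bash scripts/inference_benchmark.sh
```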
The following sections provide details on how we achieved our performance and accuracy in training and inference. All models were trained on the 960 hours of LibriSpeech training data with a maximum audio length of 16.7 s. Training is evaluated on LibriSpeech dev-clean, dev-other, test-clean, and test-other. Checkpoints for evaluation are chosen based on their word error rate on dev-clean.
Our results were obtained by running the scripts/train.sh training script in the PyTorch 20.10-py3 NGC container on NVIDIA DGX A100 with (8x A100 80GB) GPUs.
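As a minimal sketch of how such a run might be launched (the environment variable names below are assumptions in the spirit of the benchmark scripts above; consult the Quick Start Guide and scripts/train.sh for the actual interface):

```bash
# Hypothetical invocation; NUM_GPUS, AMP, and BATCH_SIZE are assumed
# variable names, so check scripts/train.sh before relying on them.
NUM_GPUS=8 AMP=true BATCH_SIZE=64 bash scripts/train.sh
```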
The following table reports the word error rate (WER) of the acoustic model with greedy decoding on all LibriSpeech dev and test datasets for mixed precision training.
Number of GPUs | Batch size per GPU | Precision | dev-clean WER | dev-other WER | test-clean WER | test-other WER | Time to train |
---|---|---|---|---|---|---|---|
8 | 64 | mixed | 3.20 | 9.78 | 3.41 | 9.71 | 70 h |
Our results were obtained by running the scripts/train.sh training script in the PyTorch 20.10-py3 NGC container on NVIDIA DGX-1 with (8x V100 32GB) GPUs.
The following table reports the word error rate (WER) of the acoustic model with greedy decoding on all LibriSpeech dev and test datasets for mixed precision training.
Number of GPUs | Batch size per GPU | Precision | dev-clean WER | dev-other WER | test-clean WER | test-other WER | Time to train |
---|---|---|---|---|---|---|---|
8 | 64 | mixed | 3.26 | 10.00 | 3.54 | 9.80 | 130 h |
We show the best of 5 runs (mixed precision) and 2 runs (FP32), chosen based on dev-clean WER. For FP32, two gradient accumulation steps were used.
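With gradient accumulation, the effective global batch size is the per-GPU batch size × number of GPUs × accumulation steps; for example, an illustrative FP32 per-GPU batch of 32 gives 32 × 8 × 2 = 512, matching the 64 × 8 = 512 effective batch of the mixed precision runs.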
The following table compares greedy decoding word error rates across 8 different training runs with different seeds for mixed precision training.
DGX A100 80GB, FP16, 8x GPU | Seed #1 | Seed #2 | Seed #3 | Seed #4 | Seed #5 | Seed #6 | Seed #7 | Seed #8 | Mean | Std |
---|---|---|---|---|---|---|---|---|---|---|
dev-clean | 3.46 | 3.55 | 3.45 | 3.44 | 3.25 | 3.34 | 3.20 | 3.40 | 3.39 | 0.11 |
dev-other | 10.30 | 10.77 | 10.36 | 10.26 | 9.99 | 10.18 | 9.78 | 10.32 | 10.25 | 0.27 |
test-clean | 3.84 | 3.81 | 3.66 | 3.64 | 3.58 | 3.55 | 3.41 | 3.73 | 3.65 | 0.13 |
test-other | 10.61 | 10.52 | 10.49 | 10.47 | 9.89 | 10.09 | 9.71 | 10.26 | 10.26 | 0.31 |
DGX-1 32GB, FP16, 8x GPU | Seed #1 | Seed #2 | Seed #3 | Seed #4 | Seed #5 | Seed #6 | Seed #7 | Seed #8 | Mean | Std |
---|---|---|---|---|---|---|---|---|---|---|
dev-clean | 3.31 | 3.31 | 3.26 | 3.44 | 3.40 | 3.35 | 3.36 | 3.28 | 3.34 | 0.06 |
dev-other | 10.02 | 10.01 | 10.00 | 10.06 | 10.05 | 10.03 | 10.10 | 10.04 | 10.04 | 0.03 |
test-clean | 3.49 | 3.50 | 3.54 | 3.61 | 3.57 | 3.58 | 3.48 | 3.51 | 3.54 | 0.04 |
test-other | 10.11 | 10.14 | 9.80 | 10.09 | 10.17 | 9.99 | 9.86 | 10.00 | 10.02 | 0.13 |
Our results were obtained by running the scripts/train.sh training script in the PyTorch 20.10-py3 NGC container. Performance (in sequences per second) is the steady-state throughput.
Batch size / GPU | GPUs | Throughput - TF32 | Throughput - mixed precision | Throughput speedup (TF32 to mixed precision) | Weak scaling - TF32 | Weak scaling - mixed precision |
---|---|---|---|---|---|---|
32 | 1 | 42.18 | 64.32 | 1.52 | 1.00 | 1.00 |
32 | 4 | 157.49 | 239.23 | 1.52 | 3.73 | 3.72 |
32 | 8 | 310.10 | 470.09 | 1.52 | 7.35 | 7.31 |
64 | 1 | 49.64 | 75.59 | 1.52 | 1.00 | 1.00 |
64 | 4 | 192.66 | 289.16 | 1.50 | 3.88 | 3.83 |
64 | 8 | 371.41 | 547.91 | 1.48 | 7.48 | 7.25 |
Note: Mixed precision permits higher batch sizes during training. We report the maximum batch sizes (as powers of 2) that can be used without gradient accumulation.
To achieve these same results, follow the Quick Start Guide outlined above.
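The derived columns follow directly from the throughput values. For example, in the batch 32 rows above, the single-GPU mixed precision speedup is 64.32 / 42.18 ≈ 1.52, and TF32 weak scaling on 8 GPUs is 310.10 / 42.18 ≈ 7.35, i.e., multi-GPU throughput divided by the single-GPU throughput at the same per-GPU batch size.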
Batch size / GPU | GPUs | Throughput - FP32 | Throughput - mixed precision | Throughput speedup (FP32 to mixed precision) | Weak scaling - FP32 | Weak scaling - mixed precision |
---|---|---|---|---|---|---|
16 | 1 | 10.71 | 27.87 | 2.60 | 1.00 | 1.00 |
16 | 4 | 40.28 | 99.80 | 2.48 | 3.76 | 3.58 |
16 | 8 | 78.23 | 193.89 | 2.48 | 7.30 | 6.96 |
Note: Mixed precision permits higher batch sizes during training. We report the maximum batch sizes (as powers of 2) that can be used without gradient accumulation.
To achieve these same results, follow the Quick Start Guide outlined above.
Batch size / GPU | GPUs | Throughput - FP32 | Throughput - mixed precision | Throughput speedup (FP32 to mixed precision) | Weak scaling - FP32 | Weak scaling - mixed precision |
---|---|---|---|---|---|---|
32 | 1 | 12.22 | 34.08 | 2.79 | 1.00 | 1.00 |
32 | 4 | 46.97 | 128.39 | 2.73 | 3.84 | 3.77 |
32 | 8 | 92.44 | 249.00 | 2.69 | 7.57 | 7.31 |
64 | 1 | N/A | 39.30 | N/A | N/A | 1.00 |
64 | 4 | N/A | 150.18 | N/A | N/A | 3.82 |
64 | 8 | N/A | 282.68 | N/A | N/A | 7.19 |
Note: Mixed precision permits higher batch sizes during training. We report the maximum batch sizes (as powers of 2) that can be used without gradient accumulation.
To achieve these same results, follow the Quick Start Guide outlined above.
Batch size / GPU | GPUs | Throughput - FP32 | Throughput - mixed precision | Throughput speedup (FP32 to mixed precision) | Weak scaling - FP32 | Weak scaling - mixed precision |
---|---|---|---|---|---|---|
32 | 1 | 13.46 | 38.94 | 2.89 | 1.00 | 1.00 |
32 | 4 | 51.38 | 143.44 | 2.79 | 3.82 | 3.68 |
32 | 8 | 100.54 | 280.48 | 2.79 | 7.47 | 7.20 |
32 | 16 | 188.14 | 515.90 | 2.74 | 13.98 | 13.25 |
64 | 1 | N/A | 43.86 | N/A | N/A | 1.00 |
64 | 4 | N/A | 165.27 | N/A | N/A | 3.77 |
64 | 8 | N/A | 318.10 | N/A | N/A | 7.25 |
64 | 16 | N/A | 567.47 | N/A | N/A | 12.94 |
Note: Mixed precision permits higher batch sizes during training. We report the maximum batch sizes (as powers of 2) that can be used without gradient accumulation.
To achieve these same results, follow the Quick Start Guide outlined above.
Our results were obtained by running the scripts/inference_benchmark.sh script in the PyTorch 20.10-py3 NGC container on NVIDIA DGX A100, DGX-1, DGX-2, and T4, using a single GPU. Performance numbers (latency in milliseconds per batch) were averaged over 500 iterations.
BS | Duration (s) | FP16 latency 90% (ms) | FP16 latency 95% (ms) | FP16 latency 99% (ms) | FP16 latency avg (ms) | TF32 latency 90% (ms) | TF32 latency 95% (ms) | TF32 latency 99% (ms) | TF32 latency avg (ms) | FP16/TF32 speedup (avg) |
---|---|---|---|---|---|---|---|---|---|---|
1 | 2.0 | 32.40 | 32.50 | 32.82 | 32.30 | 33.30 | 33.64 | 34.65 | 33.25 | 1.03 |
2 | 2.0 | 32.90 | 33.51 | 34.35 | 32.69 | 34.48 | 34.65 | 35.66 | 34.27 | 1.05 |
4 | 2.0 | 32.85 | 33.01 | 33.89 | 32.60 | 34.09 | 34.46 | 35.22 | 34.00 | 1.04 |
8 | 2.0 | 35.51 | 35.89 | 37.10 | 35.33 | 34.86 | 35.36 | 36.08 | 34.45 | 0.98 |
16 | 2.0 | 36.00 | 36.57 | 37.40 | 35.77 | 43.83 | 44.12 | 44.77 | 43.39 | 1.21 |
1 | 7.0 | 33.50 | 33.99 | 34.91 | 33.03 | 33.83 | 34.25 | 34.95 | 33.70 | 1.02 |
2 | 7.0 | 34.43 | 34.89 | 35.72 | 34.22 | 34.41 | 34.73 | 35.69 | 34.28 | 1.00 |
4 | 7.0 | 34.30 | 34.59 | 35.43 | 34.07 | 37.95 | 38.18 | 38.87 | 37.55 | 1.10 |
8 | 7.0 | 35.98 | 36.28 | 37.11 | 35.28 | 44.64 | 44.79 | 45.37 | 44.29 | 1.26 |
16 | 7.0 | 39.86 | 40.08 | 41.16 | 39.33 | 55.17 | 55.46 | 57.24 | 54.56 | 1.39 |
1 | 16.7 | 35.20 | 35.80 | 38.71 | 34.36 | 35.36 | 35.76 | 36.55 | 34.64 | 1.01 |
2 | 16.7 | 35.40 | 35.81 | 36.50 | 34.76 | 36.34 | 36.53 | 37.40 | 35.87 | 1.03 |
4 | 16.7 | 36.01 | 36.38 | 37.37 | 35.57 | 44.69 | 45.09 | 45.88 | 43.92 | 1.23 |
8 | 16.7 | 41.48 | 41.78 | 44.22 | 40.69 | 58.57 | 58.74 | 59.62 | 58.11 | 1.43 |
16 | 16.7 | 61.37 | 61.93 | 66.32 | 60.92 | 97.33 | 97.71 | 100.04 | 96.56 | 1.59 |
To achieve these same results, follow the Quick Start Guide outlined above.
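As a rough conversion, batch throughput is the batch size divided by the average latency. For the last row above (batch 16, 16.7 s audio), FP16 gives about 16 / 0.0609 ≈ 263 utterances per second versus 16 / 0.0966 ≈ 166 for TF32, consistent with the reported 1.59x speedup.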
BS | Duration (s) | FP16 latency 90% (ms) | FP16 latency 95% (ms) | FP16 latency 99% (ms) | FP16 latency avg (ms) | FP32 latency 90% (ms) | FP32 latency 95% (ms) | FP32 latency 99% (ms) | FP32 latency avg (ms) | FP16/FP32 speedup (avg) |
---|---|---|---|---|---|---|---|---|---|---|
1 | 2.0 | 45.42 | 45.62 | 49.54 | 45.02 | 48.83 | 48.99 | 51.66 | 48.44 | 1.08 |
2 | 2.0 | 50.31 | 50.53 | 53.66 | 49.10 | 49.87 | 50.04 | 52.99 | 49.41 | 1.01 |
4 | 2.0 | 49.17 | 49.48 | 52.13 | 48.73 | 52.92 | 53.21 | 55.28 | 52.31 | 1.07 |
8 | 2.0 | 51.20 | 51.40 | 52.32 | 49.01 | 73.02 | 73.30 | 75.00 | 71.99 | 1.47 |
16 | 2.0 | 51.75 | 52.24 | 56.36 | 51.27 | 83.99 | 84.57 | 86.69 | 83.24 | 1.62 |
1 | 7.0 | 48.13 | 48.53 | 50.95 | 46.78 | 48.52 | 48.75 | 50.89 | 48.01 | 1.03 |
2 | 7.0 | 49.52 | 50.10 | 52.35 | 48.00 | 65.27 | 65.41 | 66.59 | 64.79 | 1.35 |
4 | 7.0 | 51.75 | 52.01 | 54.39 | 50.38 | 93.75 | 94.77 | 97.04 | 92.27 | 1.83 |
8 | 7.0 | 54.80 | 56.27 | 66.23 | 52.95 | 130.65 | 131.09 | 132.91 | 129.82 | 2.45 |
16 | 7.0 | 73.02 | 73.42 | 75.83 | 71.96 | 157.53 | 158.20 | 160.73 | 155.51 | 2.16 |
1 | 16.7 | 48.10 | 48.52 | 52.71 | 47.20 | 73.34 | 73.56 | 74.19 | 72.69 | 1.54 |
2 | 16.7 | 64.21 | 64.52 | 65.56 | 56.06 | 129.48 | 129.97 | 131.78 | 126.36 | 2.25 |
4 | 16.7 | 60.38 | 61.03 | 63.18 | 58.87 | 183.33 | 183.85 | 185.53 | 181.90 | 3.09 |
8 | 16.7 | 85.88 | 86.34 | 87.70 | 84.46 | 227.42 | 228.21 | 229.63 | 225.71 | 2.67 |
16 | 16.7 | 135.62 | 136.40 | 137.69 | 131.58 | 276.90 | 277.59 | 281.16 | 275.08 | 2.09 |
To achieve these same results, follow the Quick Start Guide outlined above.
BS | Duration (s) | FP16 latency 90% (ms) | FP16 latency 95% (ms) | FP16 latency 99% (ms) | FP16 latency avg (ms) | FP32 latency 90% (ms) | FP32 latency 95% (ms) | FP32 latency 99% (ms) | FP32 latency avg (ms) | FP16/FP32 speedup (avg) |
---|---|---|---|---|---|---|---|---|---|---|
1 | 2.0 | 52.74 | 53.01 | 54.40 | 51.47 | 55.97 | 56.22 | 57.93 | 54.93 | 1.07 |
2 | 2.0 | 51.77 | 52.15 | 54.69 | 50.98 | 56.58 | 56.87 | 58.88 | 55.35 | 1.09 |
4 | 2.0 | 51.41 | 51.76 | 53.47 | 50.55 | 61.56 | 61.87 | 63.81 | 60.74 | 1.20 |
8 | 2.0 | 51.83 | 52.15 | 54.08 | 50.85 | 80.20 | 80.69 | 81.67 | 77.69 | 1.53 |
16 | 2.0 | 70.48 | 70.96 | 72.11 | 62.98 | 93.00 | 93.44 | 94.17 | 89.05 | 1.41 |
1 | 7.0 | 49.77 | 50.21 | 51.88 | 48.73 | 52.74 | 52.99 | 54.54 | 51.67 | 1.06 |
2 | 7.0 | 51.12 | 51.47 | 52.84 | 49.98 | 65.33 | 65.63 | 67.07 | 64.64 | 1.29 |
4 | 7.0 | 53.13 | 53.56 | 55.68 | 52.15 | 93.54 | 93.85 | 94.72 | 92.76 | 1.78 |
8 | 7.0 | 57.67 | 58.07 | 59.89 | 56.41 | 133.93 | 134.18 | 134.88 | 133.15 | 2.36 |
16 | 7.0 | 76.09 | 76.48 | 79.13 | 75.27 | 162.35 | 162.77 | 164.63 | 161.30 | 2.14 |
1 | 16.7 | 54.78 | 55.29 | 56.83 | 52.51 | 75.37 | 76.27 | 78.05 | 74.32 | 1.42 |
2 | 16.7 | 56.80 | 57.20 | 59.01 | 55.49 | 130.60 | 131.36 | 132.93 | 128.55 | 2.32 |
4 | 16.7 | 64.19 | 64.84 | 66.47 | 62.87 | 188.09 | 188.76 | 190.07 | 185.76 | 2.95 |
8 | 16.7 | 87.46 | 87.86 | 89.99 | 86.47 | 232.33 | 232.89 | 234.43 | 230.44 | 2.67 |
16 | 16.7 | 136.02 | 136.52 | 139.44 | 134.78 | 283.87 | 284.59 | 286.70 | 282.01 | 2.09 |
To achieve these same results, follow the Quick Start Guide outlined above.
BS | Duration (s) | FP16 latency 90% (ms) | FP16 latency 95% (ms) | FP16 latency 99% (ms) | FP16 latency avg (ms) | FP32 latency 90% (ms) | FP32 latency 95% (ms) | FP32 latency 99% (ms) | FP32 latency avg (ms) | FP16/FP32 speedup (avg) |
---|---|---|---|---|---|---|---|---|---|---|
1 | 2.0 | 35.88 | 36.12 | 39.80 | 35.20 | 42.95 | 43.67 | 46.65 | 42.23 | 1.20 |
2 | 2.0 | 36.36 | 36.57 | 40.97 | 35.60 | 41.83 | 42.21 | 45.60 | 40.97 | 1.15 |
4 | 2.0 | 36.69 | 36.89 | 41.25 | 36.05 | 48.35 | 48.52 | 52.35 | 47.80 | 1.33 |
8 | 2.0 | 37.49 | 37.70 | 41.37 | 36.88 | 65.41 | 65.64 | 66.50 | 64.96 | 1.76 |
16 | 2.0 | 41.35 | 41.79 | 45.58 | 40.91 | 77.22 | 77.51 | 79.48 | 76.54 | 1.87 |
1 | 7.0 | 36.07 | 36.55 | 40.31 | 35.62 | 39.52 | 39.84 | 43.07 | 38.93 | 1.09 |
2 | 7.0 | 37.42 | 37.66 | 41.36 | 36.79 | 55.94 | 56.19 | 58.33 | 55.60 | 1.51 |
4 | 7.0 | 38.51 | 38.95 | 42.55 | 37.98 | 86.62 | 87.08 | 87.50 | 86.20 | 2.27 |
8 | 7.0 | 42.82 | 43.00 | 47.11 | 42.55 | 122.05 | 122.29 | 122.70 | 121.59 | 2.86 |
16 | 7.0 | 67.74 | 67.92 | 69.05 | 65.69 | 149.92 | 150.16 | 151.03 | 149.49 | 2.28 |
1 | 16.7 | 39.28 | 39.78 | 43.34 | 38.35 | 66.73 | 67.16 | 69.80 | 66.01 | 1.72 |
2 | 16.7 | 43.05 | 43.42 | 47.18 | 42.43 | 120.04 | 121.12 | 123.32 | 118.14 | 2.78 |
4 | 16.7 | 52.18 | 52.49 | 56.11 | 51.63 | 176.09 | 176.51 | 178.70 | 174.60 | 3.38 |
8 | 16.7 | 78.55 | 78.79 | 81.66 | 78.04 | 216.19 | 216.68 | 217.63 | 214.48 | 2.75 |
16 | 16.7 | 125.57 | 125.92 | 128.78 | 124.33 | 264.11 | 264.49 | 266.14 | 262.80 | 2.11 |
To achieve these same results, follow the Quick Start Guide outlined above.
BS | Duration (s) | FP16 latency 90% (ms) | FP16 latency 95% (ms) | FP16 latency 99% (ms) | FP16 latency avg (ms) | FP32 latency 90% (ms) | FP32 latency 95% (ms) | FP32 latency 99% (ms) | FP32 latency avg (ms) | FP16/FP32 speedup (avg) |
---|---|---|---|---|---|---|---|---|---|---|
1 | 2.0 | 43.62 | 46.95 | 50.46 | 37.23 | 51.31 | 52.37 | 56.21 | 49.77 | 1.34 |
2 | 2.0 | 49.09 | 50.46 | 53.11 | 40.61 | 81.85 | 82.22 | 83.94 | 80.81 | 1.99 |
4 | 2.0 | 47.71 | 51.14 | 55.09 | 41.29 | 112.56 | 115.13 | 118.56 | 111.60 | 2.70 |
8 | 2.0 | 51.37 | 53.11 | 55.48 | 45.94 | 198.95 | 199.48 | 200.28 | 197.22 | 4.29 |
16 | 2.0 | 63.59 | 64.30 | 66.90 | 61.77 | 221.75 | 222.07 | 223.22 | 220.09 | 3.56 |
1 | 7.0 | 47.49 | 48.66 | 53.36 | 40.76 | 73.63 | 74.41 | 77.65 | 72.41 | 1.78 |
2 | 7.0 | 48.63 | 50.01 | 58.35 | 43.44 | 114.66 | 115.28 | 117.63 | 112.41 | 2.59 |
4 | 7.0 | 52.19 | 52.85 | 54.22 | 49.94 | 200.38 | 201.29 | 202.97 | 197.21 | 3.95 |
8 | 7.0 | 84.90 | 85.56 | 87.52 | 83.41 | 404.00 | 404.72 | 405.70 | 400.25 | 4.80 |
16 | 7.0 | 157.12 | 157.58 | 159.19 | 155.01 | 490.93 | 492.09 | 493.44 | 486.45 | 3.14 |
1 | 16.7 | 50.57 | 51.57 | 57.58 | 46.27 | 150.39 | 151.84 | 153.54 | 147.31 | 3.18 |
2 | 16.7 | 63.64 | 64.55 | 66.31 | 61.98 | 256.54 | 258.16 | 262.71 | 250.34 | 4.04 |
4 | 16.7 | 140.44 | 141.06 | 142.00 | 138.14 | 519.59 | 521.41 | 523.86 | 512.74 | 3.71 |
8 | 16.7 | 267.03 | 268.06 | 270.01 | 263.15 | 727.33 | 728.61 | 731.36 | 722.62 | 2.75 |
16 | 16.7 | 362.40 | 364.02 | 367.80 | 358.75 | 867.92 | 869.19 | 871.46 | 860.37 | 2.40 |
To achieve these same results, follow the Quick Start Guide outlined above.