The NeMo Audio Codec model is a non-autoregressive convolutional encoder-quantizer-decoder model for coding or tokenization of raw audio signals or mel-spectrogram features. The NeMo Audio Codec model supports residual vector quantizer (RVQ) and finite scalar quantizer (FSQ) for quantization of the encoder output. This model is trained end-to-end using generative loss, discriminative loss, and reconstruction loss, similar to other neural audio codecs such as SoundStream and EnCodec.
Architecture Type: Convolutional encoder-quantizer-decoder model
Audio codes
Audio of shape (batch x time) in wav format
Runtime Engine(s): Riva 2.18.0 or greater
Supported Hardware Platform(s):
Supported Operating System(s):