Speaker Diarization: TitaNet Model Card
TitaNet is a novel neural network architecture for extracting speaker representations.
TitaNet employs 1D depth-wise separable convolutions with Squeeze-and-Excitation (SE) layers with global context, followed by a channel-attention-based statistics pooling layer, to map variable-length utterances to a fixed-length embedding (t-vector).
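To illustrate how a statistics pooling layer can turn a variable number of frames into a fixed-length vector, here is a minimal sketch in plain Python. It is not the TitaNet implementation: the function name `attentive_stats_pool` is hypothetical, and the attention scores are passed in as inputs, whereas a real model would compute them with learned layers. Any later projection to the final embedding is not shown.

```python
import math


def attentive_stats_pool(features, scores):
    """Pool a (channels x frames) feature map into a fixed-length vector.

    features, scores: lists of C rows, each holding T floats. T may differ
    between utterances; C must stay the same.
    Returns 2 * C floats: the attention-weighted mean of every channel,
    followed by the attention-weighted standard deviation of every channel.
    NOTE: illustrative sketch only; scores are assumed to be given.
    """
    means, stds = [], []
    for feat_row, score_row in zip(features, scores):
        # Softmax over the time axis, shifted by the max for numerical stability.
        peak = max(score_row)
        exps = [math.exp(s - peak) for s in score_row]
        total = sum(exps)
        weights = [e / total for e in exps]

        # Weighted first- and second-order statistics for this channel.
        mean = sum(w * x for w, x in zip(weights, feat_row))
        var = sum(w * (x - mean) ** 2 for w, x in zip(weights, feat_row))
        means.append(mean)
        stds.append(math.sqrt(max(var, 0.0)))
    return means + stds
```

Whatever the number of frames T, the output always has 2 * C values, which is what allows utterances of different durations to be compared in the same embedding space.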
These models were trained on a composite dataset comprising several thousand hours of speech, compiled from various publicly available sources. The NeMo toolkit was used to train this model for a few hundred epochs on multiple GPUs.
How to Use this Model
To use this model, start with the Riva Skills Quick Start guide, the starting point for trying out Riva models. The Riva documentation has the information needed to use this model with the Riva Speech ASR service.
This model accepts 16 kHz (16000 Hz) mono-channel audio (WAV files) as input.
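Before sending audio to the model, it can help to confirm that a file matches the expected input format. The following is a minimal sketch using only Python's standard `wave` module; the helper name `check_wav_format` and the constant names are hypothetical and not part of Riva or NeMo. It assumes the expected rate is 16000 samples per second with one channel.

```python
import wave

# Assumed input requirements: 16 kHz sample rate, single channel.
EXPECTED_SAMPLE_RATE_HZ = 16000
EXPECTED_CHANNELS = 1


def check_wav_format(path):
    """Return a list of format problems; an empty list means the header matches."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()

    problems = []
    if rate != EXPECTED_SAMPLE_RATE_HZ:
        problems.append(
            f"sample rate is {rate} Hz, expected {EXPECTED_SAMPLE_RATE_HZ} Hz"
        )
    if channels != EXPECTED_CHANNELS:
        problems.append(
            f"file has {channels} channels, expected {EXPECTED_CHANNELS}"
        )
    return problems
```

A file that fails this check would need to be resampled or downmixed with an audio tool before use; the check only reads the WAV header and does not inspect the audio content.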
For a given audio sample, this model produces a speaker embedding of size 192.
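Speaker embeddings are commonly compared with cosine similarity, where values closer to 1 suggest the same speaker. The sketch below shows that comparison for two 192-dimensional vectors in plain Python. The function name `cosine_similarity` and the constant `EMBEDDING_SIZE` are illustrative, and how the embeddings are obtained from the model is not shown. Any decision threshold would have to be tuned on your own data.

```python
import math

# Embedding size stated in this model card.
EMBEDDING_SIZE = 192


def cosine_similarity(embedding_a, embedding_b):
    """Cosine similarity between two speaker embeddings (lists of floats)."""
    if len(embedding_a) != EMBEDDING_SIZE or len(embedding_b) != EMBEDDING_SIZE:
        raise ValueError(f"both embeddings must have {EMBEDDING_SIZE} values")

    dot = sum(x * y for x, y in zip(embedding_a, embedding_b))
    norm_a = math.sqrt(sum(x * x for x in embedding_a))
    norm_b = math.sqrt(sum(y * y for y in embedding_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)
```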
By downloading and using the models and resources packaged with Riva Conversational AI, you accept the terms of the Riva license.