This is a collection of models that enable OpenVoice support in the NVIDIA In-Game Inferencing (NVIGI) SDK. See the overview section for each model below: Bert-base-uncased, MeloTTS, and the OpenVoice converter.
Bert-base-uncased is a text embedding model. The OpenVoice Text-to-Speech (TTS) solution uses it to generate embeddings from input text. This model is intended for use with the NVIGI SDK OpenVoice TTS plugin.
This model is ready for commercial and non-commercial use.
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see BERT base model (uncased).
This model is distributed under the Apache 2.0 license (Bert License). Please refer to the BERT base model (uncased) Model Card for further details.
BERT base model (uncased) Model Card
Architecture Type: Transformer
Network Architecture: BERT
Input Type(s): Text
Input Format(s): Int tokens
Input Parameters: 1D
Output Type(s): Embedding vectors
Output Format: Vector
Output Parameters: 2D
Supported Hardware Microarchitecture Compatibility:
Supported Operating System(s):
Training Dataset:
Links: legacy-datasets/wikipedia and bookcorpus/bookcorpus
Data Collection Method by dataset: Unknown
Labeling Method by dataset: Unknown
Properties:
Evaluation Dataset:
Data Collection Method by dataset: Unknown
Labeling Method by dataset: Unknown
Properties: See Model Card under “Glue Test Results.”
Engine: ONNX
Test Hardware: RTX 4090
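The input and output fields listed above (1D integer tokens in, 2D embedding vectors out) can be expressed as a small shape check. This is a minimal sketch, not part of the NVIGI SDK: the function names are hypothetical, the hidden size of 768 and the 512-token limit come from the public BERT base architecture and are assumed to carry over to this ONNX export, and the output layout of one row per token is also an assumption.

```python
from typing import Sequence, Tuple

HIDDEN_SIZE = 768  # hidden width of public BERT base; assumed unchanged in this export
MAX_TOKENS = 512   # maximum sequence length of public BERT base


def check_token_ids(token_ids: Sequence[int]) -> int:
    """Validate a 1D sequence of integer token IDs and return its length."""
    n = len(token_ids)
    if n == 0 or n > MAX_TOKENS:
        raise ValueError(f"expected 1 to {MAX_TOKENS} token IDs, got {n}")
    for t in token_ids:
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(t, bool) or not isinstance(t, int) or t < 0:
            raise ValueError(f"token IDs must be non-negative integers, got {t!r}")
    return n


def check_embeddings(
    embeddings: Sequence[Sequence[float]],
    num_tokens: int,
    hidden_size: int = HIDDEN_SIZE,
) -> Tuple[int, int]:
    """Validate a 2D output: one row per token (assumed), hidden_size values per row."""
    if len(embeddings) != num_tokens:
        raise ValueError(f"expected {num_tokens} rows, got {len(embeddings)}")
    for row in embeddings:
        if len(row) != hidden_size:
            raise ValueError(f"expected rows of width {hidden_size}, got {len(row)}")
    return (num_tokens, hidden_size)
```

A caller would run `check_token_ids` on the tokenizer output before inference and `check_embeddings` on the result afterwards, so that a layout mismatch fails early rather than inside the TTS stage.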
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.
MeloTTS is a Text-to-Speech (TTS) model used as the base model in the OpenVoice TTS solution (before voice conversion is applied). V2 and V3 differ only in their weights. V3 has better (more realistic) voice quality but fewer speakers.
This model is ready for commercial/non-commercial use.
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see:
This model is distributed under the MIT license (MeloTTS License). Please refer to the MeloTTS Model Card for further details.
Architecture Type: Transformer/Flow Network
Network Architecture: VITS architecture
Input Type(s): Phonemes, tones, embedding, speaker ID
Input Format(s): Int, Int, Float, Int
Input Parameters: 1D for all parameters
Output Type(s): Audio at a sampling rate of 44100 Hz
Output Format: Vector
Output Parameters: 2D
Supported Hardware Microarchitecture Compatibility:
Supported Operating System(s):
Data Collection Method by dataset: Unknown
Labeling Method by dataset: Unknown
Properties: The dataset used to train this model is not known. The full testing dataset is also not known; however, the developers' website features numerous sample outputs.
Engine: ONNX
Test Hardware: RTX 4090
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.
The OpenVoice converter model is used in the OpenVoice TTS solution to clone the voice of a reference speaker and apply it to the audio output of the base model.
This model is ready for commercial/non-commercial use.
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see:
This model is distributed under the MIT license (OpenVoice License). Please refer to myshell-ai/OpenVoiceV2 for further details.
Architecture Type: Transformer/Flow Network
Network Architecture: Encoder-Decoder structure with an invertible normalizing flow.
Input Type(s): Audio at a sampling rate of 22050 Hz, reference spectrogram
Input Format(s): Float, Float
Input Parameters: 1D
Output Type(s): Audio at a sampling rate of 22050 Hz
Output Format: Vector
Output Parameters: 2D
Supported Hardware Microarchitecture Compatibility:
Supported Operating System(s):
OpenVoice converter TTS float16 v2
Data Collection Method by dataset: Unknown
Labeling Method by dataset: Unknown
Properties: The dataset used to train this model is not known. The full testing dataset is also not known; however, the developers' website features numerous sample outputs.
Engine: ONNX
Test Hardware: RTX 4090
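The base TTS model above lists audio output at 44100, while the converter lists audio input at 22050. If both figures hold, the audio would need to be resampled by a factor of two between the two stages. The sketch below is one illustrative way to do that, by averaging adjacent sample pairs. It is an assumption made for illustration, not a description of what the NVIGI plugin does, and the function name is hypothetical. A production pipeline would normally apply a proper low-pass filter before decimating.

```python
from typing import List, Sequence

BASE_RATE_HZ = 44100       # output rate listed for the base TTS model
CONVERTER_RATE_HZ = 22050  # input rate listed for the converter


def downsample_by_two(samples: Sequence[float]) -> List[float]:
    """Halve the sample rate by averaging each adjacent pair of samples.

    Averaging acts as a crude low-pass filter before decimation.
    A trailing unpaired sample, if any, is dropped.
    """
    return [
        (samples[i] + samples[i + 1]) / 2.0
        for i in range(0, len(samples) - 1, 2)
    ]


# One second of silence at the base rate becomes one second at the converter rate.
one_second = [0.0] * BASE_RATE_HZ
converted = downsample_by_two(one_second)
```

Pair averaging is chosen here only because it needs no dependencies. It attenuates high frequencies unevenly, so it is suited to a sanity check rather than to shipping audio.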
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.