# Overview

This is a collection of models that enable OpenVoice support in the NVIDIA In-Game Inferencing (NVIGI) SDK. See the overview section for each of the following models below:

* BERT base model (uncased)
* MeloTTS
* OpenVoice Converter

# Model Overview : BERT embedding

## Description:

BERT base model (uncased) is a text embedding model. The OpenVoice Text-to-Speech (TTS) solution uses it to generate embeddings from input text. This model is intended for use with the NVIGI SDK OpenVoice TTS plugin.

This model is ready for commercial and non-commercial use.

### Model Developer: Google

## Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see [BERT base model (uncased)](https://huggingface.co/google-bert/bert-base-uncased).

### License/Terms of Use:

This model is distributed under the Apache 2.0 license: [BERT License](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md). Please refer to the [BERT base model (uncased) Model Card](https://huggingface.co/google-bert/bert-base-uncased) for further details.

## Reference(s):

[BERT base model (uncased) Model Card](https://huggingface.co/google-bert/bert-base-uncased)

## Model Architecture:

**Architecture Type:** Transformer
**Network Architecture:** BERT

## Input:

**Input Type(s):** Text
**Input Format(s):** Int tokens
**Input Parameters:** 1D

## Output:

**Output Type(s):** Embedding vectors
**Output Format:** Vector
**Output Parameters:** Two Dimensional (2D)

**Supported Hardware Microarchitecture Compatibility:**

* NVIDIA Ada Lovelace

**Supported Operating System(s):**

* Linux, Windows

## Model Version(s):

* BERT base model (uncased) q4_k_s 1.0

# Training and Evaluation Datasets:

## Training Dataset:
**Links:** [legacy-datasets/wikipedia](https://huggingface.co/datasets/legacy-datasets/wikipedia) and [bookcorpus/bookcorpus](https://huggingface.co/datasets/bookcorpus/bookcorpus)
**Data Collection Method by dataset:** Unknown
**Labeling Method by dataset:** Unknown
**Properties:**

* Wikipedia dataset
    * ~12 GB of data based on Wikipedia articles
    * More information on the Hugging Face model card
* BookCorpus
    * 7,185 unique books
    * More information on the Hugging Face model card

## Evaluation Dataset:
**Data Collection Method by dataset:** Unknown
**Labeling Method by dataset:** Unknown
**Properties:** See the model card under “GLUE Test Results.”

## Inference:

**Engine:** ONNX
**Test Hardware:** RTX 4090

## Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
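The BERT section above lists text as the input type and integer tokens as the input format, but it does not describe the conversion between the two. The sketch below illustrates only the "uncased" part of that step (lowercasing and accent stripping) followed by a lookup into a 1D list of integer IDs. The vocabulary `TOY_VOCAB`, the helper names, and the whitespace split are assumptions made for illustration; the actual model uses a WordPiece vocabulary, and this card does not state how the NVIGI plugin performs tokenization.

```python
import unicodedata

# Hypothetical toy vocabulary for illustration only; the real model ships
# a WordPiece vocabulary with its own entries and IDs.
TOY_VOCAB = {"[UNK]": 0, "hello": 1, "world": 2, "cafe": 3}


def normalize_uncased(text):
    """Lowercase the text and strip accent marks, as an uncased model expects."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Drop combining marks (Unicode category "Mn"), e.g. the accent in "é".
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def to_token_ids(text, vocab=TOY_VOCAB):
    """Map whitespace-separated words to a 1D list of integer IDs."""
    words = normalize_uncased(text).split()
    # Words missing from the vocabulary fall back to the unknown-token ID.
    return [vocab.get(word, vocab["[UNK]"]) for word in words]


if __name__ == "__main__":
    print(to_token_ids("Hello WORLD Café"))
```

Because of the normalization, `"Hello"`, `"HELLO"`, and `"hello"` all map to the same ID, which is what "uncased" refers to.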

# Model Overview : MeloTTS v2 and v3

## Description:

MeloTTS is a Text-to-Speech (TTS) model that the OpenVoice TTS solution uses as its base model, before voice conversion is applied. V2 and V3 differ only in their weights: V3 produces more realistic voices but supports fewer speakers.

This model is ready for commercial and non-commercial use.

### Model Developer: Myshell-ai

## Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see:

- [myshell-ai/MeloTTS-English-v2](https://huggingface.co/myshell-ai/MeloTTS-English-v2)
- [myshell-ai/MeloTTS-English-v3](https://huggingface.co/myshell-ai/MeloTTS-English-v3)

### License/Terms of Use:

This model is distributed under the MIT license: [MeloTTS License](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md). Please refer to the [MeloTTS Model Card](https://huggingface.co/myshell-ai/MeloTTS-English-v2) for further details.

## Reference(s):

- [myshell-ai/MeloTTS-English-v2](https://huggingface.co/myshell-ai/MeloTTS-English-v2)
- [myshell-ai/MeloTTS-English-v3](https://huggingface.co/myshell-ai/MeloTTS-English-v3)

## Model Architecture:

**Architecture Type:** Transformer/Flow Network
**Network Architecture:** VITS architecture

## Input:

**Input Type(s):** Phonemes, tones, embedding, speaker ID
**Input Format(s):** Int, Int, Float, Int
**Input Parameters:** 1D for all parameters

## Output:

**Output Type(s):** Audio at a sampling rate of 44100 Hz
**Output Format:** Vector
**Output Parameters:** 2D

**Supported Hardware Microarchitecture Compatibility:**

* NVIDIA Ada

**Supported Operating System(s):**

* Windows, Linux

## Model Version(s):

* MeloTTS float16 v2
* MeloTTS float16 v3

# Training, Testing, and Evaluation Datasets:

**Data Collection Method by dataset:** Unknown
**Labeling Method by dataset:** Unknown
**Properties:** The dataset used to train this model is not known. The full testing dataset is also not known; however, the developer's website features numerous sample outputs.

## Inference:

**Engine:** ONNX
**Test Hardware:** RTX 4090

## Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

# Model Overview : OpenVoice Converter Model

## Description:

The OpenVoice converter model is used in the OpenVoice TTS solution to clone the voice of a reference speaker and apply it to the audio produced by the base model.

This model is ready for commercial and non-commercial use.

### Model Developer: Myshell-ai

## Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see:

- [myshell-ai/OpenVoiceV2](https://huggingface.co/myshell-ai/OpenVoiceV2)

### License/Terms of Use:

This model is distributed under the MIT license: [OpenVoice License](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/mit.md). Please refer to [myshell-ai/OpenVoiceV2](https://huggingface.co/myshell-ai/OpenVoiceV2) for further details.

## Reference(s):

- [myshell-ai/OpenVoiceV2](https://huggingface.co/myshell-ai/OpenVoiceV2)
- [OpenVoice paper](https://arxiv.org/pdf/2312.01479)

## Model Architecture:

**Architecture Type:** Transformer/Flow Network
**Network Architecture:** Encoder-Decoder structure with an invertible normalizing flow.

## Input:

**Input Type(s):** Audio at a sampling rate of 22050 Hz, reference spectrogram
**Input Format(s):** Float, Float
**Input Parameters:** 1D

## Output:

**Output Type(s):** Audio at a sampling rate of 22050 Hz
**Output Format:** Vector
**Output Parameters:** 2D

**Supported Hardware Microarchitecture Compatibility:**

* NVIDIA Ada

**Supported Operating System(s):**

* Windows, Linux

## Model Version(s):

* OpenVoice converter TTS float16 v2

# Training, Testing, and Evaluation Datasets:

**Data Collection Method by dataset:** Unknown
**Labeling Method by dataset:** Unknown
**Properties:** The dataset used to train this model is not known. The full testing dataset is also not known; however, the developer's website features numerous sample outputs.

## Inference:

**Engine:** ONNX
**Test Hardware:** RTX 4090

## Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
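The base TTS model above lists its output at a sampling rate of 44100, while the converter lists its input at 22050. This card does not say how the two rates are reconciled inside the plugin. As a rough illustration only, the sketch below halves the rate by averaging adjacent sample pairs, which is a crude low-pass filter followed by decimation. The function name and constants are hypothetical, and a production pipeline would more likely use a dedicated resampler.

```python
SOURCE_RATE_HZ = 44100  # base TTS output rate listed in this card
TARGET_RATE_HZ = 22050  # converter input rate listed in this card


def downsample_by_two(samples):
    """Halve the sampling rate by averaging each pair of adjacent samples."""
    # Ignore a trailing unpaired sample so every output value has two inputs.
    usable = len(samples) - (len(samples) % 2)
    return [(samples[i] + samples[i + 1]) / 2.0 for i in range(0, usable, 2)]


if __name__ == "__main__":
    one_second = [0.0] * SOURCE_RATE_HZ
    converted = downsample_by_two(one_second)
    # One second of audio at the source rate becomes one second at the target rate.
    print(len(converted) == TARGET_RATE_HZ)
```

Pair averaging keeps the example dependency-free; it attenuates but does not fully remove content above the new Nyquist frequency, so it should be read as a sketch of the rate relationship and not as a quality-preserving resampler.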