NVIDIA
SDG Workshop

Synthetic Data Generation Workshop container for building realistic synthetic datasets based on existing data using Megatron-LM.


Synthetic Data Generation Workshop container

This container is for use with the Synthetic Data Generation Workshop. Once the container is pulled, it can be run using the following command:

docker run --gpus all -it --rm -p 8888:8888 -p 6006:6006 sdg_workshop:3.1 jupyter lab --NotebookApp.token ''

Port 8888 is for running the Jupyter server and port 6006 is for viewing TensorBoard during model training.

This container has an end-to-end workflow involving:

  • Data prep and ETL
  • GPT-2 model training with Megatron-LM
  • Inference
  • Evaluation of Synthetic Data
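
To train a GPT-style model on tabular data, the data prep step has to turn records into text, and inference has to turn generated text back into records. This page does not describe the encoding the container uses, so the sketch below assumes a hypothetical `column=value` format with `|` separators, purely to illustrate the round trip. The names `encode_row` and `decode_row` and the sample record are illustrative, not part of the workshop.

```python
# Hypothetical text encoding for tabular rows; NOT the workshop's actual format.
FIELD_SEP = "|"
KV_SEP = "="


def encode_row(row, columns):
    """Serialize one record as 'col=value|col=value' in a fixed column order."""
    return FIELD_SEP.join(f"{col}{KV_SEP}{row[col]}" for col in columns)


def decode_row(line, types):
    """Parse a line back into a dict, casting each value with types[col]."""
    row = {}
    for field in line.split(FIELD_SEP):
        col, raw = field.split(KV_SEP, 1)
        row[col] = types[col](raw)
    return row


# Illustrative schema and record (assumed, not taken from the workshop data).
columns = ["age", "amount", "city"]
types = {"age": int, "amount": float, "city": str}
record = {"age": 42, "amount": 19.5, "city": "Austin"}

line = encode_row(record, columns)   # text a language model could be trained on
restored = decode_row(line, types)   # typed record recovered from the text
```

A fixed column order keeps every training line in the same shape, which also makes malformed generated lines easy to detect at decode time.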

Synthetic Data Generation is a data augmentation technique that can make models more robust by supplying additional training data.

An ideal synthetic dataset generated from real data shares the following with that data:

  • the same features (columns)
  • the same data type for each feature (integer, float, string, etc.)
  • the same distributions in an individual column
  • the same joint distributions when considering multiple columns
  • the same conditional distributions (i.e. applying a condition on one distribution and looking at another)
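
The properties above can each be turned into a simple check. The following is a minimal sketch for categorical columns using only the Python standard library. It assumes rows are dictionaries and uses total variation distance as the comparison metric, which is one reasonable choice and not necessarily what the workshop's evaluation notebooks use. All function names and the toy `real` and `synthetic` rows are illustrative.

```python
from collections import Counter


def schema(rows):
    """Map each column name to the Python type of its value in the first row."""
    return {col: type(val) for col, val in rows[0].items()}


def marginal(rows, col):
    """Empirical distribution of one column as {value: probability}."""
    counts = Counter(r[col] for r in rows)
    return {v: c / len(rows) for v, c in counts.items()}


def joint(rows, cols):
    """Empirical joint distribution over several columns."""
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    return {v: c / len(rows) for v, c in counts.items()}


def conditional(rows, col, given_col, given_val):
    """Distribution of `col` among rows where given_col == given_val."""
    subset = [r for r in rows if r[given_col] == given_val]
    return marginal(subset, col)


def tv_distance(p, q):
    """Total variation distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy data (assumed for illustration): same rows in a different order.
real = [
    {"plan": "free", "churned": False},
    {"plan": "free", "churned": True},
    {"plan": "pro", "churned": False},
    {"plan": "pro", "churned": False},
]
synthetic = [
    {"plan": "pro", "churned": False},
    {"plan": "free", "churned": True},
    {"plan": "pro", "churned": False},
    {"plan": "free", "churned": False},
]

same_schema = schema(real) == schema(synthetic)
marginal_gap = tv_distance(marginal(real, "plan"), marginal(synthetic, "plan"))
joint_gap = tv_distance(
    joint(real, ["plan", "churned"]), joint(synthetic, ["plan", "churned"])
)
cond_gap = tv_distance(
    conditional(real, "churned", "plan", "free"),
    conditional(synthetic, "churned", "plan", "free"),
)
```

A gap near 0 means the two datasets agree on that property. Matching marginals alone is not enough: two datasets can agree on every individual column and still differ in their joint or conditional distributions, which is why all three are checked.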

A synthetic data generator is a model that is trained on the real data and then used to create new synthetic data with the properties described above.
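
The train-then-sample pattern can be illustrated with a deliberately tiny stand-in. The sketch below is not Megatron-LM or GPT-2. It is a hypothetical `ChainGenerator` that learns, from counts, the distribution of each column given the previous column, then generates rows left to right, loosely mirroring how a language model emits one token after another. The class name, its `fit` and `sample` methods, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


class ChainGenerator:
    """Toy generator: each column is modeled given the previous column only."""

    def __init__(self, columns, seed=0):
        self.columns = columns
        self.rng = random.Random(seed)
        self.first = Counter()                   # counts for the first column
        self.transitions = defaultdict(Counter)  # (column, previous value) -> counts

    def fit(self, rows):
        """Count first-column values and column-to-column transitions."""
        for row in rows:
            self.first[row[self.columns[0]]] += 1
            for prev, cur in zip(self.columns, self.columns[1:]):
                self.transitions[(cur, row[prev])][row[cur]] += 1
        return self

    def _draw(self, counter):
        """Sample one value with probability proportional to its count."""
        values = list(counter)
        weights = [counter[v] for v in values]
        return self.rng.choices(values, weights=weights, k=1)[0]

    def sample(self, n):
        """Generate n new rows, one column at a time."""
        rows = []
        for _ in range(n):
            row = {self.columns[0]: self._draw(self.first)}
            for prev, cur in zip(self.columns, self.columns[1:]):
                row[cur] = self._draw(self.transitions[(cur, row[prev])])
            rows.append(row)
        return rows


# Toy training data (assumed for illustration).
real = [
    {"plan": "free", "region": "EU"},
    {"plan": "free", "region": "US"},
    {"plan": "pro", "region": "US"},
    {"plan": "pro", "region": "US"},
]

generator = ChainGenerator(["plan", "region"]).fit(real)
synthetic = generator.sample(5)
```

Because this toy model only looks one column back, it cannot capture dependencies that span several columns. A larger sequence model conditions on everything generated so far in the row, which is the motivation for using one.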

Publisher: NVIDIA
Latest Tag: 1
Updated: August 2, 2022 UTC
Compressed Size: 13.57 GB
Multinode Support: No
Multi-Arch Support: No
