DLRM for PyTorch | NVIDIA NGC

The Deep Learning Recommendation Model (DLRM) is a recommendation model designed to make use of both categorical and numerical inputs.

Changelog

October 2021

Added support for CUDA Graphs
Switched to PyTorch native AMP for mixed precision training
Unified the single-GPU and multi-GPU training scripts
Added support for bring-your-own (BYO) datasets
Updated performance results
Updated container version

June 2021

Updated container version
Updated performance results

March 2021

Added NVTabular as a new preprocessing option
Added a new dataset - xlarge, which uses a frequency threshold of 2
Added support for a new GPU, the A100 80GB, along with its performance results
Updated Spark preprocessing
Added Adam as an optional optimizer for embeddings and MLPs in multi-GPU training
Improved README

August 2020

Preprocessing with Spark 3 on GPU
Multiple performance optimizations
Automatic placement and load balancing of embedding tables
Improved README

June 2020

Updated performance tables to include A100 results and multi-GPU setup
Multi-GPU optimizations

May 2020

Performance optimizations

April 2020

Initial release

Known issues

Performance with the Adam optimizer has not been optimized.
For some seeds, the model's loss can become NaN due to the aggressive learning rate schedule.
The custom dot interaction kernels for FP16 and TF32 assume an embedding size <= 128 and fewer than 32 categorical variables. If your configuration exceeds either limit, pass --interaction_op dot to use the slower native operation instead.
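
The kernel limits in the last known issue can be checked before launching a training run. The following is a minimal sketch, not part of the DLRM code base: the function names `custom_dot_kernel_supported` and `extra_training_args` are hypothetical helpers introduced here for illustration, and only the `--interaction_op dot` flag and the two limits come from the text above.

```python
from typing import List

# Limits stated in the known issue for the custom FP16/TF32 kernels.
MAX_EMBEDDING_DIM = 128            # embedding size must be <= 128
CATEGORICAL_FEATURES_BOUND = 32    # number of categorical variables must be < 32


def custom_dot_kernel_supported(embedding_dim: int, num_categorical: int) -> bool:
    """Return True if the configuration fits the documented kernel limits."""
    return (
        embedding_dim <= MAX_EMBEDDING_DIM
        and num_categorical < CATEGORICAL_FEATURES_BOUND
    )


def extra_training_args(embedding_dim: int, num_categorical: int) -> List[str]:
    """Hypothetical helper: build the extra CLI arguments for a training run.

    Returns an empty list when the custom kernels can be used, and the
    documented fallback flag when either limit is exceeded.
    """
    if custom_dot_kernel_supported(embedding_dim, num_categorical):
        return []
    return ["--interaction_op", "dot"]


if __name__ == "__main__":
    # Example: an embedding size of 256 exceeds the limit, so the fallback applies.
    print(extra_training_args(embedding_dim=256, num_categorical=26))
```

A check of this kind is mainly useful with bring-your-own datasets, where the number of categorical variables is not fixed in advance.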