NVIDIA
DLRM for PyTorch

The Deep Learning Recommendation Model (DLRM) is a recommendation model designed to make use of both categorical and numerical inputs.
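The following minimal sketch shows how a DLRM-style model combines the two input types. Numerical inputs pass through a bottom MLP. Each categorical input selects one row of its own embedding table. The resulting vectors are combined with pairwise dot products, and a top MLP produces the final score. This is an illustration under assumptions, not this repository's implementation: it is pure Python, uses toy sizes and hand-picked weights, and uses a single linear layer as a stand-in for each MLP. All function names are hypothetical.

```python
# Illustrative DLRM-style forward pass in pure Python (hypothetical sketch).
# Assumptions: one linear layer stands in for each MLP, sizes and weights
# are toy values, and the bottom MLP output has the embedding dimension.
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Dense layer y = W x + b, where W has one row per output."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def dot_interaction(features: List[Vector]) -> Vector:
    """Dot products between every distinct pair of feature vectors (i > j)."""
    out = []
    for i in range(len(features)):
        for j in range(i):
            out.append(sum(a * b for a, b in zip(features[i], features[j])))
    return out


def dlrm_forward(dense: Vector, categorical: List[int], tables: List[Matrix],
                 bottom_w: Matrix, bottom_b: Vector,
                 top_w: Matrix, top_b: Vector) -> float:
    # Numerical inputs -> bottom MLP (output size equals embedding size).
    bottom_out = relu(linear(dense, bottom_w, bottom_b))
    # Each categorical value selects a row of its own embedding table.
    embeddings = [table[idx] for table, idx in zip(tables, categorical)]
    # Pairwise interactions, concatenated with the bottom MLP output.
    top_in = bottom_out + dot_interaction([bottom_out] + embeddings)
    # The top MLP returns a single logit.
    return linear(top_in, top_w, top_b)[0]


# Toy example: 2 numerical features, 2 categorical features, embedding size 2.
tables = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 1.0], [1.0, 1.0]]]
logit = dlrm_forward([1.0, 2.0], [1, 0], tables,
                     [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                     [[1.0, 1.0, 1.0, 1.0, 1.0]], [0.0])
print(logit)  # -> 10.0
```

Only the lower triangle of the pairwise products is kept, because the dot product is symmetric and the upper triangle would duplicate it.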

Changelog

October 2021

  • Added support for CUDA Graphs
  • Switched to PyTorch native AMP for mixed precision training
  • Unified the single-GPU and multi-GPU training scripts
  • Added support for bring-your-own (BYO) datasets
  • Updated performance results
  • Updated container version

June 2021

  • Updated container version
  • Updated performance results

March 2021

  • Added NVTabular as a new preprocessing option
  • Added a new dataset, xlarge, which uses a frequency threshold of 2
  • Added support for the A100 80GB GPU, along with its performance results
  • Updated Spark preprocessing
  • Added Adam as an optional optimizer for the embeddings and MLPs in multi-GPU training
  • Improved README
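A frequency threshold, as used by the xlarge dataset, controls which categorical values receive their own embedding row. The sketch below is hypothetical and assumes that values occurring fewer than `threshold` times are mapped to a shared index 0. Frequent values receive indices starting at 1 in order of first appearance. The function names and the reserved index are illustrative assumptions, not the repository's preprocessing code.

```python
# Hypothetical frequency thresholding for one categorical column.
# Assumption: rare values (count < threshold) share index 0; frequent
# values get indices 1..N in order of first appearance.
from collections import Counter
from typing import Dict, Hashable, List


def build_vocab(column: List[Hashable], threshold: int) -> Dict[Hashable, int]:
    counts = Counter(column)
    vocab: Dict[Hashable, int] = {}
    for value in column:
        if counts[value] >= threshold and value not in vocab:
            vocab[value] = len(vocab) + 1  # index 0 is reserved for rare values
    return vocab


def encode(column: List[Hashable], vocab: Dict[Hashable, int]) -> List[int]:
    # Values missing from the vocabulary fall back to the shared index 0.
    return [vocab.get(value, 0) for value in column]


column = ["a", "b", "a", "c", "b", "a"]
print(encode(column, build_vocab(column, threshold=2)))  # -> [1, 2, 1, 0, 2, 1]
```

In this sketch, a lower threshold keeps more distinct values and therefore produces larger embedding tables.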

August 2020

  • Preprocessing with Spark 3 on GPU
  • Multiple performance optimizations
  • Automatic placement and load balancing of embedding tables
  • Improved README
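To illustrate what load balancing of embedding tables means, the sketch below uses the generic "largest table first, least-loaded GPU" greedy heuristic. This algorithm is an assumption chosen for illustration, and `place_tables` is a hypothetical name. The repository's automatic placement may use a different cost model, for example one that accounts for memory bandwidth rather than table size alone.

```python
# Hypothetical greedy placement of embedding tables across GPUs
# (largest-first, least-loaded device). Not necessarily the algorithm
# used by this repository.
import heapq
from typing import Dict, List


def place_tables(table_sizes: List[int], num_gpus: int) -> Dict[int, List[int]]:
    """Return a mapping gpu_id -> list of table indices."""
    heap = [(0, gpu) for gpu in range(num_gpus)]  # (current load, gpu id)
    heapq.heapify(heap)
    placement: Dict[int, List[int]] = {gpu: [] for gpu in range(num_gpus)}
    # Place large tables first so that smaller ones can fill the gaps.
    order = sorted(range(len(table_sizes)), key=lambda i: table_sizes[i], reverse=True)
    for table in order:
        load, gpu = heapq.heappop(heap)  # least-loaded GPU so far
        placement[gpu].append(table)
        heapq.heappush(heap, (load + table_sizes[table], gpu))
    return placement


print(place_tables([10, 7, 5, 4, 3, 1], num_gpus=2))  # -> {0: [0, 3, 5], 1: [1, 2, 4]}
```

In this example, both GPUs end up with a total load of 15.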

June 2020

  • Updated performance tables to include A100 results and multi-GPU setup
  • Multi-GPU optimizations

May 2020

  • Performance optimizations

April 2020

  • Initial release

Known issues

  • The Adam optimizer has not been optimized for performance.
  • For some seeds, the model's loss can become NaN due to the aggressive learning rate schedule.
  • The custom dot interaction kernels for FP16 and TF32 assume an embedding size <= 128 and fewer than 32 categorical variables. If your model exceeds either limit, pass --interaction_op dot to use the slower native operation.
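The limits above can be checked before training starts. The sketch below is hypothetical and only illustrates the decision. The helper names are assumptions, the only command-line flag used is --interaction_op dot from this README, and the default implementation is assumed to be the custom kernel.

```python
# Hypothetical pre-flight check for the custom FP16/TF32 dot interaction
# kernels. Only the "--interaction_op dot" flag comes from the README;
# helper names are illustrative.
from typing import List

MAX_EMBEDDING_SIZE = 128   # custom kernels require embedding size <= 128
MAX_CATEGORICAL = 32       # ...and fewer than 32 categorical variables


def custom_kernel_supported(embedding_size: int, num_categorical: int) -> bool:
    return embedding_size <= MAX_EMBEDDING_SIZE and num_categorical < MAX_CATEGORICAL


def interaction_args(embedding_size: int, num_categorical: int) -> List[str]:
    """Extra training-script arguments needed for this model shape."""
    if custom_kernel_supported(embedding_size, num_categorical):
        return []  # keep the default (assumed custom-kernel) interaction
    return ["--interaction_op", "dot"]  # fall back to the slower native op


print(interaction_args(256, 26))  # -> ['--interaction_op', 'dot']
```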