SE(3)-Transformers for PyTorch

Description: A graph neural network using a variant of self-attention for processing 3D points and graphs.
Publisher: NVIDIA
Use Case: Other
Framework: PyTorch
Latest Version: 21.07.2
Modified: November 12, 2021
Compressed Size: 1.3 MB

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA's latest software release. For the most up-to-date performance measurements, go to NVIDIA Data Center Deep Learning Product Performance.

Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.

Training performance benchmark

To benchmark training performance at a specific batch size, run bash scripts/benchmark_train.sh {BATCH_SIZE} for a single GPU, or bash scripts/benchmark_train_multi_gpu.sh {BATCH_SIZE} for a multi-GPU run.
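For example, at a batch size of 240 (the per-GPU batch size used for the training results below; the value here is only an illustration), the benchmarks could be launched as:

# Single-GPU training performance benchmark at an example batch size of 240
bash scripts/benchmark_train.sh 240

# Multi-GPU training performance benchmark at the same batch size
bash scripts/benchmark_train_multi_gpu.sh 240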

Inference performance benchmark

To benchmark inference performance at a specific batch size, run bash scripts/benchmark_inference.sh {BATCH_SIZE}.
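For example, assuming the same setup, an inference benchmark at one of the batch sizes reported below could be run as:

# Inference performance benchmark at an example batch size of 1600
bash scripts/benchmark_inference.sh 1600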

Results

The following sections provide details on how we achieved our performance and accuracy in training and inference.

Training accuracy results

Training accuracy: NVIDIA DGX A100 (8x A100 80GB)

Our results were obtained by running the scripts/train.sh training script in the PyTorch 21.07 NGC container on NVIDIA DGX A100 (8x A100 80GB) GPUs.
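As a minimal sketch of such a run (the authoritative steps are in the Quick Start Guide), the container is started from the PyTorch 21.07 NGC image and the training script is launched from the repository root; the mount path below is a placeholder, and we assume the model's dependencies have already been installed inside the container:

# Start the PyTorch 21.07 NGC container with a local clone of the repository mounted
# (/path/to/se3-transformer is a placeholder path)
docker run --gpus all -it --rm -v /path/to/se3-transformer:/workspace -w /workspace \
    nvcr.io/nvidia/pytorch:21.07-py3

# Inside the container, after installing the model's dependencies per the Quick Start Guide:
bash scripts/train.sh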

GPUs | Batch size / GPU | Absolute error - TF32 | Absolute error - mixed precision | Time to train - TF32 | Time to train - mixed precision | Time to train speedup (mixed precision to TF32)
1 | 240 | 0.03456 | 0.03460 | 1h23min | 1h03min | 1.32x
8 | 240 | 0.03417 | 0.03424 | 15min | 12min | 1.25x

Training accuracy: NVIDIA DGX-1 (8x V100 16GB)

Our results were obtained by running the scripts/train.sh training script in the PyTorch 21.07 NGC container on NVIDIA DGX-1 (8x V100 16GB) GPUs.

GPUs | Batch size / GPU | Absolute error - FP32 | Absolute error - mixed precision | Time to train - FP32 | Time to train - mixed precision | Time to train speedup (mixed precision to FP32)
1 | 240 | 0.03432 | 0.03439 | 2h25min | 1h33min | 1.56x
8 | 240 | 0.03380 | 0.03495 | 29min | 20min | 1.45x

Training performance results

Training performance: NVIDIA DGX A100 (8x A100 80GB)

Our results were obtained by running the scripts/benchmark_train.sh and scripts/benchmark_train_multi_gpu.sh benchmarking scripts in the PyTorch 21.07 NGC container on NVIDIA DGX A100 with 8x A100 80GB GPUs. Performance numbers (in molecules per millisecond) were averaged over five entire training epochs after a warmup epoch.

GPUs | Batch size / GPU | Throughput - TF32 [mol/ms] | Throughput - mixed precision [mol/ms] | Throughput speedup (mixed precision to TF32) | Weak scaling - TF32 | Weak scaling - mixed precision
1 | 240 | 2.21 | 2.92 | 1.32x | - | -
1 | 120 | 1.81 | 2.04 | 1.13x | - | -
8 | 240 | 15.88 | 21.02 | 1.32x | 7.18 | 7.20
8 | 120 | 12.68 | 13.99 | 1.10x | 7.00 | 6.86
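The weak-scaling columns are consistent with the ratio of 8-GPU throughput to single-GPU throughput at the same per-GPU batch size (our reading of the table, not a definition stated here). For example, for TF32 at batch size 240:

# Weak scaling = 8-GPU throughput / 1-GPU throughput at a fixed per-GPU batch size
# (TF32, batch size 240, values taken from the table above); bc truncates to two decimals
echo "scale=2; 15.88 / 2.21" | bc    # prints 7.18, matching the reported weak scaling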

To achieve these same results, follow the steps in the Quick Start Guide.

Training performance: NVIDIA DGX-1 (8x V100 16GB)

Our results were obtained by running the scripts/benchmark_train.sh and scripts/benchmark_train_multi_gpu.sh benchmarking scripts in the PyTorch 21.07 NGC container on NVIDIA DGX-1 with 8x V100 16GB GPUs. Performance numbers (in molecules per millisecond) were averaged over five entire training epochs after a warmup epoch.

GPUs | Batch size / GPU | Throughput - FP32 [mol/ms] | Throughput - mixed precision [mol/ms] | Throughput speedup (mixed precision to FP32) | Weak scaling - FP32 | Weak scaling - mixed precision
1 | 240 | 1.25 | 1.88 | 1.50x | - | -
1 | 120 | 1.03 | 1.41 | 1.37x | - | -
8 | 240 | 8.68 | 12.75 | 1.47x | 6.94 | 6.78
8 | 120 | 6.64 | 8.58 | 1.29x | 6.44 | 6.08

To achieve these same results, follow the steps in the Quick Start Guide.

Inference performance results

Inference performance: NVIDIA DGX A100 (1x A100 80GB)

Our results were obtained by running the scripts/benchmark_inference.sh inference benchmarking script in the PyTorch 21.07 NGC container on NVIDIA DGX A100 with 1x A100 80GB GPU.

FP16

Batch size | Throughput Avg [mol/ms] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms]
1600 | 11.60 | 140.94 | 138.29 | 140.12 | 386.40
800 | 10.74 | 75.69 | 75.74 | 76.50 | 79.77
400 | 8.86 | 45.57 | 46.11 | 46.60 | 49.97

TF32

Batch size | Throughput Avg [mol/ms] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms]
1600 | 8.58 | 189.20 | 186.39 | 187.71 | 420.28
800 | 8.28 | 97.56 | 97.20 | 97.73 | 101.13
400 | 7.55 | 53.38 | 53.72 | 54.48 | 56.62

To achieve these same results, follow the steps in the Quick Start Guide.

Inference performance: NVIDIA DGX-1 (1x V100 16GB)

Our results were obtained by running the scripts/benchmark_inference.sh inference benchmarking script in the PyTorch 21.07 NGC container on NVIDIA DGX-1 with 1x V100 16GB GPU.

FP16

Batch size | Throughput Avg [mol/ms] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms]
1600 | 6.42 | 254.54 | 247.97 | 249.29 | 721.15
800 | 6.13 | 132.07 | 131.90 | 132.70 | 140.15
400 | 5.37 | 75.12 | 76.01 | 76.66 | 79.90

FP32

Batch size | Throughput Avg [mol/ms] | Latency Avg [ms] | Latency 90% [ms] | Latency 95% [ms] | Latency 99% [ms]
1600 | 3.39 | 475.86 | 473.82 | 475.64 | 891.18
800 | 3.36 | 239.17 | 240.64 | 241.65 | 243.70
400 | 3.17 | 126.67 | 128.19 | 128.82 | 130.54

To achieve these same results, follow the steps in the Quick Start Guide.