Supported architectures: Linux / arm64, Linux / amd64
JAX is a framework for high-performance numerical computing and machine learning research. It includes NumPy-like APIs, automatic differentiation, XLA acceleration, and simple primitives for scaling across GPUs.
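As a quick illustration of these features together (a minimal sketch, not taken from the container documentation; the function and shapes are arbitrary), the snippet below differentiates and JIT-compiles a small NumPy-style function:

import jax
import jax.numpy as jnp

# NumPy-like array math.
def loss(w, x):
    return jnp.sum(jnp.tanh(x @ w) ** 2)

# Automatic differentiation of the loss with respect to w, compiled with XLA.
grad_loss = jax.jit(jax.grad(loss))

w = jnp.ones((4, 4))
x = jnp.ones((8, 4))
print(grad_loss(w, x).shape)  # (4, 4)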
The JAX NGC Container comes with all dependencies included, providing an easy place to start developing applications in areas such as NLP, computer vision, multimodality, physics-based simulations, reinforcement learning, drug discovery, and neural rendering.
The JAX NGC Container is optimized for GPU acceleration and contains a validated set of libraries that enable and optimize GPU performance. This container may also include modifications to the JAX source code in order to maximize performance and compatibility. This container also includes software for accelerating ETL (DALI) and training (cuDNN, NCCL).
For building neural networks, the JAX NGC Container includes Flax, a neural network library with support for common deep learning models, layers, and optimizers. We also include a container for Paxml, a framework for training LLMs such as GPT, and a container for T5x, a framework for training T5 and other Flax-based models. You can use the JAX, Paxml, or T5x containers for your deep learning workloads or install your own favorite libraries on top of them.
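As a brief example of what working with Flax inside the container looks like (a minimal sketch; the model, layer sizes, and input shapes are arbitrary):

import jax
import jax.numpy as jnp
import flax.linen as nn

class MLP(nn.Module):
    features: int = 128

    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(self.features)(x))
        return nn.Dense(10)(x)

model = MLP()
# Initialize parameters with a dummy batch, then run a forward pass.
params = model.init(jax.random.PRNGKey(0), jnp.ones((1, 784)))
logits = model.apply(params, jnp.ones((32, 784)))
print(logits.shape)  # (32, 10)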
Using the JAX NGC Container requires the host system to have the following installed:
Docker Engine
NVIDIA GPU Drivers
NVIDIA Container Toolkit
For supported versions, see the Framework Containers Support Matrix and the NVIDIA Container Toolkit Documentation.
No other installation, compilation, or dependency management is required. It is not necessary to install the NVIDIA CUDA Toolkit.
To run a container, issue the appropriate command as explained in the Running A Container chapter in the NVIDIA Containers For Deep Learning Frameworks User's Guide and specify the registry, repository, and tags. For more information about using NGC, refer to the NGC Container User Guide.
The following command assumes that you want to run the JAX container interactively, where 23.08 is the container version; for example, 23.08 corresponds to the August 2023 release:
docker run --gpus all -it --rm nvcr.io/nvidia/jax:23.08-py3
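Once inside the container, a quick sanity check (a minimal sketch) is to ask JAX which backend and devices it sees:

import jax

print(jax.default_backend())  # expected: 'gpu'
print(jax.devices())          # one CUDA device per visible GPU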
To pull data and model descriptions from locations outside the container for use by JAX, or to save results to locations outside the container, mount one or more host directories as Docker data volumes (for example, with the -v option of docker run).
See /workspace/README.md for information on getting started and customizing your JAX image.
If you use multiprocessing for multi-threaded data loaders, the default shared memory segment size that the container runs with might not be enough. To increase the shared memory size, add one of the following flags to the docker run command:
--shm-size=1g
--ulimit memlock=-1
Note: In order to share data between ranks, NCCL may require shared system memory for IPC and pinned (page-locked) system memory resources. The operating system's limits on these resources may need to be increased accordingly. In particular, Docker containers default to limited shared and pinned memory resources. When using NCCL inside a container, it is recommended that you increase these resources.
One of the key features of JAX is its simple primitives for scaling JAX programs across multiple accelerators. The JAX distributed system lets JAX processes discover each other and share topology information, performs health checks, ensures that all processes shut down if any process dies, and can be used for distributed checkpointing.
For information on how to set up a cluster and launch JAX processes, refer to the JAX documentation on Read the Docs. For HPC cluster environments with a Slurm or OpenMPI scheduler, the jax.distributed.initialize() API automatically detects all available JAX processes.
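For example, a minimal multi-process sketch is shown below; the coordinator address, process count, and process id are illustrative, and under Slurm or OpenMPI they can be omitted because they are detected automatically:

import jax

# Call once per process, before any other JAX operation.
jax.distributed.initialize(
    coordinator_address="10.0.0.1:1234",  # hypothetical coordinator host:port
    num_processes=2,                      # total number of JAX processes
    process_id=0,                         # unique id for this process
)

print(jax.process_index(), jax.local_device_count(), jax.device_count())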
This container image contains the complete source of the NVIDIA version of JAX in /opt/jax. It is prebuilt and installed as a system Python module. Visit the JAX documentation to learn more about JAX.
The NVIDIA JAX Container is optimized for use with NVIDIA GPUs, and contains the following software for GPU acceleration:
The software stack in this container has been validated for compatibility, and does not require any additional installation or compilation from the end user. This container can help accelerate your deep learning workflow from end to end.
NVIDIA Data Loading Library (DALI) is designed to accelerate data loading and preprocessing pipelines for deep learning applications by offloading them to the GPU. DALI primarily focuses on building data preprocessing pipelines for image, video, and audio data. These pipelines are typically complex and include multiple stages, leading to bottlenecks when run on CPU. Use this container to get started on accelerating data loading with DALI.
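As a rough sketch of what a DALI pipeline looks like (the /data/images path, batch size, and image size below are illustrative, not part of the container):

from nvidia.dali import pipeline_def, fn

@pipeline_def(batch_size=32, num_threads=4, device_id=0)
def image_pipeline():
    # Read JPEGs from disk, decode them on the GPU, and resize.
    jpegs, labels = fn.readers.file(file_root="/data/images")  # hypothetical dataset path
    images = fn.decoders.image(jpegs, device="mixed")
    images = fn.resize(images, resize_x=224, resize_y=224)
    return images, labels

pipe = image_pipeline()
pipe.build()
images, labels = pipe.run()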
NVIDIA CUDA Deep Neural Network Library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. The version of JAX in this container is precompiled with cuDNN support, and does not require any additional configuration.
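For instance, a convolution written with standard JAX primitives runs on the GPU without any extra setup (a minimal sketch; the shapes are arbitrary):

import jax
import jax.numpy as jnp

x = jnp.ones((1, 3, 224, 224))   # NCHW input batch
w = jnp.ones((32, 3, 3, 3))      # OIHW convolution filters
# On GPU, XLA typically lowers this convolution to a cuDNN kernel.
y = jax.lax.conv(x, w, window_strides=(1, 1), padding="SAME")
print(y.shape)  # (1, 32, 224, 224)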
NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node communication primitives for NVIDIA GPUs and networking that take into account system and network topology. NCCL is integrated with JAX to accelerate training on multi-GPU and multi-node systems. In particular, NCCL provides the default implementation of all-reduce and the other collective operations that JAX uses for cross-GPU communication.
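For example, a simple all-reduce across the local GPUs (a minimal sketch; on GPU the underlying collective is executed by NCCL):

from functools import partial
import jax
import jax.numpy as jnp

@partial(jax.pmap, axis_name="devices")
def allreduce(x):
    # Sum the per-device values across all mapped devices.
    return jax.lax.psum(x, axis_name="devices")

# One value per local GPU.
xs = jnp.arange(jax.local_device_count(), dtype=jnp.float32)
print(allreduce(xs))  # every device holds the same summed result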
For the latest Release Notes, see the JAX Release Notes.
For a full list of the supported software and specific versions that come packaged with this framework based on the container image, see the Frameworks Support Matrix.
For more information about JAX, including tutorials, documentation, and examples, see the JAX documentation and the JAX GitHub repository.
To review known CVEs on this image, refer to the Security Scanning tab on this page.
By pulling and using the container, you accept the terms and conditions of this End User License Agreement.