The Julia programming language is a flexible dynamic language, appropriate for scientific and numerical computing, with performance comparable to traditional statically-typed languages.
The Julia Language
Scientific computing has traditionally required the highest performance, yet domain experts have largely moved to slower dynamic languages for daily work. We believe there are many good reasons to prefer dynamic languages for these applications, and we do not expect their use to diminish. Fortunately, modern language design and compiler techniques make it possible to mostly eliminate the performance trade-off and provide a single environment productive enough for prototyping and efficient enough for deploying performance-intensive applications. The Julia programming language fills this role: it is a flexible dynamic language, appropriate for scientific and numerical computing, with performance comparable to traditional statically-typed languages. The main homepage for Julia can be found at julialang.org.
See here for a document describing prerequisites and setup steps for all HPC containers and instructions for pulling NGC containers.
Julia is free and open source, released under the MIT license.
System Requirements
Before running the Julia container, please ensure that your system meets the following requirements:
Platform
- One of the following container runtimes:
  - nvidia-docker
  - Singularity >= 3.1
GPUs
- Pascal (sm60), Volta (sm70), or Turing (sm75) NVIDIA GPU(s)
- CUDA driver version r450 or newer, r418, or r440
By default, Julia will automatically choose among CUDA Toolkit versions 9.2, 10.0, or 10.1/10.2 based on your installed driver.
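The driver requirement above can be checked before pulling the container. The following is a minimal sketch, not part of the container: driver_supported is a hypothetical helper name, and it only compares the major driver branch against the values listed in the requirements (r450 or newer, r418, or r440).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the container): succeeds when a driver
# version string satisfies the requirement listed above.
driver_supported() {
    local major="${1%%.*}"          # "440.33.01" -> "440"
    case "$major" in
        ''|*[!0-9]*) return 2 ;;    # empty or non-numeric input
    esac
    [ "$major" -ge 450 ] || [ "$major" -eq 418 ] || [ "$major" -eq 440 ]
}

# On a GPU host, the installed version can be read with nvidia-smi:
#   driver_supported "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
driver_supported "440.33.01" && echo "driver 440.33.01 is supported"
```

The check is deliberately coarse: it does not tell you which CUDA Toolkit version Julia will select, only whether the driver branch is one of those named in the requirements.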
System Recommendations
- Julia works well with Volta V100 or Pascal P100 GPUs for the CUDA packages (see JuliaGPU).
- Julia supports multiple GPUs in one system. Start with one GPU, then scale up to find the configuration that performs best.
Running Julia
Supported Architectures
NGC provides access to Julia containers targeting the following NVIDIA GPU architectures.
- Pascal(sm60)
- Volta(sm70)
Julia packages:
The Julia package ecosystem contains quite a few GPU-related packages and wrapper libraries, targeting different levels of abstraction. The packages below are precompiled in the container to give users easy access to NVIDIA's highly parallel GPUs for accelerated computing.
CUDA
The CUDA.jl package is the main programming interface for working with NVIDIA CUDA GPUs using Julia. It features a user-friendly array abstraction, a compiler for writing CUDA kernels in Julia, and wrappers for various CUDA libraries.
Test scripts
Example scripts are included in the container's /workspace/examples directory. They test the GPU-accelerated CUDA packages directly when the container is invoked, without entering REPL mode.
- test.jl: checks all CUDA-related components
- vadd.jl: sums two vectors of random numbers; produces no output
- versioninfo.jl: prints information about the installed Julia-related packages to the screen
Executables
julia: primary Julia executable
Command invocation
An example command is:
julia /workspace/examples/test.jl
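To run all three example scripts in one pass, the invocation above can be wrapped in a loop. This is a minimal sketch: run_examples and the JULIA_BIN override are illustrative names, not something the container defines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run each built-in example script in turn and report
# which ones exit with a non-zero status. JULIA_BIN is an illustrative
# override, not a variable the container defines.
run_examples() {
    local julia="${JULIA_BIN:-julia}"
    local dir="${1:-/workspace/examples}"
    local failed=0
    local script
    for script in test.jl vadd.jl versioninfo.jl; do
        if "$julia" "$dir/$script"; then
            echo "ok:     $script"
        else
            echo "FAILED: $script"
            failed=$((failed + 1))
        fi
    done
    # The return status is the number of scripts that failed (0 means all passed).
    return "$failed"
}
```

Inside the container, calling run_examples with no arguments uses the julia executable and the /workspace/examples directory described above.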
Examples
The following examples demonstrate how to run the NGC Julia container under the supported runtimes.
Running with Nvidia-docker or docker
Command line execution with Nvidia-docker or docker
Set up and invoke the Julia container using one of the methods below:
- Start the container with the full-featured interactive command-line REPL (read-eval-print loop) built into the Julia executable. In addition to allowing quick and easy evaluation of Julia statements, it has a searchable history, tab-completion, many helpful keybindings, and dedicated help and shell modes. The REPL can be started by simply calling Julia with no arguments. In this mode, the user can enter package mode to manage or test other available packages.
- Start the Julia container with a simple nvidia-docker or docker run command to test a GPU-accelerated CUDA package using the built-in example scripts.
The following example output shows the CUDA package resolving required package versions and dependencies, followed by the start of a summary of multiple tests:
┌ Info: System information:
│ CUDA toolkit 10.2.89, local installation
│ CUDA driver 10.2.0
│ NVIDIA driver 440.33.1
│
│ Libraries:
│ - CUBLAS: 10.2.2
│ - CURAND: 10.1.2
│ - CUFFT: 10.1.2
│ - CUSOLVER: 10.3.0
│ - CUSPARSE: 10.3.1
│ - CUPTI: 12.0.0
│ - NVML: 10.0.0+440.33.1
│ - CUDNN: 7.60.5 (for CUDA 10.2.0)
│ - CUTENSOR: missing
│
│ Toolchain:
│ - Julia: 1.5.0
│ - LLVM: 9.0.1
│ - PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
│ - Device support: sm_30, sm_32, sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75
│
│ Environment:
│ - JULIA_CUDA_USE_BINARYBUILDER: false
│
│ 4 devices:
│ 0: Tesla V100-PCIE-16GB (sm_70, 15.770 GiB / 15.782 GiB available)
│ 1: Tesla V100-PCIE-16GB (sm_70, 15.770 GiB / 15.782 GiB available)
│ 2: Tesla V100-PCIE-16GB (sm_70, 15.770 GiB / 15.782 GiB available)
└ 3: Tesla V100-PCIE-16GB (sm_70, 15.770 GiB / 15.782 GiB available)
[ Info: Testing using 1 device(s): 4. Tesla V100-PCIE-16GB (UUID a62c271a-dd48-9769-991a-cd5442ba5110)
[ Info: Skipping the following tests: cutensor
Command line execution using built-in scripts
The following command will start the container with all GPUs enabled and run multiple GPU-accelerated tests without entering REPL mode in the Julia container using nvidia-docker or docker:
Additional information about the test scripts can be found in the Test scripts section above.
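A minimal sketch of such an invocation, written as a helper that only prints the command so it can be inspected first. The helper name julia_test_command is hypothetical, and the sketch assumes the image's entrypoint is the julia executable, so that the script path can be passed as the only argument. If the image is configured differently, the script may need to be prefixed with julia.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a docker command that executes one
# of the built-in example scripts with all GPUs enabled.
# Assumption: the image entrypoint is the julia executable, so the script path
# can be passed as the only argument.
julia_test_command() {
    local tag="$1"                                    # NGC tag, e.g. the [app_tag] placeholder
    local script="${2:-/workspace/examples/test.jl}"  # defaults to the full test script
    printf 'docker run --rm --gpus all nvcr.io/hpc/julia:%s %s\n' "$tag" "$script"
}

julia_test_command "[app_tag]"
```

Replace [app_tag] with the tag of the container version you pulled before running the printed command.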
Julia's built-in REPL with Nvidia-docker
The following command will launch a full-featured interactive command-line REPL (read-eval-print loop) in the Julia container using nvidia-docker or docker:
$ nvidia-docker run -it --rm --gpus all nvcr.io/hpc/julia:[app_tag]
Where:
- -it: start the container with an interactive terminal (short for --interactive --tty)
- --rm: make the container ephemeral (removes the container on exit)
- --gpus: select the GPUs exposed to the container (all exposes every GPU). The NVIDIA runtime is integrated with the Docker CLI, so GPUs can be accessed seamlessly by the container via Docker CLI options.
The Julia REPL provides different prompt modes:
The REPL has four main modes of operation. The first and most common is the Julia prompt. It is the default mode of operation; each new line initially starts with julia>. It is here that you can enter Julia expressions. Hitting return or enter after a complete expression has been entered will evaluate the entry and show the result of the last expression:
Prompt mode:
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.0 (2020-08-01)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia>
Package mode and testing GPU-accelerated CUDA packages:
Press the ] key to enter package mode, then type:
(@v1.5) pkg> test CUDA
Shell mode
Press the ; key to enter shell mode, then run the NVIDIA System Management Interface command-line utility to display CUDA, graphics driver, and GPU device information by typing:
shell> nvidia-smi
Thu Sep 24 19:33:14 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:08:00.0 Off |                  Off |
| N/A   34C    P0    27W / 250W |     12MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  On   | 00000000:09:00.0 Off |                  Off |
| N/A   35C    P0    26W / 250W |     12MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-PCIE...  On   | 00000000:88:00.0 Off |                  Off |
| N/A   32C    P0    25W / 250W |     12MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-PCIE...  On   | 00000000:89:00.0 Off |                  Off |
| N/A   38C    P0    38W / 250W |    421MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
Help mode
When the cursor is at the beginning of the line, the prompt can be changed to a help mode by typing ?. Julia will attempt to print help or documentation for anything entered in help mode.
Search modes
In all of the above modes, the executed lines get saved to a history file, which can be searched. To initiate an incremental search through the previous history, type ^R – the control key together with the r key. The prompt will change to (reverse-i-search)`':, and as you type the search query will appear in the quotes. The most recent result that matches the query will dynamically update to the right of the colon as more is typed. To find an older result using the same query, simply type ^R again.
Just as ^R is a reverse search, ^S is a forward search, with the prompt (i-search)`':. The two may be used in conjunction with each other to move through the previous or next matching results, respectively.
For further instructions on navigating the REPL, see the Julia documentation.
Running with Singularity
Pull the image
Save the NGC Julia container as a local Singularity image file:
$ singularity build julia_v1.5.0.simg docker://nvcr.io/hpc/julia:[app_tag]
The Singularity image is now saved in the current working directory as julia_v1.5.0.simg
Note: Singularity/2.x
To pull NGC images with singularity version 2.x and earlier, NGC container registry authentication credentials are required.
To set your NGC container registry authentication credentials:
$ export SINGULARITY_DOCKER_USERNAME='$oauthtoken'
$ export SINGULARITY_DOCKER_PASSWORD=
More information describing how to obtain and use your NVIDIA NGC Cloud Services API key can be found here.
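Note that the user name is the literal string $oauthtoken, which is why the export above uses single quotes. The sketch below checks both variables before a pull; ngc_credentials_ok is a hypothetical helper name, and it only verifies that the password is non-empty, not that the API key is valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the NGC credentials are set as expected
# before pulling with Singularity 2.x.
ngc_credentials_ok() {
    # The user name must be the literal string $oauthtoken, hence the single
    # quotes: double quotes would expand it as an (unset) shell variable.
    [ "${SINGULARITY_DOCKER_USERNAME:-}" = '$oauthtoken' ] || return 1
    # The password is the NGC API key; only check that it is non-empty.
    [ -n "${SINGULARITY_DOCKER_PASSWORD:-}" ] || return 1
}
```

A typical use is to call ngc_credentials_ok before the singularity build command and stop with a message if it fails.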
Important
Environment variables
LD_LIBRARY_PATH: (Singularity containers only) When the host machine has an NVIDIA 418.XX graphics driver and the CUDA version is 10 or newer, set this environment variable to include CUDA's compat library directory before running the container. Add the command below as a prefix to the singularity run command.
LD_LIBRARY_PATH=/usr/local/cuda/compat:$LD_LIBRARY_PATH singularity run
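When LD_LIBRARY_PATH is empty on the host, the prefix above leaves a trailing colon, which the dynamic loader treats as an empty entry (the current directory). The sketch below builds the value more defensively; with_cuda_compat is a hypothetical helper name, not part of the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an LD_LIBRARY_PATH value with the CUDA compat
# directory first, without adding it twice or leaving a trailing colon.
with_cuda_compat() {
    local compat="/usr/local/cuda/compat"
    local current="$1"
    case ":$current:" in
        *":$compat:"*) printf '%s\n' "$current" ;;           # already present
        "::")          printf '%s\n' "$compat" ;;            # variable was empty
        *)             printf '%s\n' "$compat:$current" ;;   # prepend
    esac
}

# Example use as a prefix to the run command:
#   LD_LIBRARY_PATH="$(with_cuda_compat "${LD_LIBRARY_PATH:-}")" singularity run --nv julia_v1.5.0.simg
with_cuda_compat "${LD_LIBRARY_PATH:-}"
```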
Bind mounting into Singularity containers
The Julia container attempts to precompile packages and save history logs in the container's /data directory at runtime. Unlike Docker containers, which allow root access, Singularity will produce permission denied errors. The workaround is to create a new directory on the host machine and bind mount it to the container's /data directory.
mkdir data
singularity run -B $(pwd)/data:/data
Where:
-B: a user-bind path specification
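The two steps above can be combined so that the bind specification is only produced when the host directory exists and is writable. This is a minimal sketch; data_bind_spec is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a writable host directory and print the matching
# -B bind specification for the container's /data directory.
data_bind_spec() {
    local host_dir="${1:-$(pwd)/data}"
    mkdir -p "$host_dir" || return 1
    [ -w "$host_dir" ] || { echo "not writable: $host_dir" >&2; return 1; }
    printf '%s:/data\n' "$host_dir"
}

# Example use:
#   singularity run --nv -B "$(data_bind_spec)" julia_v1.5.0.simg
```

Checking writability up front surfaces a host-side permission problem before Julia starts precompiling packages.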
Command line execution using built-in scripts
$ singularity run --nv -B $(pwd)/data:/data julia_v1.5.0.simg /workspace/examples/test_cudadrv.jl
Where:
- --nv: expose the host GPU(s) to the container
- -B: a user-bind path specification
This example script loads the CUDAdrv package then runs multiple tests.
Example of successful Julia output:
Testing CUDA
Downloading artifact: CompilerSupportLibraries
Downloading artifact: FFTW
Downloading artifact: OpenSpecFun
Downloading artifact: IntelOpenMP
Status `/tmp/jl_EmZmjf/Project.toml`
[621f4979] AbstractFFTs v0.5.0
[79e6a3ab] Adapt v2.1.0
┌ Info: System information:
│ CUDA toolkit 10.2.89, local installation
│ CUDA driver 10.2.0
│ NVIDIA driver 440.33.1
│
│ Libraries:
│ - CUBLAS: 10.2.2
│ - CURAND: 10.1.2
│ - CUFFT: 10.1.2
│ - CUSOLVER: 10.3.0
│ - CUSPARSE: 10.3.1
│ - CUPTI: 12.0.0
│ - NVML: 10.0.0+440.33.1
│ - CUDNN: 7.60.5 (for CUDA 10.2.0)
│ - CUTENSOR: missing
│
│ Toolchain:
│ - Julia: 1.5.0
│ - LLVM: 9.0.1
│ - PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
│ - Device support: sm_30, sm_32, sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75
│
│ Environment:
│ - JULIA_CUDA_USE_BINARYBUILDER: false
│
│ 4 devices:
│ 0: Tesla V100-PCIE-16GB (sm_70, 15.770 GiB / 15.782 GiB available)
│ 1: Tesla V100-PCIE-16GB (sm_70, 15.770 GiB / 15.782 GiB available)
│ 2: Tesla V100-PCIE-16GB (sm_70, 15.770 GiB / 15.782 GiB available)
└ 3: Tesla V100-PCIE-16GB (sm_70, 15.770 GiB / 15.782 GiB available)
[ Info: Testing using 1 device(s): 4. Tesla V100-PCIE-16GB (UUID a62c271a-dd48-9769-991a-cd5432ba5110)
[ Info: Skipping the following tests: cutensor
| | ---------------- GPU ---------------- | ---------------- CPU ---------------- |
Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
initialization (2) | 4.15 | 0.00 | 0.0 | 0.00 | N/A | 0.09 | 2.2 | 211.59 | 585.86 |
apiutils (2) | 0.26 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 5.33 | 585.86 |
array (2) | 67.98 | 0.13 | 0.2 | 5.20 | N/A | 2.42 | 3.6 | 6333.03 | 804.01 |
broadcast (2) | 24.09 | 0.00 | 0.0 | 0.00 | N/A | 0.53 | 2.2 | 1457.37 | 835.50 |
codegen (2) | 5.05 | 0.00 | 0.0 | 0.00 | N/A | 0.12 | 2.5 | 298.64 | 923.12 |
cublas (2) | 73.84 | 0.03 | 0.0 | 11.12 | N/A | 2.40 | 3.3 | 6681.29 | 1388.93 |
cudnn (2) | 66.86 | 0.01 | 0.0 | 0.62 | N/A | 1.64 | 2.5 | 4934.47 | 2555.46 |
cufft (2) | 24.10 | 0.02 | 0.1 | 144.16 | N/A | 0.74 | 3.1 | 1977.75 | 2709.52 |
curand (2) | 0.09 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 5.48 | 2709.52 |
cusolver (2) | 53.75 | 0.07 | 0.1 | 1128.68 | N/A | 1.78 | 3.3 | 4513.01 | 2724.78 |
cusparse (2) | 31.80 | 0.01 | 0.0 | 4.46 | N/A | 0.79 | 2.5 | 2075.20 | 2724.78 |
examples (2) | 144.55 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 24.31 | 2724.78 |
exceptions (2) | 87.95 | 0.00 | 0.0 | 0.00 | N/A | 0.03 | 0.0 | 24.27 | 2724.78 |
execution (2) | 34.11 | 0.00 | 0.0 | 0.15 | N/A | 0.86 | 2.5 | 2350.04 | 2724.78 |
forwarddiff (2) | 98.76 | 0.00 | 0.0 | 0.00 | N/A | 0.99 | 1.0 | 2597.81 | 2724.78 |
iterator (2) | 2.39 | 0.00 | 0.0 | 1.07 | N/A | 0.08 | 3.3 | 202.23 | 2724.78 |
nnlib (2) | 2.63 | 0.00 | 0.0 | 0.00 | N/A | 0.08 | 3.1 | 136.51 | 2724.78 |
nvml (2) | 0.58 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 49.31 | 2724.78 |
nvtx (2) | 1.38 | 0.00 | 0.0 | 0.00 | N/A | 0.04 | 3.0 | 106.06 | 2724.78 |
pointer (2) | 0.20 | 0.00 | 0.0 | 0.00 | N/A | 0.04 | 19.9 | 5.85 | 2724.78 |
pool (2) | 3.48 | 0.00 | 0.0 | 0.00 | N/A | 0.71 | 20.3 | 201.80 | 2724.78 |
random (2) | 8.15 | 0.00 | 0.0 | 0.02 | N/A | 0.21 | 2.6 | 583.35 | 2724.78 |
statistics (2) | 12.72 | 0.00 | 0.0 | 0.00 | N/A | 0.43 | 3.4 | 979.70 | 2724.78 |
texture (2) | 23.14 | 0.00 | 0.0 | 0.08 | N/A | 1.01 | 4.4 | 2209.33 | 2724.78 |
threading (2) | 2.75 | 0.00 | 0.2 | 10.94 | N/A | 0.05 | 1.7 | 184.11 | 2724.78 |
utils (2) | 1.11 | 0.00 | 0.0 | 0.00 | N/A | 0.05 | 4.6 | 114.40 | 2724.78 |
cudadrv/context (2) | 1.01 | 0.00 | 0.0 | 0.00 | N/A | 0.04 | 4.2 | 62.13 | 2724.78 |
cudadrv/devices (2) | 0.32 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 32.87 | 2724.78 |
cudadrv/errors (2) | 0.23 | 0.00 | 0.0 | 0.00 | N/A | 0.04 | 18.3 | 28.18 | 2724.78 |
cudadrv/events (2) | 0.21 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 30.91 | 2724.78 |
cudadrv/execution (2) | 0.90 | 0.00 | 0.1 | 0.00 | N/A | 0.04 | 4.6 | 78.40 | 2724.78 |
cudadrv/memory (2) | 1.84 | 0.00 | 0.0 | 0.00 | N/A | 0.09 | 4.7 | 171.82 | 2724.78 |
cudadrv/module (2) | 0.57 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 30.07 | 2724.78 |
cudadrv/occupancy (2) | 0.13 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 12.21 | 2724.78 |
cudadrv/profile (2) | 0.45 | 0.00 | 0.0 | 0.00 | N/A | 0.04 | 9.1 | 61.04 | 2724.78 |
cudadrv/stream (2) | 0.28 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 41.41 | 2724.78 |
cudadrv/version (2) | 0.01 | 0.00 | 0.0 | 0.00 | N/A | 0.00 | 0.0 | 0.07 | 2724.78 |
cusolver/cusparse (2) | 7.43 | 0.00 | 0.0 | 0.19 | N/A | 0.16 | 2.1 | 373.26 | 2724.78 |
device/array (2) | 1.95 | 0.00 | 0.0 | 0.00 | N/A | 0.04 | 2.2 | 103.34 | 2724.78 |
Additional information about the test scripts can be found in the Test scripts section above.
Interactive shell
The following command will launch an interactive shell in the Julia container using singularity shell:
$ singularity shell --nv -B $(pwd)/data:/data julia_v1.5.0.simg
Where:
- --nv: expose the host GPU(s) to the container
- -B: a user-bind path specification
This should produce a Singularity shell prompt within the container:
Singularity: Invoking an interactive shell within container...
Singularity julia_v1.5.0.simg:~>
Inside the container, you may start Julia in REPL mode by typing:
Singularity julia_v1.5.0.simg:~> julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.0 (2020-08-01)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia>