Julia

The Julia programming language is a flexible dynamic language, appropriate for scientific and numerical computing, with performance comparable to traditional statically-typed languages.

The Julia Language

Scientific computing has traditionally required the highest performance, yet domain experts have largely moved to slower dynamic languages for daily work. We believe there are many good reasons to prefer dynamic languages for these applications, and we do not expect their use to diminish. Fortunately, modern language design and compiler techniques make it possible to mostly eliminate the performance trade-off and provide a single environment productive enough for prototyping and efficient enough for deploying performance-intensive applications. The Julia programming language fills this role: it is a flexible dynamic language, appropriate for scientific and numerical computing, with performance comparable to traditional statically-typed languages. The main homepage for Julia can be found at julialang.org.

See here for a document describing prerequisites and setup steps for all HPC containers and instructions for pulling NGC containers.

Julia is free and open source, released under the MIT license.

System Requirements

Before running the Julia container, ensure that your system meets the following requirements:

Platform

One of the following container runtimes
- nvidia-docker
- Singularity >= 3.1

GPUs

Pascal (sm60), Volta (sm70), or Turing (sm75) NVIDIA GPU(s)
CUDA driver version r418, r440, or >= r450

By default, Julia will automatically choose among CUDA Toolkit versions 9.2, 10.0, or 10.1/10.2 based on your installed driver.
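As a rough illustration of the driver requirement listed under GPUs, the check below accepts the r418 and r440 branches and anything from r450 upward. The function name `driver_supported` is hypothetical and not part of the container; it only compares the major version number of a driver string such as `440.33.01`.

```shell
# Sketch only: driver_supported is a hypothetical helper, not shipped with the container.
# Returns 0 when the driver branch is r418, r440, or r450 and newer,
# 1 when it is some other branch, and 2 when the input is not a version string.
driver_supported() {
    local version="$1"
    local major="${version%%.*}"      # "440.33.01" -> "440"
    case "$major" in
        ''|*[!0-9]*) return 2 ;;      # empty or non-numeric input
    esac
    if [ "$major" -ge 450 ] || [ "$major" -eq 418 ] || [ "$major" -eq 440 ]; then
        return 0
    fi
    return 1
}
```

On a host with the NVIDIA driver installed, the version string can be taken from `nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1` and passed to the function.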

System Recommendations

Julia's CUDA packages (Julia GPU) work well with Volta V100 or Pascal P100 GPUs.
Julia supports multiple GPUs in one system. It is best to start with one GPU, then scale up to find what performs best.

Running Julia

Supported Architectures

NGC provides access to Julia containers targeting the following NVIDIA GPU architectures.

Pascal(sm60)
Volta(sm70)

Julia packages:

The Julia package ecosystem contains quite a few GPU-related packages and wrapper libraries, targeting different levels of abstraction. The packages below are precompiled in the container to give users easy access to NVIDIA's highly parallel GPUs for accelerated computing.

CUDA

The CUDA.jl package is the main programming interface for working with NVIDIA CUDA GPUs using Julia. It features a user-friendly array abstraction, a compiler for writing CUDA kernels in Julia, and wrappers for various CUDA libraries.

Test scripts

Example scripts are included in the container's /workspace/examples directory. They test the GPU-accelerated CUDA packages when the container is invoked, without entering REPL mode.

test.jl: checks all CUDA-related components
vadd.jl: sums two vectors of random numbers; produces no output
versioninfo.jl: prints information about the installed Julia-related packages to the screen

Executables

julia: primary Julia executable

Command invocation

An example command is:

julia /workspace/examples/test.jl
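To run all three bundled scripts in sequence from inside the container, a loop along these lines may help. `run_examples` and the `JULIA_BIN` override are hypothetical names introduced for this sketch; the script paths are the ones listed under Test scripts.

```shell
# Sketch only: run_examples and JULIA_BIN are hypothetical, not part of the container.
# Runs each bundled example in turn and stops at the first failure.
run_examples() {
    local julia_bin="${JULIA_BIN:-julia}"        # set JULIA_BIN=echo for a dry run
    local dir="${1:-/workspace/examples}"
    local script
    for script in versioninfo.jl vadd.jl test.jl; do
        echo "== $script =="
        "$julia_bin" "$dir/$script" || return 1
    done
}
```

Calling `run_examples` with no arguments uses /workspace/examples; `JULIA_BIN=echo run_examples` only prints the paths that would be run.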

Examples

The following examples demonstrate how to run the NGC Julia container under the supported runtimes.

nvidia-docker
- command line execution
- interactive shell
Singularity
- command line execution
- interactive shell

Running with Nvidia-docker or docker

Command line execution with Nvidia-docker or docker

Set up and invoke the Julia container using one of the methods below:

Start the container with the full-featured interactive command-line REPL (read-eval-print loop) built into the Julia executable. In addition to allowing quick and easy evaluation of Julia statements, it has a searchable history, tab completion, many helpful keybindings, and dedicated help and shell modes. The REPL can be started by simply calling julia with no arguments. In this mode, the user can enter package mode to manage or test other available packages.
Start the Julia container with a simple nvidia-docker or docker run command to test the GPU-accelerated CUDA package using the built-in example scripts.

This example output comes from the CUDA package tests, which resolve the required package versions and dependencies and then output a summary of multiple tests:

┌ Info: System information:
│ CUDA toolkit 10.2.89, local installation
│ CUDA driver 10.2.0
│ NVIDIA driver 440.33.1
│ 
│ Libraries: 
│ - CUBLAS: 10.2.2
│ - CURAND: 10.1.2
│ - CUFFT: 10.1.2
│ - CUSOLVER: 10.3.0
│ - CUSPARSE: 10.3.1
│ - CUPTI: 12.0.0
│ - NVML: 10.0.0+440.33.1
│ - CUDNN: 7.60.5 (for CUDA 10.2.0)
│ - CUTENSOR: missing
│ 
│ Toolchain:
│ - Julia: 1.5.0
│ - LLVM: 9.0.1
│ - PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
│ - Device support: sm_30, sm_32, sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75
│ 
│ Environment:
│ - JULIA_CUDA_USE_BINARYBUILDER: false
│ 
│ 4 devices:
│   0: Tesla V100-PCIE-16GB (sm_70, 15.770 GiB / 15.782 GiB available)
│   1: Tesla V100-PCIE-16GB (sm_70, 15.770 GiB / 15.782 GiB available)
│   2: Tesla V100-PCIE-16GB (sm_70, 15.770 GiB / 15.782 GiB available)
└   3: Tesla V100-PCIE-16GB (sm_70, 15.770 GiB / 15.782 GiB available)
[ Info: Testing using 1 device(s): 4. Tesla V100-PCIE-16GB (UUID a62c271a-dd48-9769-991a-cd5442ba5110)
[ Info: Skipping the following tests: cutensor

Command line execution using built-in scripts

The following command will start the container with all GPUs enabled and run multiple GPU-accelerated tests without entering REPL mode in the Julia container using nvidia-docker or docker:
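A minimal sketch of such a command follows. It assumes the image runs `julia` as its entry point, so that a script path can be passed directly as it is in the Singularity example on this page; that is an assumption, not something this page confirms for Docker. `run_example` and the `DOCKER` override are hypothetical names used only here.

```shell
# Image reference from this page; replace [app_tag] with a real tag.
IMAGE='nvcr.io/hpc/julia:[app_tag]'

# Sketch only: run_example and DOCKER are hypothetical.
# Assumes the image's entry point is julia, so the script path is the only argument.
run_example() {
    local script="${1:-/workspace/examples/test.jl}"
    "${DOCKER:-docker}" run --rm --gpus all "$IMAGE" "$script"
}
```

`run_example` runs test.jl with all GPUs enabled; `DOCKER=echo run_example /workspace/examples/vadd.jl` prints the arguments instead of starting a container.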

Additional test scripts information can be found here

Julia's built-in REPL with Nvidia-docker

The following command will launch a full-featured interactive command-line REPL (read-eval-print loop) in the Julia container using nvidia-docker or docker:

$ nvidia-docker run -it --rm --gpus all nvcr.io/hpc/julia:[app_tag]

Where:

-it: start the container with an interactive terminal (short for --interactive --tty)
--rm: make container ephemeral (removes container on exit)
--gpus: the NVIDIA runtime is integrated with the Docker CLI and GPUs can be accessed seamlessly by the container via the Docker CLI options.
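When starting with a single GPU and scaling up, as recommended above, Docker's `--gpus` option also accepts a device list in place of `all`. The sketch below wraps that choice in a function; `julia_repl` and the `DOCKER` override are hypothetical names, and the literal double quotes around `device=...` are what Docker expects when the list contains commas.

```shell
# Image reference from this page; replace [app_tag] with a real tag.
IMAGE='nvcr.io/hpc/julia:[app_tag]'

# Sketch only: julia_repl and DOCKER are hypothetical.
# Pass "all", or a comma-separated list of device indices such as "0" or "0,1".
julia_repl() {
    local devices="${1:-all}"
    local gpus
    if [ "$devices" = "all" ]; then
        gpus="all"
    else
        gpus="\"device=$devices\""     # literal quotes keep the comma list intact
    fi
    "${DOCKER:-docker}" run -it --rm --gpus "$gpus" "$IMAGE"
}
```

`julia_repl 0` starts the REPL on the first GPU only, and `julia_repl 0,1` on the first two.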

The Julia REPL provides different prompt modes:

The REPL has four main modes of operation. The first and most common is the Julia prompt. It is the default mode of operation; each new line initially starts with julia>. It is here that you can enter Julia expressions. Hitting return or enter after a complete expression has been entered will evaluate the entry and show the result of the last expression:

Prompt mode:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.0 (2020-08-01)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia>

Package mode and testing GPU-accelerated CUDA packages:

Press the ] key to enter package mode and then type:

(@v1.5) pkg> test CUDA

Shell mode

Press the ; key to enter shell mode, then run the NVIDIA System Management Interface command-line utility to view CUDA, graphics driver, and GPU device information by typing:

shell> nvidia-smi
Thu Sep 24 19:33:14 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:08:00.0 Off |                  Off |
| N/A   34C    P0    27W / 250W |     12MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  On   | 00000000:09:00.0 Off |                  Off |
| N/A   35C    P0    26W / 250W |     12MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-PCIE...  On   | 00000000:88:00.0 Off |                  Off |
| N/A   32C    P0    25W / 250W |     12MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-PCIE...  On   | 00000000:89:00.0 Off |                  Off |
| N/A   38C    P0    38W / 250W |    421MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Help mode

When the cursor is at the beginning of the line, the prompt can be changed to a help mode by typing ?. Julia will attempt to print help or documentation for anything entered in help mode.

Search modes

In all of the above modes, the executed lines get saved to a history file, which can be searched. To initiate an incremental search through the previous history, type ^R – the control key together with the r key. The prompt will change to (reverse-i-search)`':, and as you type the search query will appear in the quotes. The most recent result that matches the query will dynamically update to the right of the colon as more is typed. To find an older result using the same query, simply type ^R again.

Just as ^R is a reverse search, ^S is a forward search, with the prompt (i-search)`':. The two may be used in conjunction with each other to move through the previous or next matching results, respectively.

For further instructions on how to navigate in REPL mode, see Julia's documentation.

Running with Singularity

Pull the image

Save the NGC Julia container as a local Singularity image file:

$ singularity build julia_v1.5.0.simg docker://nvcr.io/hpc/julia:[app_tag]

The Singularity image is now saved in the current working directory as julia_v1.5.0.simg.

Note: Singularity/2.x

To pull NGC images with Singularity version 2.x and earlier, NGC container registry authentication credentials are required.

To set your NGC container registry authentication credentials:

$ export SINGULARITY_DOCKER_USERNAME='$oauthtoken'
$ export SINGULARITY_DOCKER_PASSWORD=<NVIDIA NGC API key>

More information describing how to obtain and use your NVIDIA NGC Cloud Services API key can be found here.

Important

Environment variables

LD_LIBRARY_PATH: (Singularity containers only) When the host machine has an NVIDIA 418.XX graphics driver and the CUDA version is 10 or newer, set this environment variable to CUDA's compat library directory before running the container. Add the assignment below as a prefix to the singularity run command.

LD_LIBRARY_PATH=/usr/local/cuda/compat:$LD_LIBRARY_PATH singularity run

Bind mounting into Singularity containers

At runtime, the Julia container attempts to precompile packages and save history logs in the container's /data directory. Docker containers allow root access, so this works there, but under Singularity it produces permission denied errors. The workaround is to make a new directory on the host machine and bind mount it onto the container's /data directory.

mkdir data
singularity run -B $(pwd)/data:/data

Where:

-B: a user-bind path specification
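The two notes above (the compat library prefix and the /data bind mount) can be combined in one wrapper, sketched below. `run_julia_singularity` and the `SINGULARITY` override are hypothetical names; the image file name is the one produced by the build step earlier, and the `--nv` flag is the one explained in the next section. The caller supplies the host driver's major version, since this sketch does not detect it.

```shell
IMAGE_FILE='julia_v1.5.0.simg'   # image file saved by the singularity build step

# Sketch only: run_julia_singularity and SINGULARITY are hypothetical.
# $1 is the host driver's major version (for example 418 or 440);
# the remaining arguments are passed through to the container.
run_julia_singularity() {
    local driver_major="$1"; shift
    mkdir -p data                                  # host directory bound to /data
    if [ "$driver_major" = "418" ]; then
        # 418.XX hosts: prefix the command with CUDA's compat library path
        LD_LIBRARY_PATH="/usr/local/cuda/compat:${LD_LIBRARY_PATH:-}" \
            "${SINGULARITY:-singularity}" run --nv -B "$(pwd)/data:/data" "$IMAGE_FILE" "$@"
    else
        "${SINGULARITY:-singularity}" run --nv -B "$(pwd)/data:/data" "$IMAGE_FILE" "$@"
    fi
}
```

For example, `run_julia_singularity 418 /workspace/examples/test.jl` applies the prefix, while `run_julia_singularity 440` starts the container without it.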

Command line execution using built-in scripts

$ singularity run --nv -B $(pwd)/data:/data julia_v1.5.0.simg /workspace/examples/test_cudadrv.jl

Where:

--nv: expose the host GPU(s) to the container
-B: a user-bind path specification

This example script loads the CUDAdrv package then runs multiple tests.

Example of successful Julia output:

    Testing CUDA
Downloading artifact: CompilerSupportLibraries
Downloading artifact: FFTW
Downloading artifact: OpenSpecFun
Downloading artifact: IntelOpenMP
Status `/tmp/jl_EmZmjf/Project.toml`
  [621f4979] AbstractFFTs v0.5.0
  [79e6a3ab] Adapt v2.1.0

┌ Info: System information:
│ CUDA toolkit 10.2.89, local installation
│ CUDA driver 10.2.0
│ NVIDIA driver 440.33.1
│ 
│ Libraries: 
│ - CUBLAS: 10.2.2
│ - CURAND: 10.1.2
│ - CUFFT: 10.1.2
│ - CUSOLVER: 10.3.0
│ - CUSPARSE: 10.3.1
│ - CUPTI: 12.0.0
│ - NVML: 10.0.0+440.33.1
│ - CUDNN: 7.60.5 (for CUDA 10.2.0)
│ - CUTENSOR: missing
│ 
│ Toolchain:
│ - Julia: 1.5.0
│ - LLVM: 9.0.1
│ - PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
│ - Device support: sm_30, sm_32, sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75
│ 
│ Environment:
│ - JULIA_CUDA_USE_BINARYBUILDER: false
│ 
│ 4 devices:
│   0: Tesla V100-PCIE-16GB (sm_70, 15.770 GiB / 15.782 GiB available)
│   1: Tesla V100-PCIE-16GB (sm_70, 15.770 GiB / 15.782 GiB available)
│   2: Tesla V100-PCIE-16GB (sm_70, 15.770 GiB / 15.782 GiB available)
└   3: Tesla V100-PCIE-16GB (sm_70, 15.770 GiB / 15.782 GiB available)
[ Info: Testing using 1 device(s): 4. Tesla V100-PCIE-16GB (UUID a62c271a-dd48-9769-991a-cd5432ba5110)
[ Info: Skipping the following tests: cutensor
                                         |          | ---------------- GPU ---------------- | ---------------- CPU ---------------- |
Test                            (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
initialization                       (2) |     4.15 |   0.00 |  0.0 |       0.00 |      N/A |   0.09 |  2.2 |     211.59 |   585.86 |
apiutils                             (2) |     0.26 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       5.33 |   585.86 |
array                                (2) |    67.98 |   0.13 |  0.2 |       5.20 |      N/A |   2.42 |  3.6 |    6333.03 |   804.01 |
broadcast                            (2) |    24.09 |   0.00 |  0.0 |       0.00 |      N/A |   0.53 |  2.2 |    1457.37 |   835.50 |
codegen                              (2) |     5.05 |   0.00 |  0.0 |       0.00 |      N/A |   0.12 |  2.5 |     298.64 |   923.12 |
cublas                               (2) |    73.84 |   0.03 |  0.0 |      11.12 |      N/A |   2.40 |  3.3 |    6681.29 |  1388.93 |
cudnn                                (2) |    66.86 |   0.01 |  0.0 |       0.62 |      N/A |   1.64 |  2.5 |    4934.47 |  2555.46 |
cufft                                (2) |    24.10 |   0.02 |  0.1 |     144.16 |      N/A |   0.74 |  3.1 |    1977.75 |  2709.52 |
curand                               (2) |     0.09 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       5.48 |  2709.52 |
cusolver                             (2) |    53.75 |   0.07 |  0.1 |    1128.68 |      N/A |   1.78 |  3.3 |    4513.01 |  2724.78 |
cusparse                             (2) |    31.80 |   0.01 |  0.0 |       4.46 |      N/A |   0.79 |  2.5 |    2075.20 |  2724.78 |
examples                             (2) |   144.55 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      24.31 |  2724.78 |
exceptions                           (2) |    87.95 |   0.00 |  0.0 |       0.00 |      N/A |   0.03 |  0.0 |      24.27 |  2724.78 |
execution                            (2) |    34.11 |   0.00 |  0.0 |       0.15 |      N/A |   0.86 |  2.5 |    2350.04 |  2724.78 |
forwarddiff                          (2) |    98.76 |   0.00 |  0.0 |       0.00 |      N/A |   0.99 |  1.0 |    2597.81 |  2724.78 |
iterator                             (2) |     2.39 |   0.00 |  0.0 |       1.07 |      N/A |   0.08 |  3.3 |     202.23 |  2724.78 |
nnlib                                (2) |     2.63 |   0.00 |  0.0 |       0.00 |      N/A |   0.08 |  3.1 |     136.51 |  2724.78 |
nvml                                 (2) |     0.58 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      49.31 |  2724.78 |
nvtx                                 (2) |     1.38 |   0.00 |  0.0 |       0.00 |      N/A |   0.04 |  3.0 |     106.06 |  2724.78 |
pointer                              (2) |     0.20 |   0.00 |  0.0 |       0.00 |      N/A |   0.04 | 19.9 |       5.85 |  2724.78 |
pool                                 (2) |     3.48 |   0.00 |  0.0 |       0.00 |      N/A |   0.71 | 20.3 |     201.80 |  2724.78 |
random                               (2) |     8.15 |   0.00 |  0.0 |       0.02 |      N/A |   0.21 |  2.6 |     583.35 |  2724.78 |
statistics                           (2) |    12.72 |   0.00 |  0.0 |       0.00 |      N/A |   0.43 |  3.4 |     979.70 |  2724.78 |
texture                              (2) |    23.14 |   0.00 |  0.0 |       0.08 |      N/A |   1.01 |  4.4 |    2209.33 |  2724.78 |
threading                            (2) |     2.75 |   0.00 |  0.2 |      10.94 |      N/A |   0.05 |  1.7 |     184.11 |  2724.78 |
utils                                (2) |     1.11 |   0.00 |  0.0 |       0.00 |      N/A |   0.05 |  4.6 |     114.40 |  2724.78 |
cudadrv/context                      (2) |     1.01 |   0.00 |  0.0 |       0.00 |      N/A |   0.04 |  4.2 |      62.13 |  2724.78 |
cudadrv/devices                      (2) |     0.32 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      32.87 |  2724.78 |
cudadrv/errors                       (2) |     0.23 |   0.00 |  0.0 |       0.00 |      N/A |   0.04 | 18.3 |      28.18 |  2724.78 |
cudadrv/events                       (2) |     0.21 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      30.91 |  2724.78 |
cudadrv/execution                    (2) |     0.90 |   0.00 |  0.1 |       0.00 |      N/A |   0.04 |  4.6 |      78.40 |  2724.78 |
cudadrv/memory                       (2) |     1.84 |   0.00 |  0.0 |       0.00 |      N/A |   0.09 |  4.7 |     171.82 |  2724.78 |
cudadrv/module                       (2) |     0.57 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      30.07 |  2724.78 |
cudadrv/occupancy                    (2) |     0.13 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      12.21 |  2724.78 |
cudadrv/profile                      (2) |     0.45 |   0.00 |  0.0 |       0.00 |      N/A |   0.04 |  9.1 |      61.04 |  2724.78 |
cudadrv/stream                       (2) |     0.28 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |      41.41 |  2724.78 |
cudadrv/version                      (2) |     0.01 |   0.00 |  0.0 |       0.00 |      N/A |   0.00 |  0.0 |       0.07 |  2724.78 |
cusolver/cusparse                    (2) |     7.43 |   0.00 |  0.0 |       0.19 |      N/A |   0.16 |  2.1 |     373.26 |  2724.78 |
device/array                         (2) |     1.95 |   0.00 |  0.0 |       0.00 |      N/A |   0.04 |  2.2 |     103.34 |  2724.78 |

Additional test scripts information can be found here

Interactive shell

The following command will launch an interactive shell in the Julia container using singularity shell:

$ singularity shell --nv -B $(pwd)/data:/data julia_v1.5.0.simg

Where:

--nv: expose the host GPU(s) to the container
-B: a user-bind path specification

This should produce a Singularity shell prompt within the container:

Singularity: Invoking an interactive shell within container...

Singularity julia_v1.5.0.simg:~>

Inside the container, you may start Julia in REPL mode by typing:

Singularity julia_v1.5.0.simg:~> julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.0 (2020-08-01)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
julia>
