The US Department of Energy (DOE) and the National Institutes of Health (NIH) have partnered to accelerate cancer research. The CANDLE (CANcer Distributed Learning Environment) project focuses specifically on the machine learning aspect of this challenging research by building a single scalable deep neural network code to facilitate this work. The CANDLE software comprises various third-party packages that allow deep learning workflows to be deployed at supercomputer scale. More information on CANDLE can be found at http://candle.cels.anl.gov/. See here for a document describing prerequisites and setup steps for all HPC containers. See here for a document describing the steps to pull NGC containers.
1 Running CANDLE
This example runs a few of the test programs from the CANDLE Benchmark suite located on GitHub. The CANDLE container comes prepackaged with a simple test script that downloads and runs them.
First, start up the container interactively.
nvidia-docker run --rm -it nvcr.io/hpc/candle:20180326 /bin/bash
After the container starts you will be in the /workspace directory.
To complete the CANDLE setup, run:
source /opt/candle_setup.sh
Next, copy the script located at /opt/testscript.sh to another directory inside the container and execute it there.
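The copy-and-run step can be wrapped in a small helper. The following is a minimal sketch, not something shipped in the container: the function name run_candle_test and the default working directory $HOME/candle-test are hypothetical, and only the path /opt/testscript.sh comes from the text above. It assumes /opt/candle_setup.sh has already been sourced in the current shell.

```shell
# Hypothetical helper: copy the test script to a working directory and run it there.
# Arguments (both optional):
#   $1  path to the script   (default: /opt/testscript.sh, from the container)
#   $2  working directory    (default: $HOME/candle-test, an assumed location)
run_candle_test() {
    rct_src="${1:-/opt/testscript.sh}"
    rct_workdir="${2:-$HOME/candle-test}"

    # Create the working directory and copy the script into it.
    mkdir -p "$rct_workdir" || return 1
    cp "$rct_src" "$rct_workdir/" || return 1

    # Run the copy from inside the working directory, in a subshell so the
    # caller's current directory is left unchanged.
    ( cd "$rct_workdir" && bash "./$(basename "$rct_src")" )
}
```

Inside the container, calling run_candle_test with no arguments would copy /opt/testscript.sh to $HOME/candle-test and run it from there, so anything the script clones or downloads stays in that directory.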
This script clones the CANDLE Benchmark and Tutorials suites and downloads the data needed to run a CANDLE hyperparameter optimization sweep. To learn more about these benchmarks, see the CANDLE GitHub page: https://github.com/ECP-CANDLE.
2 Implementation Details
The CANDLE software is designed to run deep learning workflows in parallel with MPI, so a few additional third-party software tools are required. If you wish to run CANDLE software and define your own workflow, your scripts must point to the installed locations of these tools:
EQ-R (https://github.com/emews/EQ-R) is installed at /opt/EQ-R
Swift/T (http://swift-lang.org/Swift-T/) is installed at /opt/swift-t
Additionally, the Turbine launch script (/opt/swift-t/turbine/bin/turbine) was modified to provide simple GPU scheduling: it assigns a single GPU to each MPI process that is launched. To change this behavior, edit the Turbine launch script or the associated GPU runscript (/opt/swift-t/turbine/bin/gpurunscript.sh).
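To illustrate what assigning one GPU per MPI process can look like, here is a minimal sketch of one common approach: derive a GPU index from the process's node-local rank and expose only that device through CUDA_VISIBLE_DEVICES. This is not the content of the container's gpurunscript.sh, which may work differently. The function names pick_gpu and run_on_assigned_gpu are hypothetical, and the sketch assumes the launcher sets one of OMPI_COMM_WORLD_LOCAL_RANK (Open MPI), MV2_COMM_WORLD_LOCAL_RANK (MVAPICH2), or SLURM_LOCALID (Slurm).

```shell
# Hypothetical helper: print a GPU index for this process.
#   $1  number of GPUs on the node (values below 1 or non-numeric fall back to 1)
# The node-local rank is read from whichever launcher variable is set,
# defaulting to 0 when none is present.
pick_gpu() {
    pg_num_gpus="${1:-1}"
    [ "$pg_num_gpus" -ge 1 ] 2>/dev/null || pg_num_gpus=1
    pg_rank="${OMPI_COMM_WORLD_LOCAL_RANK:-${MV2_COMM_WORLD_LOCAL_RANK:-${SLURM_LOCALID:-0}}}"
    # Round-robin: ranks beyond the GPU count wrap around and share devices.
    echo $(( pg_rank % pg_num_gpus ))
}

# Hypothetical wrapper: run a command with only the assigned GPU visible.
#   $1      number of GPUs on the node
#   $2...   command and its arguments
run_on_assigned_gpu() {
    ra_num_gpus="$1"
    shift
    CUDA_VISIBLE_DEVICES="$(pick_gpu "$ra_num_gpus")" "$@"
}
```

Under these assumptions, each MPI process would launch its work as run_on_assigned_gpu 4 followed by the command on a four-GPU node; the GPU count could also be taken from the number of lines printed by nvidia-smi -L. Wrapping around with the modulo lets more processes than GPUs run, at the cost of sharing devices.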