Folding@home is a distributed computing project for simulating protein dynamics, including the process of protein folding and the movements of proteins implicated in a variety of diseases. It brings together citizen scientists who volunteer to run simulations of protein dynamics on their computers. Insights from this data are helping scientists to better understand biology, and providing new opportunities for developing therapeutics.
Running the Folding@home container is straightforward; however, special care must be taken to manage and return Work Units on time.
Familiarity with Linux and containers is assumed. Because of the prerequisites and setup complexity, this does not make an ideal "hello-world" container; the standard Folding@home Linux clients work great and have slightly less overhead.
The Folding@home container is similar to a database container: it needs persistent storage mounted into /fah and careful lifecycle management to avoid losing or wasting work. The config.xml also contains client state, so it must be kept in that persistent storage and managed the same way.
CUDA 9.2 is used as the base for greater compatibility. For details, see: CUDA Compatibility
This document uses Docker as the example runtime but others are also supported. Read the Other Runtimes section for Singularity and other runtimes.
Each of these is explained in more detail below, but they are summarized here for clarity. Keywords such as MUST and SHOULD carry their RFC 2119 meanings.
Mount persistent storage into /fah of the running container. Running containers MUST NOT share the same mounted directory, but directories SHOULD be reused to avoid lost Work Units.
Place a customized config.xml in each persistent storage directory before running the container for the first time.
Run the container with --user or equivalent, so that the running container has read-write permissions to the persistent storage mounted in /fah.
Read the README and CONTRIBUTING at https://github.com/foldingathome/containers/ for design goals, architecture, guidelines for contributing, and other information.
Please raise any bugs or issues with the containers on GitHub: https://github.com/foldingathome/containers/issues
These values will be used in your config.xml later.
Before scaling up containers on a cluster or cloud, it's important to be familiar with the /fah storage requirements, life cycle, and usage that will complete Work Units and help the research on Folding@home. That starts with one machine.
Once the prerequisites are met, it's time to run the container.
See example config files and be sure to set your user/passkey/team.
# Make a directory for persistent storage
mkdir $HOME/fah
# Edit config.xml based on an example config below, use vi or other editor.
vi $HOME/fah/config.xml
Over time config.xml will also have client state, and will be rewritten by the client.
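Before starting the container, the storage directory can be sanity-checked. The following is a minimal sketch, not part of the official tooling: the function name check_fah_dir is hypothetical, and it only verifies what this document asks for, namely that the directory exists, that it contains a config.xml, and that the current user (the one passed with --user) can read and write both.

```shell
#!/bin/sh
# Hypothetical pre-flight check for one persistent storage directory.
# Returns 0 when the directory looks ready to be mounted into /fah.
check_fah_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  if [ ! -f "$dir/config.xml" ]; then
    echo "missing config.xml in: $dir" >&2
    return 1
  fi
  # The container runs as the current user (--user), and the client
  # rewrites config.xml, so both must be readable and writable.
  if [ ! -r "$dir" ] || [ ! -w "$dir" ] || [ ! -w "$dir/config.xml" ]; then
    echo "not readable and writable by $(id -un): $dir" >&2
    return 1
  fi
  return 0
}

# Example:
#   check_fah_dir "$HOME/fah" && echo "ready to run"
```

The check does not validate the contents of config.xml; the user, passkey, and team values still need to be reviewed by hand.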
# Run container with GPUs, name it "fah0", map user and /fah volume
docker run --gpus all --name fah0 -d --user "$(id -u):$(id -g)" \
  --volume "$HOME/fah":/fah fah-gpu:VERSION
# Dump output so far this run
docker logs fah0
# Tail the log
docker logs -f fah0
# Stop container once Work Units finish (preferred), may take hours
docker exec fah0 FAHClient --send-command finish
# Stop container after checkpoint, usually under 30 seconds.
# Be sure to start it again to let it finish before the Work Units expire.
docker exec fah0 FAHClient --send-command shutdown
# The container can also just be killed, but that's not as nice.
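The two stop commands above can be combined into one helper that prefers finish and falls back to shutdown after a deadline. This is a sketch under assumptions: the function name stop_fah, the DOCKER variable, and the default timings are hypothetical, and it assumes the container exits on its own once the client has finished its Work Units. It polls with docker inspect to see whether the container is still running.

```shell
#!/bin/sh
# Hypothetical helper: ask the client to finish, wait, then fall back
# to shutdown. DOCKER can be overridden with another compatible CLI.
DOCKER=${DOCKER:-docker}

# usage: stop_fah NAME [MAX_WAIT_SECONDS] [POLL_SECONDS]
stop_fah() {
  local name="$1" max_wait="${2:-21600}" poll="${3:-60}" waited=0
  # A poll interval below 1 second would never advance the timer.
  [ "$poll" -ge 1 ] || poll=1

  # Preferred: let the running Work Units complete first.
  "$DOCKER" exec "$name" FAHClient --send-command finish || return 1

  while [ "$waited" -lt "$max_wait" ]; do
    # Anything other than "true" means the container is no longer running.
    if [ "$("$DOCKER" inspect -f '{{.State.Running}}' "$name")" != "true" ]; then
      return 0
    fi
    sleep "$poll"
    waited=$((waited + poll))
  done

  # Deadline reached: stop after the next checkpoint instead.
  "$DOCKER" exec "$name" FAHClient --send-command shutdown
}

# Example: wait up to 6 hours for fah0, checking once a minute
#   stop_fah fah0 21600 60
```

If the fallback is used, the container still has to be started again later so the Work Units can finish before they expire.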
There are a lot of container orchestrators, so the requirements are as simple as possible:
Each container needs a persistent storage directory mounted into the /fah directory of the container. These directories should be reused, but two containers should never be using the same directory.
Create a root folder on the cluster storage, e.g. .../root-dir/, and create subdirectories based on one of these methods:
Method 1: For smaller clusters, having one directory per host is simple. When run, the containers can mount .../root-dir/$hostname/ to /fah for the job running on hostname.
Method 2: For larger clusters, keep a pool of directories that can be reused, sized by how many clients are run. Running them takes more careful management, but the general idea is to mount .../root-dir/$jobname/ to the /fah folder of jobs named fold00 ... fold99.
Before running any clients, make sure to copy your customized config.xml to all the subfolders.
Other methods are valid, as long as they meet the requirements above.
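Both methods come down to creating subdirectories and seeding each with config.xml. The sketch below shows one possible way to do that; seed_dir, seed_pool, and the ROOT_DIR variable (standing in for .../root-dir/) are hypothetical names. Because config.xml accumulates client state, the helper never overwrites a config.xml that is already present, so it should be safe to rerun on a directory tree that is in use.

```shell
#!/bin/sh
# Hypothetical helpers for preparing storage directories.

# Create one storage directory and copy config.xml into it, unless a
# config.xml is already there (it may hold client state by then).
seed_dir() {
  local dir="$1" config="$2"
  mkdir -p "$dir" || return 1
  if [ ! -e "$dir/config.xml" ]; then
    cp "$config" "$dir/config.xml" || return 1
  fi
}

# Create COUNT pool directories named fold00, fold01, ... under ROOT.
seed_pool() {
  local root="$1" count="$2" config="$3" i=0 name
  while [ "$i" -lt "$count" ]; do
    name=$(printf 'fold%02d' "$i")
    seed_dir "$root/$name" "$config" || return 1
    i=$((i + 1))
  done
}

# Method 1: one directory per host
#   seed_dir "$ROOT_DIR/$(hostname)" "$HOME/fah/config.xml"
# Method 2: a pool of 100 directories, fold00 ... fold99
#   seed_pool "$ROOT_DIR" 100 "$HOME/fah/config.xml"
```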
Based on the storage setup, run one container per subfolder, mounting it into /fah.
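When jobs pick a directory from a shared pool, something has to guarantee that two containers never mount the same one. Many orchestrators can do this themselves; where they cannot, a lock directory is one simple option. The following sketch is an assumption, not a documented feature of the container: claim_dir, release_dir, ROOT_DIR, and the .in-use lock name are all hypothetical. It relies on mkdir failing when the target already exists, and it does not clean up stale locks left behind by jobs that were killed.

```shell
#!/bin/sh
# Hypothetical lock helpers; the ".in-use" name is an assumption.

# Print the first subdirectory of ROOT that could be claimed, or
# return 1 if every subdirectory is already in use.
claim_dir() {
  local root="$1" dir
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    # mkdir fails if the lock directory already exists, so only one
    # caller can claim a given storage directory.
    if mkdir "${dir}.in-use" 2>/dev/null; then
      printf '%s\n' "${dir%/}"
      return 0
    fi
  done
  return 1
}

# Release a directory once its container has stopped.
release_dir() {
  rmdir "$1/.in-use"
}

# Example:
#   dir=$(claim_dir "$ROOT_DIR") || exit 1
#   ... run the container with "$dir" mounted into /fah ...
#   release_dir "$dir"
```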
Your container orchestrator should have commands equivalent to docker logs ... and docker exec ... to perform the same functions.
# See how many Work Units have been returned by all clients
grep points .../root-dir/*/log.txt .../root-dir/*/logs/*.txt
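To see which storage directories are actually producing results, the same search can be broken down per directory. This is a rough sketch: count_points and ROOT_DIR (standing in for .../root-dir/) are hypothetical names, and like the grep above it assumes only that the relevant log lines contain the word "points".

```shell
#!/bin/sh
# Print "<count> <directory>" for each storage directory under ROOT,
# counting log lines that contain the word "points".
count_points() {
  local root="$1" dir n
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    # Missing log files are ignored; grep -c prints 0 when nothing matches.
    n=$(cat "${dir}log.txt" "${dir}logs"/*.txt 2>/dev/null | grep -c points || true)
    printf '%s %s\n' "$n" "${dir%/}"
  done
}

# Example:
#   count_points "$ROOT_DIR"
```

Directories that stay at zero for a long time are worth a closer look at their logs.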
How containers are stopped on the cluster will affect how many Work Units are late or lost.
# Preferred shutdown: stop container once Work Units finish
command exec container-id FAHClient --send-command finish
# Stop container after checkpoint, usually under 30 seconds.
command exec container-id FAHClient --send-command shutdown
# The container can also just be killed, but that's not as nice.
The goal is to avoid accumulating a lot of subdirectories with unfinished Work Units.
Running the Folding@home container with low priority on a cluster where it gets preempted and resumed will work fine. The max-units configuration option may also be useful in combination with low priority to use idle capacity where preemption is not available.
For the latest example config files see: https://github.com/FoldingAtHome/containers/tree/master/fah-gpu#example-config-files
The config options used for running the client in containers are slightly different from the ones used in a standalone install. These are the interesting ones:
Client help on all the options is available with:
docker run --rm fah-gpu:VERSION --help
While this README focuses on Docker, it is not the only container runtime.
A full Singularity HOWTO is currently beyond the scope of this document. These commands should help someone familiar with Singularity get started on a single machine:
mkdir fah && cd fah
# Create/Copy config.xml as described above
singularity build fah.sif docker://nvcr.io/hpc/foldingathome/fah-gpu:7.6.13
singularity instance start --nv -B$(pwd):/fah fah.sif fah_instance
singularity exec instance://fah_instance /bin/bash -c "coproc /usr/bin/FAHClient"
tail -f log.txt