Linux / amd64
Ardennes is a random forest model trained by NVIDIA on snowflake-arctic-embed-m-long embeddings to detect attempts to jailbreak large language models. At the time of release, it is the best-performing publicly available model for detecting LLM jailbreak attempts.
Additional details about the model, including comparisons to other public models, are available in the accompanying paper, accepted to the 2025 AAAI Workshop on AI for Cyber Security (AICS).
One-time access setup, as needed:
export NGC_API_KEY=<YOUR NGC API KEY>
docker login nvcr.io
# Username: $oauthtoken
# Password: <NGC_API_KEY>

Note: make sure you log in with the right key. If you have logged in with a different key in the past, docker login may succeed silently using cached credentials; in that case, delete the config file where Docker caches credentials (typically ~/.docker/config.json) and try again.
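For scripted or CI setups you can log in non-interactively instead. A minimal sketch using docker login's --password-stdin flag; the single-quoted $oauthtoken below is a literal username, not a shell variable:

# Pipe the key to docker login so it never appears in shell history or process args
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin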
We provide the Ardennes model as an NVIDIA NIM, so you can simply pull the container image from the NGC registry (nvcr.io) and run it.
#!/bin/bash
export NGC_API_KEY=<your NGC personal key with access to the "nvstaging/nim" org/team>
export NIM_IMAGE='nvcr.io/nvstaging/nim/ardennes-jailbreak-arctic-nim:v0.1'
export MODEL_NAME='ardennes-jailbreak-arctic'
docker pull $NIM_IMAGE
And go!
docker run -it --name="$MODEL_NAME" \
--gpus=all --runtime=nvidia \
-e NGC_API_KEY="$NGC_API_KEY" \
-p 8000:8000 \
"$NIM_IMAGE"

(Using -p rather than --expose publishes port 8000 on the host, which the curl examples below require.)
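The container takes a moment to start serving. From a second terminal, you can poll until it is up; this sketch assumes the NIM exposes the common /v1/health/ready endpoint (adjust the path if your image documents a different one):

# Wait until the service reports ready (run in a second terminal)
until curl --silent --fail http://0.0.0.0:8000/v1/health/ready > /dev/null; do
  echo "Waiting for $MODEL_NAME to start..."
  sleep 5
done
echo "$MODEL_NAME is ready to serve requests."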
The running NIM container exposes a standard REST API; send JSON POST requests to the /v1/classify endpoint to get model responses.
$ curl --data '{"input": "hello this is a test"}' --header "Content-Type: application/json" --header "Accept: application/json" http://0.0.0.0:8000/v1/classify
This returns a JSON dictionary with the model’s prediction of whether the provided input is a jailbreak attempt, along with the underlying model score.
{"jailbreak": false, "score": -0.9921652427737031}
You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.