NemoGuard JailbreakDetect

Description: Container for classifying jailbreak attempts using NemoGuard JailbreakDetect
Publisher: NVIDIA
Latest Tag: 1.10.1
Modified: August 28, 2025
Compressed Size: 9.88 GB
Multinode Support: No
Multi-Arch Support: Yes
1.10.1 (Latest) Security Scan Results: Linux / amd64, Linux / arm64

NemoGuard JailbreakDetect Overview

Description

NemoGuard JailbreakDetect is a random forest model trained by NVIDIA on snowflake-arctic-embed-m-long embeddings to detect attempts to jailbreak large language models. At the time of release, it is the best known publicly available model for detecting LLM jailbreak attempts.

This container includes two models: the NemoGuard JailbreakDetect random forest classifier, and the snowflake-arctic-embed-m-long embedding model.

The container components are ready for commercial/non-commercial use.

Third-Party Community Consideration

This container includes a model that is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see the snowflake-arctic-embed-m-long model card.

License/Terms of Use:

GOVERNING TERMS: Use of the NIM container is governed by the NVIDIA Software License Agreement and the Product-Specific Terms for NVIDIA AI Products; use of this model is governed by the NVIDIA Community Model License.

You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.

Deployment Geography:

Global

Release Date:

Build.Nvidia.com [08/14/2025] via [https://build.nvidia.com/nvidia/nemoguard-jailbreak-detect]

Hugging Face [01/15/2025] via [https://huggingface.co/nvidia/NemoGuard-JailbreakDetect]

NGC [08/14/2025] via [https://catalog.ngc.nvidia.com/orgs/nim/teams/nvidia/containers/nemoguard-jailbreak-detect]

Program Classes:

The NemoGuard JailbreakDetect container includes the following models:

| Model Name & Link | Use Case | How to Pull the Model |
| --- | --- | --- |
| NemoGuard JailbreakDetect random forest classifier | Intended to be deployed as a guardrail in an LLM system, to scan user-provided prompts for jailbreaking attempts prior to sending those prompts to an LLM. | Automatic |
| snowflake-arctic-embed-m-long embedding model | Provides input embeddings of user prompts, which are then used by the NemoGuard JailbreakDetect random forest classifier above. | Automatic |

Enterprise Support

Get access to knowledge base articles and support cases or submit a ticket.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Deployment Details:

Setup

One-time access setup, as needed:

export NGC_API_KEY=<YOUR NGC API KEY>
# Make sure you log in with the right key. If you have used another key in the past,
# docker login may succeed without asking for the new one. In that case, delete the
# config file where Docker caches credentials (by default ~/.docker/config.json) and try again.
docker login nvcr.io

# Username: $oauthtoken
# Password: <NGC_API_KEY>
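
For scripted setups, the interactive prompt can be avoided. The following is a minimal sketch, assuming NGC_API_KEY is already exported as shown above; the function names require_ngc_key and ngc_docker_login are hypothetical helpers introduced here for illustration, not part of any NVIDIA tooling.

```shell
#!/bin/bash
# Sketch: log in to nvcr.io without an interactive prompt.

# Hypothetical helper: fail early with a clear message if the key is missing.
require_ngc_key() {
    if [ -z "${NGC_API_KEY:-}" ]; then
        echo "NGC_API_KEY is not set" >&2
        return 1
    fi
}

# Hypothetical helper: pass the key on stdin so it is not given as a
# command-line argument. The username is the literal string $oauthtoken,
# hence the single quotes.
ngc_docker_login() {
    require_ngc_key || return 1
    printf '%s' "$NGC_API_KEY" \
        | docker login nvcr.io --username '$oauthtoken' --password-stdin
}

# Usage (requires Docker and a valid key):
#   ngc_docker_login
```

Checking the variable first means a missing key produces a clear error instead of a failed login attempt.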

Serving the model as a NIM

We provide the JailbreakDetect model as an NVIDIA NIM, so you can pull the image with Docker and run it.

#!/bin/bash

export NGC_API_KEY=<your NGC personal key with access to the "nvstaging/nim" org/team>
export NIM_IMAGE='nvcr.io/nvstaging/nim/ardennes-jailbreak-arctic-nim:v0.1'
export MODEL_NAME='ardennes-jailbreak-arctic'
docker pull $NIM_IMAGE

And go!

# -p publishes the container's port 8000 on the host so the API is reachable there.
docker run -it --name=$MODEL_NAME \
    --gpus=all --runtime=nvidia \
    -e NGC_API_KEY="$NGC_API_KEY" \
    -p 8000:8000 \
    $NIM_IMAGE
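
The container may need some time after startup before it answers requests. Below is a minimal sketch of a readiness wait, run from a second terminal. It assumes the API is reachable from the host at 0.0.0.0:8000 and simply probes the v1/classify endpoint until it responds; retry and nim_answers are hypothetical helper names, and the attempt count and delay are arbitrary.

```shell
#!/bin/bash
# Sketch: wait until the NIM answers classification requests.

# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
    local attempts delay i
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Probe: succeeds once the classify endpoint returns a successful HTTP status.
# The address is an assumption; adjust it to wherever the port is reachable.
nim_answers() {
    curl --silent --fail --output /dev/null \
        --header "Content-Type: application/json" \
        --data '{"input": "ping"}' \
        http://0.0.0.0:8000/v1/classify
}

# Usage:
#   retry 30 10 nim_answers && echo "NIM is ready"
```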

Running inference with the NIM

The running NIM container exposes a REST API. To get a prediction, send a POST request with a JSON body to the v1/classify endpoint.

$ curl --data '{"input": "hello this is a test"}' --header "Content-Type: application/json" --header "Accept: application/json" http://0.0.0.0:8000/v1/classify  
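
Real user prompts often contain quotes, backslashes, or newlines, which would break a hand-written JSON string like the one above. The following is a minimal sketch of building the request body in bash. json_escape and classify_payload are hypothetical helper names; the escaping covers only backslash, double quote, newline, tab, and carriage return, so other control characters would need a proper JSON encoder.

```shell
#!/bin/bash
# Sketch: build the JSON body for v1/classify from arbitrary prompt text.

# Escape the characters that most commonly break a JSON string.
# Backslashes are escaped first so later escapes are not doubled.
json_escape() {
    local s=$1
    local bs='\' dq='"' nl=$'\n' tab=$'\t' cr=$'\r'
    s=${s//"$bs"/"$bs$bs"}
    s=${s//"$dq"/"$bs$dq"}
    s=${s//"$nl"/"${bs}n"}
    s=${s//"$tab"/"${bs}t"}
    s=${s//"$cr"/"${bs}r"}
    printf '%s' "$s"
}

# Wrap the escaped prompt in the {"input": ...} object the endpoint expects.
classify_payload() {
    printf '{"input": "%s"}' "$(json_escape "$1")"
}

# Usage (requires the running NIM):
#   curl --data "$(classify_payload "$prompt")" \
#        --header "Content-Type: application/json" \
#        --header "Accept: application/json" \
#        http://0.0.0.0:8000/v1/classify
```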

The response is a JSON object with the model’s prediction: the jailbreak field states whether the provided input was classified as a jailbreak attempt, and the score field carries the associated numeric score.

{"jailbreak": false, "score": -0.9921652427737031}  
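
When the classifier is used as a guardrail in front of an LLM, the caller has to turn this response into an allow-or-block decision. Below is a minimal sketch, assuming the response shape shown above. classify_verdict is a hypothetical helper, and the pattern match stands in for a real JSON parser, which would be more robust in practice.

```shell
#!/bin/bash
# Sketch: map a raw v1/classify response body to a guardrail decision.

# Prints "block" if the input was flagged, "allow" if it was explicitly
# cleared, and "error" (with a non-zero status) for anything else, so that
# malformed or missing responses are never treated as safe.
classify_verdict() {
    local body=$1
    if printf '%s' "$body" | grep -Eq '"jailbreak"[[:space:]]*:[[:space:]]*true'; then
        echo block
    elif printf '%s' "$body" | grep -Eq '"jailbreak"[[:space:]]*:[[:space:]]*false'; then
        echo allow
    else
        echo error
        return 1
    fi
}

# Usage:
#   classify_verdict '{"jailbreak": false, "score": -0.9921652427737031}'
```

Treating unrecognized responses as errors rather than as "allow" keeps the guardrail from failing open when the service is down or returns something unexpected.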

Reference(s):

Additional details about the model, including comparisons to other public models, are available in the accompanying paper, accepted to the 2025 AAAI Workshop on AI for Cyber Security (AICS).

Container Version(s):

NemoGuard-JailbreakDetect-v1.10.1: Jailbreak detection model using snowflake-arctic-embed-m-long embeddings

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal developer team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns here.
