NVIDIA
NemoGuard JailbreakDetect
Container

Container for classifying jailbreak attempts using NemoGuard JailbreakDetect


NemoGuard JailbreakDetect Overview

Description

NemoGuard JailbreakDetect is a random forest model trained by NVIDIA on snowflake-arctic-embed-m-long embeddings to detect attempts to jailbreak large language models. At the time of release, it was the best-performing publicly available model known for detecting LLM jailbreak attempts.

This container includes two models: the NemoGuard JailbreakDetect random forest classifier, and the snowflake-arctic-embed-m-long embedding model.

The container components are ready for commercial and non-commercial use.

Third-Party Community Consideration

This container includes a model that is not owned or developed by NVIDIA. This model has been developed and built to a third party’s requirements for this application and use case; see the snowflake-arctic-embed-m-long model card.

License/Terms of Use:

GOVERNING TERMS: Use of the NIM container is governed by the NVIDIA Software License Agreement and the Product-Specific Terms for NVIDIA AI Products; use of this model is governed by the NVIDIA Community Model License.

You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.

Deployment Geography:

Global

Release Date:

Build.Nvidia.com [08/14/2025]

Hugging Face [01/15/2025]

NGC [08/14/2025]

Program Classes:

The NemoGuard JailbreakDetect container includes the following models:

Model Name | Use Case | How to Pull the Model
NemoGuard JailbreakDetect random forest classifier | Intended to be deployed as a guardrail in an LLM system, to scan user-provided prompts for jailbreaking attempts prior to sending those prompts to an LLM. | Automatic
snowflake-arctic-embed-m-long embedding model | Provides input embeddings of user prompts, which are then used by the NemoGuard JailbreakDetect random forest classifier above. | Automatic

Get Help

Enterprise Support

Get access to knowledge base articles and support cases or submit a ticket.

NVIDIA NIM Documentation

Visit the NIM Documentation for general information about using NIM, including an overview and deployment guides. Refer to the NemoGuard NIM Documentation for more information about the JailbreakDetect NIM, including a quickstart guide and API reference.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times than CPU-only solutions.

Deployment Details:

Setup

One-time access setup, as needed:

export NGC_API_KEY=<YOUR NGC API KEY>
# Log in with your NGC API key. If you have logged in to nvcr.io with a different
# key before, Docker may reuse the cached credentials and succeed without prompting
# for the new key; in that case run `docker logout nvcr.io` (which removes the
# cached credentials) and try again.
docker login nvcr.io

# Username: $oauthtoken
# Password: <NGC_API_KEY>
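For scripted setups, the interactive prompt above can be avoided. The sketch below is a hypothetical helper (`ngc_login` is not part of the NIM) that assumes `NGC_API_KEY` is already exported; it clears any cached nvcr.io credentials with `docker logout` and then passes the key through Docker's standard `--password-stdin` option, so the key does not show up in the process list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: non-interactive login to nvcr.io using $NGC_API_KEY.
ngc_login() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  # Remove any cached nvcr.io credentials so an old key is not silently reused.
  docker logout nvcr.io >/dev/null 2>&1 || true
  # The username is the literal string $oauthtoken; single quotes stop the shell
  # from expanding it. The key is read from stdin so it never appears in argv.
  printf '%s' "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
}
```

Usage, under the same assumptions: `export NGC_API_KEY=...` followed by `ngc_login`.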

Serving the model as a NIM

We provide the JailbreakDetect model as an NVIDIA NIM, so you can pull the image from the NGC container registry (nvcr.io) with Docker and run it.

#!/bin/bash

export NGC_API_KEY=<your NGC personal key with access to the "nim/nvidia" org/team>
export NIM_IMAGE='nvcr.io/nim/nvidia/nemoguard-jailbreak-detect:1.10.1'
export MODEL_NAME='nemoguard-jailbreak-detect'
docker pull "$NIM_IMAGE"

Then start the container:

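# -p publishes port 8000 to the host so the API is reachable at localhost:8000;
# --expose alone would not make it reachable from the host.
docker run -it --name="$MODEL_NAME" \
    --gpus=all --runtime=nvidia \
    -e NGC_API_KEY="$NGC_API_KEY" \
    -p 8000:8000 \
    "$NIM_IMAGE"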
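If you start the container from a script (for example detached, with `-d` instead of `-it`), it can help to wait until the service answers before sending real traffic. The sketch below is a hypothetical helper (`wait_for_nim`, with made-up default timings) that simply retries the `/v1/classify` request used for inference until it succeeds; it assumes port 8000 is published to the host, for example with `-p 8000:8000`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until the NIM answers /v1/classify requests.
# Arguments: endpoint URL, maximum attempts, seconds to sleep between attempts.
wait_for_nim() {
  local url="${1:-http://localhost:8000/v1/classify}"
  local attempts="${2:-60}"
  local delay="${3:-5}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -s silences progress output, -f turns HTTP errors into a non-zero exit,
    # and --max-time bounds each individual try.
    if curl -sf --max-time 5 \
        --header "Content-Type: application/json" \
        --data '{"input": "readiness probe"}' \
        "$url" >/dev/null 2>&1; then
      echo "NIM is ready at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "NIM did not answer after $attempts attempts" >&2
  return 1
}
```

For example, `wait_for_nim http://localhost:8000/v1/classify 60 5 && echo "start sending prompts"`. Probing the classify endpoint itself avoids assuming any separate health route.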

Running inference with the NIM

The running NIM container exposes a REST API. Send a JSON POST request to the /v1/classify endpoint, with the prompt in the "input" field, to get the model's prediction.

$ curl --data '{"input": "hello this is a test"}' \
    --header "Content-Type: application/json" \
    --header "Accept: application/json" \
    http://localhost:8000/v1/classify

This returns a JSON object whose boolean "jailbreak" field gives the model's prediction of whether the input is a jailbreaking attempt, along with the underlying "score":

{"jailbreak": false, "score": -0.9921652427737031}  
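To use the classifier as the guardrail described above, a wrapper can classify each user prompt and forward it to the LLM only if it is not flagged. The sketch below is illustrative, not an official client: the function names and the `JAILBREAK_URL` variable are hypothetical, it assumes the NIM is reachable at http://localhost:8000/v1/classify, and it uses `jq` (an extra dependency) so that prompts containing quotes or newlines are encoded as valid JSON, which the hand-written `--data '{...}'` form above does not handle.

```shell
#!/usr/bin/env bash
# Sketch of a prompt guardrail around the /v1/classify endpoint.
# Assumptions: the NIM listens on localhost:8000 (override with JAILBREAK_URL),
# and jq is installed for JSON-safe encoding of arbitrary prompt text.
JAILBREAK_URL="${JAILBREAK_URL:-http://localhost:8000/v1/classify}"

# Build the request body; jq escapes quotes, backslashes, and newlines.
build_payload() {
  jq -c -n --arg input "$1" '{input: $input}'
}

# Succeeds (exit 0) when a classify response reports "jailbreak": true.
is_jailbreak_response() {
  printf '%s' "$1" | grep -Eq '"jailbreak"[[:space:]]*:[[:space:]]*true'
}

# Exit 0 if the prompt looks safe to forward, 1 if flagged, 2 on request errors.
check_prompt() {
  local payload response
  payload=$(build_payload "$1") || return 2
  response=$(curl -sf --max-time 10 \
    --header "Content-Type: application/json" \
    --header "Accept: application/json" \
    --data "$payload" "$JAILBREAK_URL") || return 2
  if is_jailbreak_response "$response"; then
    echo "blocked: $response" >&2
    return 1
  fi
  return 0
}
```

A calling script might run `check_prompt "$USER_PROMPT"` and send the prompt to the LLM only on exit code 0. Treating exit code 2 (classifier unreachable) as a block is a fail-closed design choice; whether that is appropriate depends on your deployment.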

Reference(s):

Additional details about the model, including comparisons to other public models, are available in the accompanying paper, accepted to the 2025 AAAI Workshop on AI for Cyber Security (AICS).

Container Version(s):

NemoGuard-JailbreakDetect-v1.10.1: Jailbreak detection model using snowflake-arctic-embed-m-long embeddings

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal developer team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns here.


Publisher: NVIDIA
Latest Tag: 1.10.1
Updated: August 28, 2025 UTC
Compressed Size: 9.88 GB
Multinode Support: No
Multi-Arch Support: Yes
