The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement.
Nemotron-3-8B-SteerLM is an 8 billion parameter generative language model instruct-tuned on an 8B base model. It takes input with context length up to 4,096 tokens. The model has been customized using the SteerLM method developed by NVIDIA to allow for user control of model outputs during inference.
Key capabilities enabled by SteerLM:
Nemotron-3-8B-SteerLM is part of Nemotron-3, a family of enterprise-ready, GPT-based, decoder-only generative text models compatible with the NVIDIA NeMo Framework. For other models in this collection, see the models page.
NVIDIA NeMo is an end-to-end, cloud-native framework to build, customize, and deploy generative AI models anywhere. It includes training and inferencing frameworks, guardrailing toolkits, data curation tools, and pretrained models, offering enterprises an easy, cost-effective, and fast way to adopt generative AI.
Architecture Type: Transformer
Network Architecture: Generative Pre-Trained Transformer (GPT-3)
The SteerLM method involves the following key steps:
SteerLM-8B applies this technique on top of the open-source NVIDIA GPT model architecture. It was pretrained on internet-scale data and then customized using OASST, HH-RLHF, Light, a permissively licensed subset of OpenPlatypus, and some internally collected SFT data.
Single-turn prompt format:
<extra_id_0>System
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
<extra_id_1>User
{prompt 1}
<extra_id_1>Assistant
<extra_id_2>quality:4,understanding:4,correctness:4,coherence:4,complexity:4,verbosity:4,toxicity:0,humor:0,creativity:0,violence:0,helpfulness:4,not_appropriate:0,hate_speech:0,sexual_content:0,fails_task:0,political_content:0,moral_judgement:0,lang:en
Multi-turn prompt format:
<extra_id_0>System
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
<extra_id_1>User
{prompt 1}
<extra_id_1>Assistant
<extra_id_2>quality:4,understanding:4,correctness:4,coherence:4,complexity:4,verbosity:4,toxicity:0,humor:0,creativity:0,violence:0,helpfulness:4,not_appropriate:0,hate_speech:0,sexual_content:0,fails_task:0,political_content:0,moral_judgement:0,lang:en
{response 1}
<extra_id_1>User
{prompt 2}
<extra_id_1>Assistant
<extra_id_2>quality:4,understanding:4,correctness:4,coherence:4,complexity:4,verbosity:4,toxicity:0,humor:0,creativity:0,violence:0,helpfulness:4,not_appropriate:0,hate_speech:0,sexual_content:0,fails_task:0,political_content:0,moral_judgement:0,lang:en
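The multi-turn layout above repeats the same User / Assistant / attribute-line pattern for every earlier exchange and ends with an open Assistant turn. A minimal sketch of assembling it programmatically follows; the function name `build_multi_turn_prompt` and the `history` argument are hypothetical helpers introduced for illustration and are not part of NeMo.

```python
# Hypothetical helper: builds the multi-turn prompt layout shown in this card.
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
ATTRIBUTES = (
    "quality:4,understanding:4,correctness:4,coherence:4,complexity:4,verbosity:4,"
    "toxicity:0,humor:0,creativity:0,violence:0,helpfulness:4,not_appropriate:0,"
    "hate_speech:0,sexual_content:0,fails_task:0,political_content:0,"
    "moral_judgement:0,lang:en"
)


def build_multi_turn_prompt(history, next_prompt, attributes=ATTRIBUTES):
    """Return a prompt string for a conversation.

    history: list of (user_prompt, assistant_response) pairs already exchanged.
    next_prompt: the new user message the model should answer.
    """
    lines = ["<extra_id_0>System", SYSTEM_PROMPT]
    for user_prompt, response in history:
        # A completed turn: user text, attribute line, then the earlier response.
        lines += [
            "<extra_id_1>User", user_prompt,
            "<extra_id_1>Assistant", f"<extra_id_2>{attributes}", response,
        ]
    # The final turn is left open so the model generates the next response.
    lines += [
        "<extra_id_1>User", next_prompt,
        "<extra_id_1>Assistant", f"<extra_id_2>{attributes}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_multi_turn_prompt([("Hi", "Hello!")], "How are you?"))
```

With an empty `history`, the same function produces the single-turn layout.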
PROMPT_TEMPLATE = """<extra_id_0>System
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
<extra_id_1>User
{prompt}
<extra_id_1>Assistant
<extra_id_2>quality:4,understanding:4,correctness:4,coherence:4,complexity:4,verbosity:4,toxicity:0,humor:0,creativity:0,violence:0,helpfulness:4,not_appropriate:0,hate_speech:0,sexual_content:0,fails_task:0,political_content:0,moral_judgement:0,lang:en"""
question = "Write a poem on NVIDIA in the style of Shakespeare"
prompt = PROMPT_TEMPLATE.format(prompt=question)
print(prompt)
Each of the properties (e.g. humor, toxicity…) can receive integer values in the range [0, 4].
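Because steering works by editing the attribute line, it can help to build that line from a dictionary and reject values outside [0, 4] before sending a prompt. The sketch below is illustrative only: `DEFAULT_ATTRIBUTES` copies the values from the prompt template in this card, and `format_attributes` is a hypothetical helper, not a NeMo API.

```python
# Hypothetical helper: formats the <extra_id_2> attribute line with range checks.
DEFAULT_ATTRIBUTES = {
    "quality": 4, "understanding": 4, "correctness": 4, "coherence": 4,
    "complexity": 4, "verbosity": 4, "toxicity": 0, "humor": 0,
    "creativity": 0, "violence": 0, "helpfulness": 4, "not_appropriate": 0,
    "hate_speech": 0, "sexual_content": 0, "fails_task": 0,
    "political_content": 0, "moral_judgement": 0,
}


def format_attributes(overrides=None):
    """Return the attribute line, applying overrides to the default values."""
    values = dict(DEFAULT_ATTRIBUTES)
    for name, value in (overrides or {}).items():
        if name not in values:
            raise KeyError(f"unknown attribute: {name}")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 4:
            raise ValueError(f"{name} must be an integer in [0, 4], got {value!r}")
        values[name] = value
    body = ",".join(f"{name}:{value}" for name, value in values.items())
    # The language tag is kept as in the card's template.
    return f"<extra_id_2>{body},lang:en"


if __name__ == "__main__":
    # Example: ask for a more humorous, more creative response.
    print(format_attributes({"humor": 4, "creativity": 4}))
```

The returned string can replace the last line of the prompt template, leaving the rest of the prompt unchanged.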
Runtime Engine(s): NVIDIA AI Enterprise
Toolkit: NeMo Framework
See the NeMo inference container documentation for details on how to set up and deploy an inference server.
Sample Inference Code:
from nemo.deploy import NemoQuery
# In this case, we run inference on the same machine.
# model_name must match the name the model was deployed under on the inference server.
nq = NemoQuery(url="localhost:8000", model_name="Nemotron-3-8B-Chat-4K-RLHF")
# See above for prompt format
output = nq.query_llm(prompts=[prompt], max_output_token=200, top_k=1, top_p=0.0, temperature=0.1)
# NOTE: Chat models require post-processing the output since the `NemoQuery` API
# does not support stopping generation on the special <extra_id_1> token.
output = [[s.split("<extra_id_1>", 1)[0].strip() for s in out] for out in output]
print(output)
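The post-processing step can be tried without a running server. The sketch below applies the same truncation to a made-up string (not real model output); `truncate_at_stop_token` is a hypothetical name for illustration, and the nested-list shape mirrors how the sample code above iterates over `output`.

```python
# Illustrative only: raw_output is a made-up stand-in for a server response.
STOP_TOKEN = "<extra_id_1>"


def truncate_at_stop_token(text, stop_token=STOP_TOKEN):
    """Keep only the text generated before the first stop token."""
    return text.split(stop_token, 1)[0].strip()


raw_output = [["A sonnet goes here.\n<extra_id_1>User\nAnother question"]]
cleaned = [[truncate_at_stop_token(s) for s in out] for out in raw_output]
print(cleaned)  # [['A sonnet goes here.']]
```

Splitting with a maximum of one split keeps everything before the first `<extra_id_1>` and discards any extra turns the model continued to generate.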
Supported Hardware:
Nemotron-3-8B-chat-4k-steerlm-BF16-1
NVIDIA models are trained on a diverse set of public and proprietary datasets. NVIDIA is committed to the responsible development of large language models and conducts reviews of all datasets included in training.
MT Bench Score
Category | Score |
---|---|
Total | 5.6 |
Writing | 6.35 |
Roleplay | 6.9 |
Extraction | 5.25 |
STEM | 7.5 |
Humanities | 9.02 |
Reasoning | 4.9 |
Math | 2.0 |
Coding | 2.9 |
The 8B-Chat-SteerLM model is for users who want to customize a model’s response during inference.
Technology can have a profound impact on people and the world, and NVIDIA is committed to enabling trust and transparency in AI development. NVIDIA encourages users to adopt principles of AI ethics and trustworthiness to guide their business decisions by following the guidelines in the NVIDIA AI Foundation Models Community License Agreement.