Linux / amd64
Linux / arm64
Morpheus allows teams to build their own optimized pipelines that address cybersecurity and information security use cases. Morpheus provides development capabilities around dynamic protection, real-time telemetry, adaptive policies, and cyber defenses for detecting and remediating cybersecurity threats.
Ensure that Docker is configured to use the NVIDIA Container Runtime. The runtime can be set as the default runtime or specified explicitly, like this:
docker run --rm -ti --runtime=nvidia --gpus=all $ANY_OTHER_DOCKER_ARGS nvcr.io/nvstaging/nvaie/morpheus-pb24h1:24.02.01-runtime bash
More detailed instructions for this mode can be found in the Getting Started Guide on GitHub.
The Morpheus pipeline can be configured in two ways:
1. Manual configuration in a Python script
2. Configuration via the provided CLI (morpheus)
See the examples directory in the GitHub repo for examples of how to configure a pipeline via Python.
The provided CLI (morpheus) is capable of running the included tools as well as any linear pipeline.
Usage: morpheus [OPTIONS] COMMAND [ARGS]...
Options:
--debug / --no-debug [default: no-debug]
--log_level [CRITICAL|FATAL|ERROR|WARN|WARNING|INFO|DEBUG]
Specify the logging level to use. [default:
WARNING]
--log_config_file FILE Config file to use to configure logging. Use
only for advanced situations. Can accept
both JSON and ini style configurations
--plugin TEXT Adds a Morpheus CLI plugin. Can either be a
module name or path to a python module
--version Show the version and exit.
--help Show this message and exit.
Commands:
run Run one of the available pipelines
tools Run a utility tool
Usage: morpheus run [OPTIONS] COMMAND [ARGS]...
Options:
--num_threads INTEGER RANGE Number of internal pipeline threads to use [default: 80; x>=1]
--pipeline_batch_size INTEGER RANGE
Internal batch size for the pipeline. Can be much larger than the model batch size. Also used for Kafka consumers [default: 256; x>=1]
--model_max_batch_size INTEGER RANGE
Max batch size to use for the model [default: 8; x>=1]
--edge_buffer_size INTEGER RANGE
The size of buffered channels to use between nodes in a pipeline. Larger values reduce backpressure at the cost of memory. Smaller values
will push messages through the pipeline quicker. Must be greater than 1 and a power of 2 (i.e. 2, 4, 8, 16, etc.) [default: 128; x>=2]
--use_cpp BOOLEAN Whether or not to use C++ node and message types or to prefer python. Only use as a last resort if bugs are encountered [default: True]
--help Show this message and exit.
Commands:
pipeline-ae Run the inference pipeline with an AutoEncoder model
pipeline-fil Run the inference pipeline with a FIL model
pipeline-nlp Run the inference pipeline with a NLP model
pipeline-other Run a custom inference pipeline without a specific model type
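The constraint on --edge_buffer_size (greater than 1 and a power of 2) can be checked before launching a pipeline. The following is a minimal sketch, not part of Morpheus; the helper names is_valid_edge_buffer_size and next_valid_edge_buffer_size are hypothetical and exist only for illustration.

```python
def is_valid_edge_buffer_size(size: int) -> bool:
    """Return True if size is greater than 1 and a power of two."""
    # A power of two has exactly one bit set, so size & (size - 1) is zero.
    return size >= 2 and (size & (size - 1)) == 0


def next_valid_edge_buffer_size(size: int) -> int:
    """Round size up to the nearest accepted value (minimum 2)."""
    if size <= 2:
        return 2
    # (size - 1).bit_length() is the exponent of the next power of two >= size.
    return 1 << (size - 1).bit_length()


if __name__ == "__main__":
    for candidate in (1, 2, 100, 128):
        print(candidate, is_valid_edge_buffer_size(candidate),
              next_valid_edge_buffer_size(candidate))
```

A rounded value could then be passed as --edge_buffer_size on the morpheus run command line.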
Usage: morpheus run pipeline-ae [OPTIONS] COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...
Configure and run the pipeline. To configure the pipeline, list the stages in the order that data should flow. The output of each stage will become the input for the
next stage. For example, to read, classify and write to a file, the following stages could be used
pipeline from-file --filename=my_dataset.json deserialize preprocess inf-triton --model_name=my_model
--server_url=localhost:8001 filter --threshold=0.5 to-file --filename=classifications.json
Pipelines must follow a few rules:
1. Data must originate in a source stage. Current options are `from-file` or `from-kafka`
2. A `deserialize` stage must be placed between the source stages and the rest of the pipeline
3. Only one inference stage can be used. Zero is also fine
4. The following stages must come after an inference stage: `add-class`, `filter`, `gen-viz`
Options:
--columns_file DATA FILE [required]
--labels_file DATA FILE Specifies a file to read labels from in order to convert class IDs into labels. A label file is a simple text file where each line
corresponds to a label. If unspecified, only a single output label is created for FIL
--userid_column_name TEXT Which column to use as the User ID. [default: userIdentityaccountId; required]
--userid_filter TEXT Specifying this value will filter all incoming data to only use rows with matching User IDs. Which column is used for the User ID is
specified by `userid_column_name`
--feature_scaler [none|standard|gauss_rank]
Autoencoder feature scaler [default: standard]
--use_generic_model Whether to use a generic model when user does not have minimum number of training rows
--viz_file FILE Save a visualization of the pipeline at the specified location
--help Show this message and exit.
Commands:
add-class Add detected classifications to each message.
add-scores Add probability scores to each message.
buffer (Deprecated) Buffer results.
delay (Deprecated) Delay results for a certain duration.
filter Filter message by a classification threshold.
from-azure Source stage is used to load Azure Active Directory messages.
from-cloudtrail Load messages from a Cloudtrail directory.
from-duo Source stage is used to load Duo Authentication messages.
inf-pytorch Perform inference with PyTorch.
inf-triton Perform inference with Triton Inference Server.
monitor Display throughput numbers at a specific point in the pipeline.
preprocess Prepare Autoencoder input DataFrames for inference.
serialize Include & exclude columns from messages.
timeseries Perform time series anomaly detection and add prediction.
to-file Write all messages to a file.
to-kafka Write all messages to a Kafka cluster.
train-ae Train an Autoencoder model on incoming data.
trigger Buffer data until previous stage has completed.
validate Validate pipeline output for testing.
Usage: morpheus run pipeline-fil [OPTIONS] COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...
Configure and run the pipeline. To configure the pipeline, list the stages in the order that data should flow. The output of each stage will become the input for the
next stage. For example, to read, classify and write to a file, the following stages could be used
pipeline from-file --filename=my_dataset.json deserialize preprocess inf-triton --model_name=my_model
--server_url=localhost:8001 filter --threshold=0.5 to-file --filename=classifications.json
Pipelines must follow a few rules:
1. Data must originate in a source stage. Current options are `from-file` or `from-kafka`
2. A `deserialize` stage must be placed between the source stages and the rest of the pipeline
3. Only one inference stage can be used. Zero is also fine
4. The following stages must come after an inference stage: `add-class`, `filter`, `gen-viz`
Options:
--model_fea_length INTEGER RANGE
Number of features trained in the model [default: 29; x>=1]
--label TEXT Specify output labels. Ignored when --labels_file is specified [default: mining]
--labels_file DATA FILE Specifies a file to read labels from in order to convert class IDs into labels. A label file is a simple text file where each line
corresponds to a label. If unspecified the value specified by the --label flag will be used.
--columns_file DATA FILE Specifies a file to read column features. [default: data/columns_fil.txt]
--viz_file FILE Save a visualization of the pipeline at the specified location
--help Show this message and exit.
Commands:
add-class Add detected classifications to each message.
add-scores Add probability scores to each message.
buffer (Deprecated) Buffer results.
delay (Deprecated) Delay results for a certain duration.
deserialize Deserialize source data into Dataframes.
dropna Drop null data entries from a DataFrame.
filter Filter message by a classification threshold.
from-appshield Source stage is used to load Appshield messages from one or more plugins into a dataframe. It normalizes nested json messages and arranges them into a
dataframe by snapshot and source(Determine which source generated the plugin messages).
from-file Load messages from a file.
from-kafka Load messages from a Kafka cluster.
inf-identity Perform inference for testing that performs a no-op.
inf-pytorch Perform inference with PyTorch.
inf-triton Perform inference with Triton Inference Server.
mlflow-drift Report model drift statistics to ML Flow.
monitor Display throughput numbers at a specific point in the pipeline.
preprocess Prepare FIL input DataFrames for inference.
serialize Include & exclude columns from messages.
to-file Write all messages to a file.
to-kafka Write all messages to a Kafka cluster.
trigger Buffer data until previous stage has completed.
validate Validate pipeline output for testing.
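The four pipeline rules listed in the help text can be expressed as a small pre-flight check on an ordered list of stage names. This is a sketch under stated assumptions, not the validation Morpheus itself performs: it assumes that source stages are the ones whose names start with from- and that inference stages are the ones whose names start with inf-, which matches the stage names listed above. The function name check_stage_order is hypothetical.

```python
# Stages that rule 4 requires to come after an inference stage.
POST_INFERENCE_STAGES = {"add-class", "filter", "gen-viz"}


def check_stage_order(stages):
    """Return a list of rule violations for an ordered list of stage names."""
    problems = []

    # Rule 1: data must originate in a source stage (assumed prefix: "from-").
    if not stages or not stages[0].startswith("from-"):
        problems.append("rule 1: data must originate in a source stage")

    # Rule 2: the first non-source stage must be `deserialize`.
    non_source = [name for name in stages if not name.startswith("from-")]
    if not non_source or non_source[0] != "deserialize":
        problems.append("rule 2: `deserialize` must follow the source stages")

    # Rule 3: at most one inference stage (assumed prefix: "inf-").
    inference_positions = [i for i, name in enumerate(stages)
                           if name.startswith("inf-")]
    if len(inference_positions) > 1:
        problems.append("rule 3: only one inference stage can be used")

    # Rule 4: post-inference stages must appear after an inference stage.
    first_inference = inference_positions[0] if inference_positions else len(stages)
    for i, name in enumerate(stages):
        if name in POST_INFERENCE_STAGES and i < first_inference:
            problems.append(f"rule 4: `{name}` must come after an inference stage")

    return problems


if __name__ == "__main__":
    example = ["from-file", "deserialize", "preprocess",
               "inf-triton", "filter", "to-file"]
    print(check_stage_order(example))
```

The stage list in the usage example mirrors the example pipeline from the help text, with the stage options omitted.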
Usage: morpheus run pipeline-nlp [OPTIONS] COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...
Configure and run the pipeline. To configure the pipeline, list the stages in the order that data should flow. The output of each stage will become the input for the
next stage. For example, to read, classify and write to a file, the following stages could be used
pipeline from-file --filename=my_dataset.json deserialize preprocess inf-triton --model_name=my_model
--server_url=localhost:8001 filter --threshold=0.5 to-file --filename=classifications.json
Pipelines must follow a few rules:
1. Data must originate in a source stage. Current options are `from-file` or `from-kafka`
2. A `deserialize` stage must be placed between the source stages and the rest of the pipeline
3. Only one inference stage can be used. Zero is also fine
4. The following stages must come after an inference stage: `add-class`, `filter`, `gen-viz`
Options:
--model_seq_length INTEGER RANGE
Limits the length of the sequence returned. If tokenized string is shorter than max_length, output will be padded with 0s. If the
tokenized string is longer than max_length and do_truncate == False, there will be multiple returned sequences containing the overflowing
token-ids. Default value is 256 [default: 256; x>=1]
--label TEXT Specify output labels.
--labels_file DATA FILE Specifies a file to read labels from in order to convert class IDs into labels. A label file is a simple text file where each line
corresponds to a label. Ignored when --label is specified [default: data/labels_nlp.txt]
--viz_file FILE Save a visualization of the pipeline at the specified location
--help Show this message and exit.
Commands:
add-class Add detected classifications to each message.
add-scores Add probability scores to each message.
buffer (Deprecated) Buffer results.
delay (Deprecated) Delay results for a certain duration.
deserialize Deserialize source data into Dataframes.
dropna Drop null data entries from a DataFrame.
filter Filter message by a classification threshold.
from-file Load messages from a file.
from-kafka Load messages from a Kafka cluster.
gen-viz (Deprecated) Write out vizualization DataFrames.
inf-identity Perform inference for testing that performs a no-op.
inf-pytorch Perform inference with PyTorch.
inf-triton Perform inference with Triton Inference Server.
mlflow-drift Report model drift statistics to ML Flow.
monitor Display throughput numbers at a specific point in the pipeline.
preprocess Prepare NLP input DataFrames for inference.
serialize Include & exclude columns from messages.
to-file Write all messages to a file.
to-kafka Write all messages to a Kafka cluster.
trigger Buffer data until previous stage has completed.
validate Validate pipeline output for testing.
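The --labels_file option describes a label file as a simple text file where each line corresponds to a label. The sketch below shows one way such a file could map class IDs to labels. It is an illustration only, not how Morpheus reads the file: the function names are hypothetical, the class ID is assumed to be the zero-based line index, and blank lines are assumed to be skipped.

```python
def load_labels(path):
    """Read one label per line; blank lines are skipped (an assumption)."""
    with open(path, encoding="utf-8") as label_file:
        return [line.strip() for line in label_file if line.strip()]


def class_ids_to_labels(class_ids, labels):
    """Map zero-based class IDs to label names (indexing is an assumption)."""
    return [labels[class_id] for class_id in class_ids]


if __name__ == "__main__":
    import os
    import tempfile

    # Hypothetical label names, written to a temporary file for the demo.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as tmp:
        tmp.write("address\nbank_acct\ncredit_card\n")
    try:
        labels = load_labels(tmp.name)
        print(class_ids_to_labels([2, 0], labels))
    finally:
        os.remove(tmp.name)
```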
Usage: morpheus run pipeline-other [OPTIONS] COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...
Configure and run the pipeline. To configure the pipeline, list the stages in the order that data should flow. The output of each stage will become the input for the
next stage. For example, to read, classify and write to a file, the following stages could be used
pipeline from-file --filename=my_dataset.json deserialize preprocess inf-triton --model_name=my_model
--server_url=localhost:8001 filter --threshold=0.5 to-file --filename=classifications.json
Pipelines must follow a few rules:
1. Data must originate in a source stage. Current options are `from-file` or `from-kafka`
2. A `deserialize` stage must be placed between the source stages and the rest of the pipeline
3. Only one inference stage can be used. Zero is also fine
4. The following stages must come after an inference stage: `add-class`, `filter`, `gen-viz`
Options:
--model_fea_length INTEGER RANGE
Number of features trained in the model [default: 1; x>=1]
--label TEXT Specify output labels. Ignored when --labels_file is specified
--labels_file DATA FILE Specifies a file to read labels from in order to convert class IDs into labels. A label file is a simple text file where each line
corresponds to a label.
--viz_file FILE Save a visualization of the pipeline at the specified location
--help Show this message and exit.
Commands:
add-class Add detected classifications to each message.
add-scores Add probability scores to each message.
buffer (Deprecated) Buffer results.
delay (Deprecated) Delay results for a certain duration.
deserialize Deserialize source data into Dataframes.
dropna Drop null data entries from a DataFrame.
filter Filter message by a classification threshold.
from-file Load messages from a file.
from-kafka Load messages from a Kafka cluster.
inf-identity Perform inference for testing that performs a no-op.
inf-pytorch Perform inference with PyTorch.
inf-triton Perform inference with Triton Inference Server.
mlflow-drift Report model drift statistics to ML Flow.
monitor Display throughput numbers at a specific point in the pipeline.
serialize Include & exclude columns from messages.
to-file Write all messages to a file.
to-kafka Write all messages to a Kafka cluster.
trigger Buffer data until previous stage has completed.
validate Validate pipeline output for testing.
Usage: morpheus tools [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
autocomplete Utility for installing/updating/removing shell completion for Morpheus
onnx-to-trt Converts an ONNX model to a TRT engine
Usage: morpheus tools autocomplete [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
install Install the Morpheus shell command completion
show Show the Morpheus shell command completion code
Usage: morpheus tools onnx-to-trt [OPTIONS]
Options:
--input_model PATH [required]
--output_model PATH [required]
--batches <INTEGER INTEGER>... [required]
--seq_length INTEGER [required]
--max_workspace_size INTEGER [default: 16000]
--help Show this message and exit.
NOTE: The conversion tooling requires the separate installation of TensorRT 8.2.
NVIDIA has observed false positive identification, by automated vulnerability scanning tools, of packages against National Vulnerability Database (NVD) security bulletins and GitHub Security Advisories (GHSA). This can happen due to package name collisions (e.g., Mamba Boa with GPG Boa, python docker SDK with docker core). NVIDIA is committed to providing the highest quality software distribution to our customers.
There is a bug in RAPIDS whereby attempting to serialize any cudf dataframe whose column names are numpy integers will result in a TypeError similar to TypeError: can not serialize 'numpy.int64' object. A fix will be provided in the next Production Branch October 2024 (PB24h2) release. As a workaround, users should rewrite the dataframe column names by getting the underlying int/float value from the numpy type and reassigning that value as the column name.
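The workaround described above can be sketched as a small helper that converts numpy scalar column names to built-in Python values. This is a minimal sketch, not an official fix: the helper name builtin_column_names is hypothetical, and the final reassignment to df.columns is shown only as a comment because it assumes a cudf dataframe named df is available.

```python
import numpy as np


def builtin_column_names(columns):
    """Return column names with numpy scalars replaced by Python values."""
    # numpy scalar types (np.int64, np.float64, ...) derive from np.generic,
    # and .item() returns the equivalent built-in Python value.
    return [name.item() if isinstance(name, np.generic) else name
            for name in columns]


if __name__ == "__main__":
    names = [np.int64(0), np.int64(1), "label"]
    print([type(name).__name__ for name in builtin_column_names(names)])
    # With a dataframe `df` (assumed to exist), the names would be reassigned:
    #     df.columns = builtin_column_names(df.columns)
```

String names and other non-numpy values pass through unchanged, so the helper can be applied to mixed column sets.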
Morpheus is distributed as open source software under the Apache Software License 2.0.
NVIDIA AI Enterprise provides global support for NVIDIA AI software, including Morpheus. For more information on NVIDIA AI Enterprise please consult this overview and the NVIDIA AI Enterprise End User License Agreement.