Mask Grounding DINO CC | NVIDIA NGC

NVIDIA

Mask Grounding DINO CC

Open-vocabulary, multimodal instance segmentation model trained on commercial data.

Field	Response
Intended Applications & Domains:	Open-Set Segmentation (Text)
Model Type:	Instance Segmentation with arbitrary object categories or reference descriptions
Intended Users:	This model is intended for developers working in smart spaces, retail, and industrial applications.
Output:	Bounding Boxes, Confidence Scores, and Segmentation masks
Describe how the model works:	This model predicts bounding boxes and segmentation masks for each object in the image. It can detect objects specified via open-vocabulary text input or referring expressions, allowing flexible object recognition.
Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of:	Not Applicable
Technical Limitations:	The model may perform poorly on data that does not resemble Flickr-style photographs, such as medical, satellite, and industrial imagery.
Verified to have met prescribed NVIDIA standards:	Yes
Performance Metrics:	Generalized Intersection over Union (gIoU), Target Accuracy (T_acc), No-Target Accuracy (N_acc), Mask Mean Average Precision (mask_mAP)
Potential Known Risks:	The model may mislocalize or over-segment objects when the referring expression or category names are ambiguous or fall outside the training distribution.
Licensing:	NVIDIA Open Model License
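
The card lists gIoU, T_acc, and N_acc without defining them. The sketch below shows one plausible reading, assuming the definitions commonly used when evaluating referring-expression segmentation with "no-target" expressions: gIoU is the mean per-sample mask IoU (an empty prediction for an empty ground truth counts as 1.0), N_acc is the fraction of no-target samples for which the prediction is empty, and T_acc is the fraction of target samples for which the prediction is non-empty. These definitions, the function names, and the list-of-rows mask format are assumptions for illustration and are not taken from the card or from any NVIDIA tooling.

```python
from typing import Dict, List, Sequence

# Assumed format: a binary mask is a sequence of rows holding 0/1 values.
Mask = Sequence[Sequence[int]]


def is_empty(mask: Mask) -> bool:
    """Return True when the mask contains no foreground pixels."""
    return not any(any(row) for row in mask)


def mask_iou(pred: Mask, target: Mask) -> float:
    """IoU of two same-sized binary masks; two empty masks score 1.0."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return 1.0 if union == 0 else inter / union


def evaluate(preds: List[Mask], targets: List[Mask]) -> Dict[str, float]:
    """Compute gIoU, T_acc, and N_acc under the assumptions stated above."""
    ious = []
    target_hits = target_total = 0
    no_target_hits = no_target_total = 0
    for pred, target in zip(preds, targets):
        ious.append(mask_iou(pred, target))
        if is_empty(target):
            # No-target sample: correct when nothing is segmented.
            no_target_total += 1
            no_target_hits += int(is_empty(pred))
        else:
            # Target sample: correct when something is segmented.
            target_total += 1
            target_hits += int(not is_empty(pred))
    nan = float("nan")
    return {
        "gIoU": sum(ious) / len(ious) if ious else nan,
        "T_acc": target_hits / target_total if target_total else nan,
        "N_acc": no_target_hits / no_target_total if no_target_total else nan,
    }


# Toy example with four 2x2 samples (hypothetical data).
empty = [[0, 0], [0, 0]]
targets = [[[1, 1], [0, 0]], empty, empty, [[0, 0], [1, 1]]]
preds = [[[1, 0], [0, 0]], empty, [[0, 1], [0, 0]], empty]
scores = evaluate(preds, targets)
print(scores)
```

Note that this gIoU is a mask-level average and differs from the box-regression quantity that shares the "generalized IoU" name; which of the two the card intends is not stated.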