Endoscopy Sample App Data

Description: Holoscan Sample App Data for AI-based Endoscopy Tool Tracking
Publisher: NVIDIA
Latest Version: 20230222
Modified: April 19, 2023
Compressed Size: 47.27 MB

Holoscan Sample App Data for AI-based Endoscopy Tool Tracking

This resource contains the convolutional LSTM model for tool tracking in laparoscopic videos by Nwoye et al. [1] and a sample surgical video.

[1] Nwoye, C.I., Mutter, D., Marescaux, J. and Padoy, N., 2019. Weakly supervised convolutional LSTM approach for tool tracking in laparoscopic videos. International Journal of Computer Assisted Radiology and Surgery, 14(6), pp. 1059-1067.

Model

The AI model for instrument tracking in endoscopy is an LSTM model that, given an RGB image of 854 x 480 pixels, provides:

  • per-instrument detection probability,
  • per-instrument detected tooltip location,
  • semantic segmentation of the instruments. Each pixel stores 7 labels.

Note: The provided model is in ONNX format. It will automatically be converted into a TensorRT model (.engine) the first time it is processed by a Holoscan application.

Inputs

  • data_ph:0 - Input RGB image (batch size, height, width, channels)
    • shape=[1, 480, 854, 3]
    • dtype=float32
    • range=[0, 255]
  • cellstate_ph:0 - LSTM hidden state tensor 0
    • shape=[1, 60, 107, 7]
    • dtype=float32
  • hiddenstate_ph:0 - LSTM hidden state tensor 1
    • shape=[1, 60, 107, 7]
    • dtype=float32
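
The input layout above can be sketched with NumPy as follows. This is a minimal illustration, not part of the Holoscan application: the helper name make_inputs is hypothetical, and starting the two LSTM state tensors at zero on the first frame is an assumption that this page does not confirm.

```python
import numpy as np

# Shapes and dtypes are taken from the input list above.
IMAGE_SHAPE = (1, 480, 854, 3)  # (batch size, height, width, channels)
STATE_SHAPE = (1, 60, 107, 7)


def make_inputs(frame_rgb_u8, cell_state=None, hidden_state=None):
    """Build the three input tensors for one frame (hypothetical helper).

    frame_rgb_u8: uint8 RGB frame of shape (480, 854, 3).
    cell_state, hidden_state: state tensors from the previous frame, or None.
    """
    if frame_rgb_u8.shape != IMAGE_SHAPE[1:]:
        raise ValueError(f"expected frame of shape {IMAGE_SHAPE[1:]}, got {frame_rgb_u8.shape}")

    # Cast to float32 without rescaling, so values stay in the [0, 255] range,
    # then add the leading batch dimension.
    image = frame_rgb_u8.astype(np.float32)[np.newaxis, ...]

    # Assumption: zero-initialised states for the first frame of a sequence.
    if cell_state is None:
        cell_state = np.zeros(STATE_SHAPE, dtype=np.float32)
    if hidden_state is None:
        hidden_state = np.zeros(STATE_SHAPE, dtype=np.float32)

    return {
        "data_ph:0": image,
        "cellstate_ph:0": cell_state,
        "hiddenstate_ph:0": hidden_state,
    }
```

The returned dictionary is keyed by the input names listed above, which is the form most ONNX inference runtimes accept as a feed.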

Outputs

  • Model/net_states:0 - LSTM output state tensor. Output of frame N is input for frame N + 1.
    • shape=[1, 60, 107, 7]
    • dtype=float32
  • Model/net_hidden:0 - LSTM hidden state tensor. Output of frame N is input for frame N + 1.
    • shape=[1, 60, 107, 7]
    • dtype=float32
  • probs:0 - Per-instrument detection probability
    • shape=[1, 7]
    • dtype=float32
    • range=[0,1]
  • Localize/scaled_coords:0 - Per-instrument (x,y) detected tooltip location
    • shape=[1, 2, 7]
    • dtype=float32
  • Localize_1/binary_masks:0 - Image with per-instrument segmentation masks. Each pixel stores 7 labels.
    • shape=[1, 60, 107, 7]
    • dtype=float32
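
A minimal sketch of how the outputs listed above might be consumed, including the state feedback from frame N to frame N + 1. Several details are assumptions rather than facts from this page: the infer callable is hypothetical and stands in for whatever runtime executes the model, the 0.5 thresholds are placeholders, index 0 of the second axis of the coordinate tensor is taken to be x and index 1 to be y, and the states are zero-initialised.

```python
import numpy as np

NUM_INSTRUMENTS = 7
STATE_SHAPE = (1, 60, 107, 7)


def decode_outputs(probs, coords, masks, prob_threshold=0.5, mask_threshold=0.5):
    """Turn one frame's outputs into a list of detections.

    probs:  (1, 7)          per-instrument detection probability
    coords: (1, 2, 7)       per-instrument tooltip location, assumed (x, y) on axis 1
    masks:  (1, 60, 107, 7) per-instrument segmentation masks
    Both thresholds are placeholder values, not taken from the model card.
    """
    detections = []
    for i in range(NUM_INSTRUMENTS):
        probability = float(probs[0, i])
        if probability < prob_threshold:
            continue  # instrument i is considered absent in this frame
        detections.append({
            "instrument": i,
            "probability": probability,
            "tooltip_xy": (float(coords[0, 0, i]), float(coords[0, 1, i])),
            "mask": masks[0, :, :, i] > mask_threshold,  # boolean (60, 107) mask
        })
    return detections


def track(frames, infer):
    """Run a hypothetical infer callable over a sequence of frames.

    infer(image, cell, hidden) is assumed to return the five outputs in the
    order listed above: both state tensors, then probs, coords and masks.
    """
    cell = np.zeros(STATE_SHAPE, dtype=np.float32)    # assumed initial state
    hidden = np.zeros(STATE_SHAPE, dtype=np.float32)  # assumed initial state
    results = []
    for image in frames:
        # The state outputs of frame N become the state inputs of frame N + 1.
        cell, hidden, probs, coords, masks = infer(image, cell, hidden)
        results.append(decode_outputs(probs, coords, masks))
    return results
```

Keeping the two state tensors inside the loop, instead of resetting them per frame, is what lets the LSTM use temporal context across the video.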

Video Data

The sample data, kindly provided by Research Group Camma, IHU Strasbourg and University of Strasbourg, is a surgical video that includes all 7 instrument classes supported by the tool tracking model. It is in raw H.264 format.

Note: The .h264 file must be converted into a GXF tensor file using the convert_video_to_gxf_entities.py script on GitHub before it can be used with the VideoStreamReplayer Holoscan operator.
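
The conversion step might be driven from Python roughly as below. This is a sketch under stated assumptions, not the documented procedure: it assumes the script reads raw RGB frames from standard input and accepts --width, --height, --channels, --framerate and --basename options, and that ffmpeg is installed to decode the .h264 file. The frame rate of 30 and the base name are placeholder values; check the script's own help output before relying on any of these.

```python
import subprocess


def build_commands(video_path,
                   script_path="convert_video_to_gxf_entities.py",
                   width=854, height=480, channels=3,
                   framerate=30, basename="surgical_video"):
    """Return (decode_cmd, convert_cmd) as argument lists.

    The option names passed to the conversion script are assumptions;
    framerate and basename are placeholder values.
    """
    # Decode the H.264 stream to raw RGB24 frames written to stdout.
    decode_cmd = ["ffmpeg", "-i", video_path,
                  "-pix_fmt", "rgb24", "-f", "rawvideo", "pipe:1"]
    # Assumed interface: the script consumes raw frames from stdin.
    convert_cmd = ["python3", script_path,
                   "--width", str(width),
                   "--height", str(height),
                   "--channels", str(channels),
                   "--framerate", str(framerate),
                   "--basename", basename]
    return decode_cmd, convert_cmd


def convert(video_path, **kwargs):
    """Pipe the decoder output into the conversion script."""
    decode_cmd, convert_cmd = build_commands(video_path, **kwargs)
    decoder = subprocess.Popen(decode_cmd, stdout=subprocess.PIPE)
    try:
        subprocess.run(convert_cmd, stdin=decoder.stdout, check=True)
    finally:
        decoder.stdout.close()
        decoder.wait()
```

The width and height defaults match the 854 x 480 input the model expects, so the replayed frames would need no resizing before inference.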

License

Refer to the license agreement for use of the sample data.