NVIDIA
Endoscopy Sample App Data
Resource

Holoscan Sample App Data for AI-based Endoscopy Tool Tracking

This resource contains the convolutional LSTM model for tool tracking in laparoscopic videos by Nwoye et al. [1] and a sample surgical video.

[1] Nwoye, C.I., Mutter, D., Marescaux, J. and Padoy, N., 2019. Weakly supervised convolutional LSTM approach for tool tracking in laparoscopic videos. International Journal of Computer Assisted Radiology and Surgery, 14(6), pp. 1059-1067.

Model

The AI model for instrument tracking in endoscopy is a convolutional LSTM model that, given an 854 x 480 RGB image, provides:

  • per-instrument detection probability,
  • per-instrument detected tooltip location,
  • semantic segmentation of the instruments.
    Each pixel stores 7 labels.

Note: The provided model is in ONNX format. It will automatically be converted into a TensorRT model (.engine) the first time it is processed by a Holoscan application.

Inputs

  • data_ph:0 - Input RGB image (batch size, height, width, channels)
    • shape=[1, 480, 854, 3]
    • dtype=float32
    • range=[0, 255]
  • cellstate_ph:0 - LSTM hidden state tensor 0
    • shape=[1, 60, 107, 7]
    • dtype=float32
  • hiddenstate_ph:0 - LSTM hidden state tensor 1
    • shape=[1, 60, 107, 7]
    • dtype=float32
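
As a rough illustration of the input layout listed above, the following NumPy sketch builds the three input tensors for a single frame. The helper name `make_inputs` is hypothetical, and starting from all-zero LSTM states on the first frame is an assumption, not something this page specifies.

```python
import numpy as np

# Shapes taken from the input list above.
IMAGE_SHAPE = (1, 480, 854, 3)   # (batch size, height, width, channels)
STATE_SHAPE = (1, 60, 107, 7)


def make_inputs(frame_rgb, cell_state=None, hidden_state=None):
    """Build the three input tensors for one frame (hypothetical helper).

    frame_rgb: uint8 RGB array of shape (480, 854, 3).
    Zero-initialised states for the first frame are an assumption.
    """
    if frame_rgb.shape != IMAGE_SHAPE[1:]:
        raise ValueError(f"expected {IMAGE_SHAPE[1:]}, got {frame_rgb.shape}")
    # Convert to float32 without rescaling, so values stay in [0, 255],
    # then add the leading batch dimension.
    image = frame_rgb.astype(np.float32)[np.newaxis, ...]
    if cell_state is None:
        cell_state = np.zeros(STATE_SHAPE, dtype=np.float32)
    if hidden_state is None:
        hidden_state = np.zeros(STATE_SHAPE, dtype=np.float32)
    # Keys are the input tensor names listed above.
    return {
        "data_ph:0": image,
        "cellstate_ph:0": cell_state,
        "hiddenstate_ph:0": hidden_state,
    }


# Usage with a synthetic frame standing in for a decoded video frame.
frame = np.random.default_rng(0).integers(0, 256, size=(480, 854, 3), dtype=np.uint8)
inputs = make_inputs(frame)
```

In a Holoscan application these tensors are normally produced by the pipeline's operators; the sketch is only meant to make the expected shapes, dtype, and value range concrete.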

Outputs

  • Model/net_states:0 - LSTM output state tensor. Output of frame N is input for frame N + 1.
    • shape=[1, 60, 107, 7]
    • dtype=float32
  • Model/net_hidden:0 - LSTM hidden state tensor. Output of frame N is input for frame N + 1.
    • shape=[1, 60, 107, 7]
    • dtype=float32
  • probs:0 - Per-instrument detection probability
    • shape=[1, 7]
    • dtype=float32
    • range=[0,1]
  • Localize/scaled_coords:0 - Per-instrument (x,y) detected tooltip location
    • shape=[1, 2, 7]
    • dtype=float32
  • Localize_1/binary_masks:0 - Image with per-instrument segmentation masks. Each pixel stores 7 labels.
    • shape=[1, 60, 107, 7]
    • dtype=float32
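
The state feedback described above (the state outputs of frame N become the state inputs of frame N + 1) can be sketched as a simple loop. This is a minimal illustration, not the Holoscan implementation: `stand_in_model` is a hypothetical placeholder that returns made-up values with the documented shapes, the 0.5 detection threshold is an assumption, and so is the pairing of the first state output with cellstate_ph:0 and the second with hiddenstate_ph:0.

```python
import numpy as np

STATE_SHAPE = (1, 60, 107, 7)
NUM_INSTRUMENTS = 7


def stand_in_model(image, cell_state, hidden_state):
    """Hypothetical placeholder for the real network (not the actual model).

    Returns five arrays in the order the outputs are listed above, with the
    documented shapes. All values are made up for illustration.
    """
    state_out = cell_state + 1.0
    hidden_out = hidden_state + 1.0
    probs = np.array([[0.9, 0.1, 0.2, 0.8, 0.05, 0.6, 0.3]], dtype=np.float32)
    coords = np.zeros((1, 2, NUM_INSTRUMENTS), dtype=np.float32)
    coords[0, 0, :] = np.arange(NUM_INSTRUMENTS)       # x per instrument
    coords[0, 1, :] = np.arange(NUM_INSTRUMENTS) * 2   # y per instrument
    masks = np.zeros(STATE_SHAPE, dtype=np.float32)
    return state_out, hidden_out, probs, coords, masks


def detected_tooltips(probs, coords, threshold=0.5):
    """List (instrument index, x, y) for instruments above the threshold."""
    found = []
    for i in range(probs.shape[1]):
        if probs[0, i] >= threshold:
            x, y = coords[0, :, i]   # coords has shape [1, 2, 7]
            found.append((i, float(x), float(y)))
    return found


# Assumed initial states: all zeros before the first frame.
cell = np.zeros(STATE_SHAPE, dtype=np.float32)
hidden = np.zeros(STATE_SHAPE, dtype=np.float32)

for frame_index in range(3):
    image = np.zeros((1, 480, 854, 3), dtype=np.float32)  # placeholder frame
    # The state outputs of this frame replace the state inputs of the next.
    cell, hidden, probs, coords, masks = stand_in_model(image, cell, hidden)
    tooltips = detected_tooltips(probs, coords)
```

Carrying the two state tensors across frames is what lets the LSTM use temporal context, so they should be reset only when a new video sequence starts.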

Video Data

The sample data, kindly provided by Research Group CAMMA, IHU Strasbourg and University of Strasbourg, is a surgical video that includes all 7 instrument classes supported by the tool tracking model. The video is in raw H.264 format.

Note: The .h264 file must be converted into a GXF tensor file with the convert_video_to_gxf_entities.py script (available on GitHub) before it can be used with the VideoStreamReplayer Holoscan operator.

License

Refer to the license agreement for use of the sample data.

Publisher
NVIDIA
Latest Version
20230222
Updated
April 19, 2023 UTC
Compressed Size
47.27 MB