This resource contains the convolutional LSTM model for tool tracking in laparoscopic videos by Nwoye et al. [1], and a sample surgical video.
The AI model for instrument tracking in endoscopy is a convolutional LSTM model which, given an RGB image of size 854 x 480, outputs a detection probability, an (x, y) tooltip location, and a binary segmentation mask for each of the 7 supported instrument classes.
Note: The provided model is in ONNX format. It will automatically be converted into a TensorRT model (.engine) the first time it is processed by a Holoscan application.
Inputs:

data_ph:0
- Input RGB image (batch size, height, width, channels)
- shape=[1, 480, 854, 3]
- dtype=float32
- range=[0, 255]

cellstate_ph:0
- LSTM cell state input tensor
- shape=[1, 60, 107, 7]
- dtype=float32

hiddenstate_ph:0
- LSTM hidden state input tensor
- shape=[1, 60, 107, 7]
- dtype=float32
Outputs:

Model/net_states:0
- LSTM output state tensor; the output of frame N is the input for frame N + 1
- shape=[1, 60, 107, 7]
- dtype=float32

Model/hidden_states:0
- LSTM hidden state tensor; the output of frame N is the input for frame N + 1
- shape=[1, 60, 107, 7]
- dtype=float32
probs:0
- Per-instrument detection probability
- shape=[1, 7]
- dtype=float32
- range=[0, 1]

Localize/scaled_coords:0
- Per-instrument (x, y) detected tooltip location
- shape=[1, 2, 7]
- dtype=float32

Localize_1/binary_masks:0
- Image with per-instrument segmentation masks; each pixel stores 7 labels
- shape=[1, 60, 107, 7]
- dtype=float32
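To illustrate how the recurrent state is wired from one frame to the next, below is a minimal Python sketch that runs the ONNX model with onnxruntime and feeds each frame's output states back in as the next frame's input states. The model filename, the use of onnxruntime, and the dummy frames are assumptions for illustration; in a Holoscan application this plumbing is handled by the inference operator.

```python
import numpy as np
import onnxruntime as ort

# Assumed filename for the downloaded ONNX model.
session = ort.InferenceSession("tool_loc_convlstm.onnx")

# The LSTM states start at zero and are carried from frame N to frame N + 1.
cell_state = np.zeros((1, 60, 107, 7), dtype=np.float32)
hidden_state = np.zeros((1, 60, 107, 7), dtype=np.float32)

# Dummy stand-ins for decoded video frames: float32 RGB in [0, 255].
frames = [np.random.rand(1, 480, 854, 3).astype(np.float32) * 255.0
          for _ in range(3)]

for frame in frames:
    cell_state, hidden_state, probs, coords, masks = session.run(
        ["Model/net_states:0",          # fed back as cellstate_ph:0
         "Model/hidden_states:0",       # fed back as hiddenstate_ph:0
         "probs:0",                     # [1, 7] per-instrument probability
         "Localize/scaled_coords:0",    # [1, 2, 7] (x, y) tooltip locations
         "Localize_1/binary_masks:0"],  # [1, 60, 107, 7] segmentation masks
        {"data_ph:0": frame,
         "cellstate_ph:0": cell_state,
         "hiddenstate_ph:0": hidden_state})
```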
The sample data, kindly provided by the Research Group CAMMA, IHU Strasbourg and University of Strasbourg, is a surgical video that includes all 7 instrument classes supported by the tool tracking model. It is provided as a raw H.264 elementary stream.
Note: the .h264 file must be converted into a GXF tensor file using the convert_video_to_gxf_entities.py script on GitHub before it can be used with the VideoStreamReplayer Holoscan operator. An example invocation is shown below.
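The conversion can be driven by piping raw RGB frames from ffmpeg into the script. The input filename, frame rate, and output basename below are assumptions, so adjust them to the actual sample file and script location:

```sh
ffmpeg -i surgical_video.h264 -pix_fmt rgb24 -f rawvideo pipe:1 \
  | python3 convert_video_to_gxf_entities.py \
      --width 854 --height 480 --channels 3 --framerate 30 \
      --basename surgical_video
```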
Refer to the license agreement for use of the sample data.