Holoscan Sample App Data for AI-based Endoscopy Tool Tracking
This resource contains the convolutional LSTM model for tool tracking in laparoscopic videos by Nwoye et al. [1] and a sample surgical video.
Model
The AI model for instrument tracking in endoscopy is a convolutional LSTM model that, given an 854 x 480 RGB image, provides:
- per-instrument detection probability,
- per-instrument detected tooltip location,
- semantic segmentation of the instruments.
In the segmentation output, each pixel stores 7 labels, one per instrument class.
Note: The provided model is in ONNX format. It is automatically converted into a TensorRT engine (.engine) the first time a Holoscan application loads it.
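The 7 per-pixel labels can be collapsed into a single instrument map for display or analysis. The sketch below assumes the mask output (Localize_1/binary_masks:0, shape [1, 60, 107, 7]) has already been copied into a flat, row-major Python sequence. Picking the highest-scoring channel, the 0.5 threshold, and the -1 background value are illustrative choices, not part of the model definition, and `mask_to_labels` is a hypothetical helper name.

```python
from typing import List, Sequence

MASK_H, MASK_W, NUM_TOOLS = 60, 107, 7  # Localize_1/binary_masks:0 is [1, 60, 107, 7]


def mask_to_labels(mask: Sequence[float], threshold: float = 0.5) -> List[List[int]]:
    """Collapse a flat, row-major [1, 60, 107, 7] mask into a 60 x 107 label map.

    Each cell holds the index (0-6) of the highest-scoring instrument channel,
    or -1 when no channel reaches `threshold` (assumed background rule)."""
    expected = MASK_H * MASK_W * NUM_TOOLS
    if len(mask) != expected:
        raise ValueError(f"expected {expected} values, got {len(mask)}")
    labels: List[List[int]] = []
    for y in range(MASK_H):
        row: List[int] = []
        for x in range(MASK_W):
            start = (y * MASK_W + x) * NUM_TOOLS  # first of this pixel's 7 values
            scores = mask[start:start + NUM_TOOLS]
            best = max(range(NUM_TOOLS), key=scores.__getitem__)
            row.append(best if scores[best] >= threshold else -1)
        labels.append(row)
    return labels
```

The label map is 60 x 107, roughly one-eighth of the 480 x 854 input in each dimension, so overlaying it on a video frame requires upsampling.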
Inputs
- data_ph:0 - Input RGB image (batch size, height, width, channels); shape=[1, 480, 854, 3], dtype=float32, range=[0, 255]
- cellstate_ph:0 - LSTM hidden state tensor 0; shape=[1, 60, 107, 7], dtype=float32
- hiddenstate_ph:0 - LSTM hidden state tensor 1; shape=[1, 60, 107, 7], dtype=float32
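A minimal sketch of preparing these inputs with the Python standard library, assuming each decoded frame arrives as 480 x 854 interleaved 8-bit RGB bytes. The helper names are hypothetical, and zero-initializing both LSTM state tensors for the first frame is an assumption rather than something this card specifies.

```python
from array import array
from math import prod

# Shapes from the Inputs list (batch, height, width, channels).
FRAME_SHAPE = (1, 480, 854, 3)  # data_ph:0
STATE_SHAPE = (1, 60, 107, 7)   # cellstate_ph:0 and hiddenstate_ph:0


def frame_to_input(rgb: bytes) -> array:
    """Turn one 480 x 854 interleaved 8-bit RGB frame into a flat float32
    buffer laid out as [1, 480, 854, 3], keeping values in [0, 255]."""
    expected = prod(FRAME_SHAPE)
    if len(rgb) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(rgb)}")
    # Passing bytes directly to array() would reinterpret them as raw float
    # bits, so iterate to convert each 0..255 integer to a float instead.
    # No normalization: the model expects the raw [0, 255] range.
    return array("f", iter(rgb))


def zero_state() -> array:
    """Zero-filled float32 buffer for one LSTM state tensor [1, 60, 107, 7].
    Zero initialization is an assumption, not taken from the model card."""
    return array("f", [0.0]) * prod(STATE_SHAPE)
```

The `"f"` typecode maps to a C `float`, which is 32 bits on common platforms and so matches the float32 dtype listed above.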
Outputs
- Model/net_states:0 - LSTM output state tensor. The output of frame N is the input for frame N + 1. shape=[1, 60, 107, 7], dtype=float32
- Model/net_states:0 - LSTM hidden state tensor. The output of frame N is the input for frame N + 1. shape=[1, 60, 107, 7], dtype=float32
- probs:0 - Per-instrument detection probability; shape=[1, 7], dtype=float32, range=[0, 1]
- Localize/scaled_coords:0 - Per-instrument (x, y) detected tooltip location; shape=[1, 2, 7], dtype=float32
- Localize_1/binary_masks:0 - Image with per-instrument segmentation masks; each pixel stores 7 labels. shape=[1, 60, 107, 7], dtype=float32
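Because the state outputs of frame N feed frame N + 1, a tracking loop has to carry both state tensors forward. The sketch below hides inference behind a hypothetical `infer` callable that returns the two updated states plus the probs:0 and Localize/scaled_coords:0 values as flat sequences. Which state output maps to which state input, the assumption that the [1, 2, 7] coordinates store all x values before all y values, and the 0.5 detection threshold are all assumptions, not taken from this card.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Sequence, Tuple

NUM_TOOLS = 7  # instrument classes reported by probs:0

# Hypothetical signature: infer(frame, state_0, state_1) returns
# (new_state_0, new_state_1, probs, coords), with probs and coords flattened.
InferFn = Callable[[Any, Any, Any], Tuple[Any, Any, Sequence[float], Sequence[float]]]
Tooltips = Dict[int, Tuple[float, float]]


def detected_tooltips(probs: Sequence[float], coords: Sequence[float],
                      threshold: float = 0.5) -> Tooltips:
    """Map instrument index -> (x, y) for every instrument whose detection
    probability reaches `threshold`.

    probs is flat [1, 7]; coords is flat row-major [1, 2, 7], assumed to hold
    x values at indices 0..6 and y values at indices 7..13."""
    return {
        k: (coords[k], coords[NUM_TOOLS + k])
        for k in range(NUM_TOOLS)
        if probs[k] >= threshold
    }


def track(frames: Iterable[Any], infer: InferFn,
          init_state: Callable[[], Any], threshold: float = 0.5) -> Iterator[Tooltips]:
    """Run the model frame by frame, feeding each frame's state outputs
    back in as the state inputs of the next frame."""
    state_0, state_1 = init_state(), init_state()  # e.g. zero-filled tensors
    for frame in frames:
        state_0, state_1, probs, coords = infer(frame, state_0, state_1)
        yield detected_tooltips(probs, coords, threshold)
```

Keeping the state update inside the loop, rather than inside `infer`, makes the frame-to-frame dependency explicit: skipping or reordering frames changes the state the model sees.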
Video Data
The sample data, kindly provided by Research Group Camma, IHU Strasbourg and University of Strasbourg, is a surgical video that includes all 7 instrument classes supported by the tool tracking model. It is provided in raw H.264 format.
Note: The .h264 file must be converted into a GXF tensor file with the convert_video_to_gxf_entities.py script (available on GitHub) before it can be used with the VideoStreamReplayer Holoscan operator.
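To inspect frames outside the VideoStreamReplayer path, the raw H.264 stream can first be decoded to uncompressed RGB, for example with `ffmpeg -i <video>.h264 -f rawvideo -pix_fmt rgb24 pipe:1`. The sketch below splits such a byte stream into fixed-size frames. It assumes the decoded video is 854 x 480, matching the model input; `read_rgb_frames` is a hypothetical helper and is not part of the convert_video_to_gxf_entities.py script.

```python
from typing import BinaryIO, Iterator

WIDTH, HEIGHT, CHANNELS = 854, 480, 3  # assumed to match the model input size
FRAME_BYTES = WIDTH * HEIGHT * CHANNELS


def read_rgb_frames(stream: BinaryIO) -> Iterator[bytes]:
    """Yield fixed-size rgb24 frames from a raw video byte stream."""
    while True:
        chunk = stream.read(FRAME_BYTES)
        if not chunk:
            return  # clean end of stream
        # Unbuffered streams may return fewer bytes than requested,
        # so keep reading until a full frame is assembled.
        while len(chunk) < FRAME_BYTES:
            more = stream.read(FRAME_BYTES - len(chunk))
            if not more:
                raise ValueError("stream ended in the middle of a frame")
            chunk += more
        yield chunk
```

The stream can be the `stdout` of a `subprocess.Popen(..., stdout=subprocess.PIPE)` running the ffmpeg command above; each yielded frame is then in the layout expected by data_ph:0 before conversion to float32.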
License
Refer to the license agreement for use of the sample data.