This resource contains the convolutional LSTM model for tool tracking in laparoscopic videos by Nwoye et al. [1], and a sample surgical video.
The AI model for instrument tracking in endoscopy is a convolutional LSTM model which, given an RGB image of size 854 x 480, outputs a detection probability, an (x, y) tooltip location, and a binary segmentation mask for each of the 7 supported instrument classes.
Note: The provided model is in ONNX format. It will automatically be converted into a TensorRT model (.engine) the first time it is processed by a Holoscan application.
Inputs:

data_ph:0
- Input RGB image (batch size, height, width, channels)
- shape=[1, 480, 854, 3]
- dtype=float32
- range=[0, 255]

cellstate_ph:0
- LSTM cell state input tensor
- shape=[1, 60, 107, 7]
- dtype=float32

hiddenstate_ph:0
- LSTM hidden state input tensor
- shape=[1, 60, 107, 7]
- dtype=float32
Outputs:

Model/net_states:0
- LSTM output state tensor; the output of frame N is the input for frame N + 1
- shape=[1, 60, 107, 7]
- dtype=float32

Model/hidden_states:0
- LSTM hidden state tensor; the output of frame N is the input for frame N + 1
- shape=[1, 60, 107, 7]
- dtype=float32
probs:0
- Per-instrument detection probability
- shape=[1, 7]
- dtype=float32
- range=[0, 1]

Localize/scaled_coords:0
- Per-instrument (x, y) detected tooltip location
- shape=[1, 2, 7]
- dtype=float32

Localize_1/binary_masks:0
- Image with per-instrument segmentation masks; each pixel stores 7 labels
- shape=[1, 60, 107, 7]
- dtype=float32
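To illustrate how the recurrent state is wired from one frame to the next, below is a minimal Python sketch that runs the ONNX model with onnxruntime and feeds each frame's output states back in as the next frame's input states. The model filename, the use of onnxruntime, and the dummy frames are assumptions for illustration; in a Holoscan application this plumbing is handled by the inference operator.

```python
import numpy as np
import onnxruntime as ort

# Assumed filename for the downloaded ONNX model.
session = ort.InferenceSession("tool_loc_convlstm.onnx")

# The LSTM states start at zero and are carried from frame N to frame N + 1.
cell_state = np.zeros((1, 60, 107, 7), dtype=np.float32)
hidden_state = np.zeros((1, 60, 107, 7), dtype=np.float32)

# Dummy stand-ins for decoded video frames: float32 RGB in [0, 255].
frames = [np.random.rand(1, 480, 854, 3).astype(np.float32) * 255.0
          for _ in range(3)]

for frame in frames:
    cell_state, hidden_state, probs, coords, masks = session.run(
        ["Model/net_states:0",          # fed back as cellstate_ph:0
         "Model/hidden_states:0",       # fed back as hiddenstate_ph:0
         "probs:0",                     # [1, 7] per-instrument probability
         "Localize/scaled_coords:0",    # [1, 2, 7] (x, y) tooltip locations
         "Localize_1/binary_masks:0"],  # [1, 60, 107, 7] segmentation masks
        {"data_ph:0": frame,
         "cellstate_ph:0": cell_state,
         "hiddenstate_ph:0": hidden_state})
```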
The sample data, kindly provided by the Research Group CAMMA, IHU Strasbourg and University of Strasbourg, is a surgical video that includes all 7 instrument classes supported by the tool tracking model. It is provided as a raw H.264 elementary stream.
Note: the .h264 file must be converted into a GXF tensor file using the convert_video_to_gxf_entities.py script on GitHub before it can be used with the VideoStreamReplayer Holoscan operator. An example invocation is shown below.
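The conversion can be driven by piping raw RGB frames from ffmpeg into the script. The input filename, frame rate, and output basename below are assumptions, so adjust them to the actual sample file and script location:

```sh
ffmpeg -i surgical_video.h264 -pix_fmt rgb24 -f rawvideo pipe:1 \
  | python3 convert_video_to_gxf_entities.py \
      --width 854 --height 480 --channels 3 --framerate 30 \
      --basename surgical_video
```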
Refer to the license agreement for use of the sample data.