A pre-trained model for volumetric (3D) segmentation of abdominal organs from CT images. The model was first pre-trained with a self-supervised learning technique; the resulting model was then fine-tuned on the fully supervised task of abdominal multi-organ segmentation.
Note: The 4.1 version of this model is only compatible with the 4.1 version of the Clara Train SDK container
This model is trained with self-supervised learning first. The self-supervised stage applies augmentations that mutate the image, creating a self-supervised image reconstruction task. In addition, contrastive learning is used to augment the learning process. Once self-supervised pre-training is complete, the encoder weights are transferred to the full UNETR model, which uses the ViT encoder as its backbone. Fully supervised training is then performed on the full model.
A 3D patch is selected from the CT volume and then augmented with transforms such as flips, outer cutout, inner cutout, and local patch shuffling.
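As an illustration, a chain of this kind could be assembled from MONAI transforms; the hole counts, hole sizes, and probabilities below are illustrative assumptions, not the values used in the MMAR.

```python
from monai.transforms import Compose, RandCoarseDropout, RandCoarseShuffle, RandFlip

# Sketch of an SSL augmentation chain: flips, inner/outer cutout, and
# local patch shuffling applied to a single-channel 3D patch.
ssl_augment = Compose([
    RandFlip(prob=0.5, spatial_axis=0),
    # Inner cutout: zero out the voxels inside the sampled holes.
    RandCoarseDropout(holes=8, spatial_size=8, dropout_holes=True, prob=0.5),
    # Outer cutout: keep the holes and drop everything outside them.
    RandCoarseDropout(holes=4, spatial_size=24, dropout_holes=False, prob=0.5),
    # Local patch shuffling: permute the voxels inside each sampled hole.
    RandCoarseShuffle(holes=8, spatial_size=8, prob=0.5),
])
```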
Two augmented patches are generated via random combinations of the aforementioned augmentations. Both patches are reconstructed with a forward pass through the network, which is based on the ViT backbone. An L1 reconstruction loss and a contrastive loss together drive the learning process of the model.
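A minimal sketch of this combined objective, assuming MONAI's ContrastiveLoss over flattened embeddings; the temperature and the weighting between the two terms are illustrative assumptions.

```python
import torch
from monai.losses import ContrastiveLoss

l1_loss = torch.nn.L1Loss()
contrastive = ContrastiveLoss(temperature=0.05)  # assumed temperature

def ssl_loss(recon_1, recon_2, embed_1, embed_2, original):
    # Both augmented views are reconstructed back to the original patch.
    rec = l1_loss(recon_1, original) + l1_loss(recon_2, original)
    # The two views' embeddings are pulled together (NT-Xent style).
    con = contrastive(embed_1.flatten(start_dim=1), embed_2.flatten(start_dim=1))
    return rec + 0.1 * con  # assumed weighting between the two terms
```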
Once this model is trained to convergence, the ViT backbone weights are transferred to the ViT backbone of the full UNETR model [1], which is then trained for the 3D segmentation task.
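The transfer could look roughly like the following, assuming the pre-trained checkpoint stores plain ViT encoder weights; the checkpoint filename, its layout, and the UNETR hyperparameters are assumptions for illustration, not the MMAR's actual configuration.

```python
import torch
from monai.networks.nets import UNETR

model = UNETR(
    in_channels=1,
    out_channels=14,        # 13 organs + background
    img_size=(96, 96, 96),  # assumed training patch size
    feature_size=16,
    hidden_size=768,
    mlp_dim=3072,
    num_heads=12,
)

ssl_weights = torch.load("pretrained_vit.pt")  # hypothetical checkpoint
vit_state = model.vit.state_dict()
# Keep only the keys that exist in UNETR's ViT backbone with matching shapes.
matched = {k: v for k, v in ssl_weights.items()
           if k in vit_state and v.shape == vit_state[k].shape}
model.vit.load_state_dict(matched, strict=False)
```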
The segmentation of the abdominal region is formulated as a voxel-wise 14-class classification. Each voxel is predicted as one of the following (the standard BTCV label set):

- 0: background
- 1: spleen
- 2: right kidney
- 3: left kidney
- 4: gallbladder
- 5: esophagus
- 6: liver
- 7: stomach
- 8: aorta
- 9: inferior vena cava
- 10: portal vein and splenic vein
- 11: pancreas
- 12: right adrenal gland
- 13: left adrenal gland
The model is optimized with the Adam optimizer, minimizing a combination of soft Dice loss and cross-entropy loss between the predicted mask and the ground-truth segmentation.
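A minimal sketch of this setup with MONAI's combined Dice + cross-entropy loss, reusing the model from the sketch above; the learning rate and the image_patch/label_patch tensors are illustrative assumptions.

```python
import torch
from monai.losses import DiceCELoss

loss_fn = DiceCELoss(to_onehot_y=True, softmax=True)      # soft Dice + CE
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed lr

optimizer.zero_grad()
logits = model(image_patch)          # (B, 14, H, W, D) class logits
loss = loss_fn(logits, label_patch)  # label_patch: (B, 1, H, W, D) indices
loss.backward()
optimizer.step()
```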
The self-supervised learning was performed with the following:
The training was performed with the following:
If an out-of-memory error or a program crash occurs while caching the dataset, please lower the cache_rate in CacheDataset to a value in the range (0, 1).
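For reference, a sketch of capping the cached fraction; the 0.5 rate, worker count, and batch size are illustrative assumptions.

```python
from monai.data import CacheDataset, DataLoader

train_ds = CacheDataset(
    data=train_files,            # list of {"image": ..., "label": ...} dicts
    transform=train_transforms,  # the training transform chain
    cache_rate=0.5,              # cache only half the dataset in memory
    num_workers=4,
)
train_loader = DataLoader(train_ds, batch_size=2, shuffle=True)
```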
The data used for self-supervised pre-training is from The Cancer Imaging Archive (TCIA) [2]. It contains a total of 771 3D CT volumes, of which 600 were used for training and 171 for validation. The dataset split is provided as a JSON file, 'tcia_dataset_split.json', in the 'config' directory of the MMAR.
The data used for supervised fine-tuning is from the MICCAI 2015 Beyond the Cranial Vault (BTCV) abdominal segmentation challenge. It contains 30 3D CT volumes with annotations of 13 organs, split into 24 training volumes and 6 validation volumes. The dataset split is provided as a JSON file, 'btcv_dataset_0.json', in the 'config' directory of the MMAR.
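Either split file can be read with MONAI's datalist loader; the data_list_key names below are assumptions about the JSON's structure.

```python
from monai.data import load_decathlon_datalist

datalist = "config/btcv_dataset_0.json"
train_files = load_decathlon_datalist(datalist, is_segmentation=True,
                                      data_list_key="training")
val_files = load_decathlon_datalist(datalist, is_segmentation=True,
                                    data_list_key="validation")
```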
The Dice score is used for evaluating the performance of the final downstream 3D segmentation model. The trained model achieved a mean validation Dice score of 0.8100, averaged across the 13 abdominal organs. Note that when running validate.sh from the 'commands' directory, the reported Dice score is 0.7887; this is because testing is performed at the original resolution of the data.
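A minimal sketch of computing the mean validation Dice score with MONAI, assuming the model and val_loader from the sketches above; the post-processing details are illustrative.

```python
import torch
from monai.metrics import DiceMetric
from monai.networks.utils import one_hot

dice_metric = DiceMetric(include_background=False, reduction="mean")

model.eval()
with torch.no_grad():
    for batch in val_loader:
        logits = model(batch["image"])  # (B, 14, H, W, D)
        preds = one_hot(logits.argmax(dim=1, keepdim=True), num_classes=14)
        labels = one_hot(batch["label"], num_classes=14)
        dice_metric(y_pred=preds, y=labels)

print("mean validation Dice:", dice_metric.aggregate().item())
```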
For evaluating the self-supervised model, the L1 reconstruction error was used as the metric to select the best pre-trained model.
Training and validation curves over 400 epochs.
Validation mean Dice score over 3000 epochs. The highest validation Dice score achieved is 0.8100; the training loss curve can be seen to decrease throughout.
The model was validated with NVIDIA hardware and software. For hardware, the model can run on any NVIDIA GPU with more than 16 GB of memory. For software, this model is usable only as part of the Transfer Learning & Annotation Tools in the Clara Train SDK container. Find out more about Clara Train at the Clara Train Collections on NGC.
Full instructions for the training and validation workflow can be found in our documentation.
Inference is performed in a sliding-window manner with a specified stride. Due to the large size of CT volumes, GPUs with 16 GB or more of memory are recommended for inference and validation.
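A minimal sketch of sliding-window inference with MONAI; the ROI size, window batch size, and overlap are illustrative assumptions, not the MMAR's actual inference configuration.

```python
import torch
from monai.inferers import sliding_window_inference

model.eval()
with torch.no_grad():
    logits = sliding_window_inference(
        inputs=ct_volume,       # (1, 1, H, W, D) tensor, assumed preprocessed
        roi_size=(96, 96, 96),  # window size; the stride follows from overlap
        sw_batch_size=4,        # windows evaluated per forward pass
        predictor=model,
        overlap=0.5,
    )
segmentation = logits.argmax(dim=1, keepdim=True)  # voxel-wise class labels
```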
This training and inference pipeline was developed by NVIDIA. It is based on a segmentation model developed by NVIDIA researchers. This software is for research use only and has not been cleared or approved by the FDA or any regulatory agency. Clara pre-trained models are for developmental purposes only and cannot be used directly for clinical procedures.
[1] Hatamizadeh, A., Tang, Y., Nath, V., Yang, D., Myronenko, A., Landman, B., ... & Xu, D. (2022). UNETR: Transformers for 3D medical image segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 574-584).
[2] Harmon, S. A., Sanford, T. H., Xu, S., Turkbey, E. B., Roth, H., Xu, Z., Yang, D., Myronenko, A., Anderson, V., Amalou, A., Blain, M., Kassin, M., Long, D., Varble, N., Walker, S. M., Bagci, U., Ierardi, A. M., Stellato, E., Plensich, G. G., … Turkbey, B. (2020). Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets. Nature Communications, 11(1).
End User License Agreement is included with the product. Licenses are also available along with the model application zip file. By pulling and using the Clara Train SDK container and downloading models, you accept the terms and conditions of these licenses.