This sample code is a ready-to-use Jupyter notebook for fine-tuning the Clara Train COVID-19 CT Scan Classifier with a reference dataset. A CT scan, or computed tomography scan, is a 3D medical imaging procedure that uses computer-processed combinations of many X-ray measurements taken from different angles to produce cross-sectional (tomographic) images. This example uses the Clara Train SDK, NVIDIA’s domain-optimized application framework that accelerates deep learning training and inference for medical imaging use cases.
This example is designed to work with the NGC-AzureML Quick Launch CLI Toolkit (azureml-ngc-tools), which gives you direct access to a fully set up and configured Azure Machine Learning environment in which this Jupyter notebook and the additional resources it requires are pre-installed.
Clara Train is a framework that includes two main libraries: the Training Framework, a TensorFlow-based framework for kick-starting AI development with techniques such as transfer learning, federated learning, and AutoML; and AI-Assisted Annotation (AIAA), which enables medical viewers to rapidly create annotated datasets suitable for training. Clara Train uses a concept called MMAR (Medical Model ARchive), which describes a model together with the configuration, transforms, and data associated with it.
The Clara CT-Scan Covid Classifier is used as the base model. This model was developed by NVIDIA researchers in conjunction with the National Institutes of Health (NIH). It was trained and evaluated on a global dataset with thousands of experimental cohorts.
The model achieved an accuracy of greater than 90% on a test set consisting of more than one thousand CT images collected across the globe. The model requires two inputs: a CT scan image and a lung segmentation image, which guides the model to focus on the lung area. Before training, the data is pre-processed so that it is expressed in Hounsfield units and has a specific orientation.
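As a rough illustration of what "expressed in Hounsfield units" means, the sketch below applies the standard linear rescale used for CT data (HU = stored value × slope + intercept, where slope and intercept normally come from the scan's metadata). It is not the pre-processing pipeline shipped with the model: the function names, the default slope and intercept, and the clipping window are assumptions chosen for the example, and plain Python lists are used only to keep it dependency-free.

```python
def to_hounsfield(stored_values, rescale_slope=1.0, rescale_intercept=-1024.0):
    """Map raw stored CT values to Hounsfield units with the linear rescale.

    The default slope and intercept are common values, assumed here for
    illustration; real scans carry their own values in the image metadata.
    """
    return [v * rescale_slope + rescale_intercept for v in stored_values]


def clip_window(hu_values, low=-1000.0, high=500.0):
    """Clamp HU values to [low, high]. The window bounds are illustrative only."""
    return [min(max(v, low), high) for v in hu_values]


# Hypothetical stored values for four voxels (not taken from any dataset).
raw = [0, 24, 1024, 4000]
hu = to_hounsfield(raw)      # [-1024.0, -1000.0, 0.0, 2976.0]
clipped = clip_window(hu)    # [-1000.0, -1000.0, 0.0, 500.0]
print(hu)
print(clipped)
```

On this scale, air is about -1000 HU and water is 0 HU, so an image whose values have not been rescaled in this way will look very different to the model from the data it was trained on.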
In this Jupyter Notebook, we will perform several steps:
First, we will use this pre-trained classification model to classify data that has not been preprocessed. Labelled data from two sources are used:
CT scans of COVID-19 patients from the COVID-19 CT scans Kaggle Database
CT scans of non-COVID-19 patients from The Cancer Imaging Archive
Because some of the data is not pre-processed, the model will not perform at its best and will misclassify some of the examples.
Next, we will demonstrate the fine-tuning capabilities of the Clara Train SDK by fine-tuning the model on the datasets listed above, after which the model can correctly classify some of the examples it initially misclassified.
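The before-and-after comparison described in these steps can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the notebook: the helper names are hypothetical, and the labels and predictions are made-up toy values (1 = COVID-19, 0 = non-COVID-19), not results from the model.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def misclassified(predictions, labels):
    """Indices of examples whose prediction differs from the label."""
    return [i for i, (p, y) in enumerate(zip(predictions, labels)) if p != y]


def newly_corrected(before, after, labels):
    """Indices that were wrong before fine-tuning and are right afterwards."""
    return [i for i in misclassified(before, labels) if after[i] == labels[i]]


# Toy values for illustration only.
labels = [1, 1, 0, 0, 1]
before = [1, 0, 0, 1, 0]   # hypothetical predictions of the pre-trained model
after = [1, 1, 0, 1, 1]    # hypothetical predictions after fine-tuning

print(accuracy(before, labels), accuracy(after, labels))
print(newly_corrected(before, after, labels))
```

Tracking the indices of corrected examples, and not only the overall accuracy, makes it easy to inspect exactly which scans benefited from fine-tuning.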
The NGC-AzureML Quick Launch CLI Toolkit (azureml-ngc-tools) is the quickest way to get started with deploying this example on Microsoft Azure. The steps involved are:
To take a look at the notebook before you deploy, follow these steps:
This Jupyter Notebook example code was developed by NVIDIA. It is based on a segmentation and classification model developed by NVIDIA researchers in conjunction with the NIH.
The Software is for Research Use Only. The Software’s recommendations should not be solely or primarily relied upon by a Healthcare Professional to diagnose or treat COVID-19. This research-use-only software has not been cleared or approved by the FDA or any regulatory agency.
This Jupyter Notebook example uses the Clara Train SDK which is to be used in accordance with the End User License Agreement included with it. Licenses are also available along with the model application zip file. By pulling and using the Clara Train SDK container and downloading models, you accept the terms and conditions of these licenses.