Fine-Tuning COVID-19 CT Scan Classifier

Example Jupyter Notebook to Fine-Tune the Clara Train COVID-19 CT Scan Classification Pretrained Model with a Custom Dataset
Latest Version
April 4, 2023
Compressed Size
985.5 MB


This sample code is a ready-to-use Jupyter notebook that fine-tunes the Clara Train COVID-19 CT Scan Classifier with a reference dataset. A CT (computed tomography) scan is a 3D medical imaging procedure that combines many X-ray measurements taken from different angles, processed by computer, to produce cross-sectional (tomographic) images. This example uses the Clara Train SDK, NVIDIA’s domain-optimized application framework that accelerates deep learning training and inference for medical imaging use cases.

This example is designed to work with the NGC-AzureML Quick Launch CLI Toolkit (azureml-ngc-tools), which gives you direct access to a fully set up and configured Azure Machine Learning environment with this Jupyter notebook and the additional resources required for this example pre-installed.

Clara Train is a framework that includes two main libraries: the Training Framework, a TensorFlow-based framework to kick-start AI development with techniques like transfer learning, federated learning, and AutoML; and AI-Assisted Annotation (AIAA), which enables medical viewers to rapidly create annotated datasets suitable for training. Clara Train uses a concept called MMAR (Medical Model ARchive), which describes a model together with its configuration, transforms, and associated data.

Steps Demonstrated In The Example

The Clara CT-Scan Covid Classifier is used as the base model. This model was developed by NVIDIA researchers in conjunction with the National Institutes of Health (NIH). It was trained and evaluated on a dataset with thousands of experimental cohorts collected from across the globe.

The model achieved an accuracy of greater than 90% on a test set of more than one thousand CT images collected across the globe. The model requires two inputs, a CT scan image and a lung segmentation image; the segmentation guides the model to focus on the lung area. Before training, the data is pre-processed so that it is in Hounsfield units and in a specific orientation.
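The Hounsfield-unit step mentioned above can be illustrated with a short sketch. This is not the notebook's or the Clara Train SDK's actual preprocessing: the function names, the default slope and intercept, the HU window, and the way the two inputs are paired are all assumptions chosen for illustration. In practice the slope and intercept come from the scan's own metadata.

```python
import numpy as np

def to_hounsfield(raw, slope=1.0, intercept=-1024.0):
    """Linear rescale of stored CT values: HU = raw * slope + intercept.
    The defaults are illustrative; real values come from the scan metadata."""
    return raw.astype(np.float32) * slope + intercept

def window_and_scale(hu, hu_min=-1000.0, hu_max=500.0):
    """Clip to an (assumed) HU window and rescale linearly to [0, 1]."""
    clipped = np.clip(hu, hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)

def pair_inputs(ct, lung_mask):
    """Hypothetical helper: stack the CT volume and its lung segmentation
    as two channels after checking that their shapes agree."""
    if ct.shape != lung_mask.shape:
        raise ValueError("CT volume and lung mask must have the same shape")
    return np.stack([ct, lung_mask.astype(np.float32)], axis=0)

# Toy 1x2x2 volume standing in for a real scan.
raw = np.array([[[0, 1024], [2048, 524]]], dtype=np.int16)
mask = np.array([[[1, 0], [1, 1]]], dtype=np.uint8)

hu = to_hounsfield(raw)          # -1024, 0, 1024, -500 HU
scaled = window_and_scale(hu)    # values clipped to the window, then in [0, 1]
paired = pair_inputs(scaled, mask)
print(paired.shape)
```

Air sits near -1000 HU and water at 0 HU, which is why a window starting around -1000 keeps lung tissue visible; the exact window the pretrained model expects is not stated in this description.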

In this Jupyter Notebook, we will perform several steps:

First, we will use this pre-trained classification model to classify data that has not been preprocessed. Labelled data from two sources are used:

  1. CT scans of COVID-19 patients from the COVID-19 CT scans Kaggle Database

  2. Non-COVID-19 CT scans from The Cancer Imaging Archive

Because some of the data is not pre-processed, the model will not perform at its best and will misclassify some of the examples.

Next, we will demonstrate the fine-tuning capabilities of the Clara Train SDK by fine-tuning the model with the datasets mentioned above, after which the model can correctly classify some of the examples it initially misclassified.
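One way to check the effect of fine-tuning is to compare predictions on the same labelled examples before and after. The sketch below is a plain-Python illustration, not code from the notebook; the helper names are hypothetical and the label lists are made-up placeholders, not results from the model.

```python
def accuracy(labels, predictions):
    """Fraction of examples whose predicted class matches the label."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

def newly_corrected(labels, before, after):
    """Indices that were misclassified before fine-tuning and are correct after."""
    return [
        i
        for i, (y, b, a) in enumerate(zip(labels, before, after))
        if b != y and a == y
    ]

# Placeholder data for illustration only (1 = COVID-19, 0 = non-COVID-19).
labels = [1, 1, 1, 0, 0, 0]
before = [1, 0, 0, 0, 1, 0]   # hypothetical predictions from the base model
after = [1, 1, 0, 0, 0, 0]    # hypothetical predictions after fine-tuning

print(accuracy(labels, before))                # 0.5
print(round(accuracy(labels, after), 3))       # 0.833
print(newly_corrected(labels, before, after))  # [1, 4]
```

Listing the newly corrected indices alongside the overall accuracy makes it easy to inspect exactly which scans changed class after fine-tuning.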

Usage Instructions

The NGC-AzureML Quick Launch CLI Toolkit (azureml-ngc-tools) is the quickest way to get started with deploying this example on Microsoft Azure. The steps involved are:

  1. Download the ready-to-use config files for this example from NGC here.
  2. Install azureml-ngc-tools following the instructions described.
  3. Modify the azure_config.json with your Azure billing credentials and run azureml-ngc-tools as described in Step 1.
  4. Copy the URL to your AzureML environment produced by the toolkit to your browser of choice.
  5. Run CovidCT-ScanClassifier.ipynb in your AzureML Jupyter Lab.
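Before running the toolkit in step 3, it can help to confirm that no credential field in the config file was left blank. The sketch below walks any JSON document and reports string values that are empty or still look like `<placeholders>`. It makes no claim about the real layout of azure_config.json: the key names in the example and the angle-bracket placeholder convention are assumptions for illustration.

```python
import json

def find_unfilled(node, path=""):
    """Return the paths of string values that are empty or look like <placeholders>."""
    missing = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            missing += find_unfilled(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            missing += find_unfilled(value, f"{path}[{index}]")
    elif isinstance(node, str):
        text = node.strip()
        if not text or (text.startswith("<") and text.endswith(">")):
            missing.append(path)
    return missing

# Hypothetical config content; the real file's keys may differ.
example = json.loads(
    '{"account": {"subscription": "<your-subscription>", "group": "my-group"},'
    ' "tags": [""]}'
)
unfilled = find_unfilled(example)
print(unfilled)  # ['account.subscription', 'tags[0]']
```

To check a real file, load it with `json.load` and pass the result to `find_unfilled`; an empty list means every string field has been filled in.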

Viewing a Jupyter Notebook in NGC

To take a look at the notebook before you deploy, follow these steps:

  1. Navigate to the File Browser tab of this asset.
  2. Under the actions menu (three dots) for the CovidCT-ScanClassifier.ipynb file (or any desired .ipynb file), select "View Jupyter".
  3. You can now read the sample code before you deploy.

Suggested Reading

  1. Getting Started Guide at
  2. Use the NVIDIA Developer Forum for questions regarding this Software


This Jupyter Notebook example code was developed by NVIDIA. It is based on a segmentation and classification model developed by NVIDIA researchers in conjunction with the NIH.

The Software is for Research Use Only. Software’s recommendation should not be solely or primarily relied upon to diagnose or treat COVID-19 by a Healthcare Professional. This research use only software has not been cleared or approved by FDA or any regulatory agency.


This Jupyter Notebook example uses the Clara Train SDK, which is to be used in accordance with the End User License Agreement included with it. Licenses are also available along with the model application zip file. By pulling and using the Clara Train SDK container and downloading models, you accept the terms and conditions of these licenses.