NGC Catalog
Climate in a Bottle

Description
cBottle is a diffusion model that generates atmospheric states at kilometer-scale resolution using a cascaded diffusion architecture.
Publisher
NVIDIA
Latest Version
1.0
Modified
May 12, 2025
Size
7.27 GB

NVIDIA Model Card

EDM-Chaos Overview

cBottle

Description:

cBottle is a diffusion model that generates atmospheric states at kilometer-scale resolution using a cascaded diffusion architecture.

This model is for research and development only.

License/Terms of Use:

Use of this model is governed by the NVIDIA Software and Model Evaluation License Agreement.

Deployment Geography:

Global

Use Case:

Researchers and developers in climate modeling and Earth system science can use this model to generate samples of the Earth's atmospheric state, using the techniques described in the paper 'Climate in a Bottle: A generative foundation model for the kilometer-scale atmosphere'.

Release Date:

NGC: 05/12/2025

Model Architecture:

Architecture Type:

Convolutional Neural Network (CNN)

Network Architecture:

Song-UNet on HEALPix geometry

  • cBottle-3d has 150M parameters.
  • cBottle-video has 282M parameters.
  • cBottle-SR has 330M parameters.

Input:

Input Type(s):

  • Tensor (dataset label, one-hot encoded)
  • Tensor (day of year)
  • Tensor (second of day)
  • Tensor (monthly mean sea surface temperature, SST)

Input Format(s):

  • PyTorch Tensor
  • PyTorch Tensor
  • PyTorch Tensor
  • PyTorch Tensor

Input Parameters:

  • Dataset label: 2D (batch, 1024)
  • Day of year: 2D (batch, time_window)
  • Second of day: 2D (batch, time_window)
  • Monthly mean SST: 4D (batch, num_in_channels, time_window, cell)

Other Properties Related to Input:

  • Dataset label: ERA5 = 1, ICON = 0.
  • Day of year, in days (0-365).
  • Second of day, in seconds (0-86399).
  • Monthly mean SST is given on the HEALPix grid with Nside = 64 (HPX64).
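The four conditioning inputs can be assembled as PyTorch tensors roughly as follows. This is a sketch under stated assumptions, not the cBottle API: `make_conditioning` is a hypothetical helper, the (batch, 1024) tensor listed under Input Parameters is assumed to be the one-hot dataset label, day of year is assumed to be 0-based, and the SST field is a single-channel zero placeholder on the HPX64 grid (12 * 64**2 = 49,152 cells).

```python
import datetime

import torch
import torch.nn.functional as F

# Assumptions (not confirmed by this card): 1024 label slots, 0-based day of year,
# and a single SST channel on the HPX64 grid.
LABEL_DIM = 1024
DATASET_LABELS = {"ICON": 0, "ERA5": 1}
HPX64_CELLS = 12 * 64**2  # number of HEALPix cells at Nside = 64


def make_conditioning(dataset: str, times: list[datetime.datetime]) -> dict[str, torch.Tensor]:
    """Hypothetical helper: build batch-size-1 conditioning tensors for one time window."""
    # One-hot dataset label, shape (batch, LABEL_DIM)
    label = F.one_hot(torch.tensor([DATASET_LABELS[dataset]]), num_classes=LABEL_DIM).float()

    # Day of year (0-365) and second of day (0-86399), each shape (batch, time_window)
    day_of_year = torch.tensor([[t.timetuple().tm_yday - 1 for t in times]], dtype=torch.float32)
    second_of_day = torch.tensor(
        [[t.hour * 3600 + t.minute * 60 + t.second for t in times]], dtype=torch.float32
    )

    # Placeholder monthly-mean SST, shape (batch, channels, time_window, cell);
    # a real pipeline would load the regridded input4MIPs tosbcs field here.
    sst = torch.zeros(1, 1, len(times), HPX64_CELLS)

    return {"label": label, "day_of_year": day_of_year, "second_of_day": second_of_day, "sst": sst}


cond = make_conditioning("ERA5", [datetime.datetime(2018, 7, 1, 12, 0)])
print({k: tuple(v.shape) for k, v in cond.items()})
# {'label': (1, 1024), 'day_of_year': (1, 1), 'second_of_day': (1, 1), 'sst': (1, 1, 1, 49152)}
```

Passing a list of timestamps lets the same helper cover a multi-frame `time_window`, which is how the 2D (batch, time_window) shapes above are filled.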

Output:

Output Type(s):

  • Tensor

Output Format:

  • PyTorch Tensor

Output Parameters:

  • 4D (batch, channel, time_window, cell)

Other Properties Related to Output:

  • Coarse-resolution models output on the HPX64 grid (HEALPix, Nside = 64).
  • The super-resolution (SR) model outputs on the HPX1024 grid (HEALPix, Nside = 1024).
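The HPX grid sizes determine the length of the `cell` dimension of the output tensor. HEALPix divides the sphere into 12 base pixels, each subdivided into Nside x Nside cells of equal area, so a grid has 12 * Nside**2 cells. The sketch below uses a hypothetical helper name and assumes an Earth surface area of about 510.1 million km² to estimate a rough cell spacing.

```python
import math

EARTH_SURFACE_KM2 = 510.1e6  # approximate surface area of the Earth, km^2


def healpix_npix(nside: int) -> int:
    """Number of HEALPix cells for a given Nside: 12 base pixels times nside**2."""
    if nside < 1:
        raise ValueError("nside must be positive")
    return 12 * nside * nside


for name, nside in [("HPX64 (coarse models)", 64), ("HPX1024 (SR model)", 1024)]:
    npix = healpix_npix(nside)
    # Cells are equal-area, so sqrt(area per cell) gives a rough cell spacing.
    spacing_km = math.sqrt(EARTH_SURFACE_KM2 / npix)
    print(f"{name}: {npix:,} cells, about {spacing_km:.1f} km per cell")
```

With these numbers, HPX64 has 49,152 cells (roughly 100 km spacing) and HPX1024 has 12,582,912 cells (roughly 6 km spacing), so the SR output's `cell` dimension is 256 times longer than that of the coarse models.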

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration (Required For NVIDIA Models Only):

Runtime Engine(s):

PyTorch

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Ampere
  • NVIDIA Blackwell
  • NVIDIA Hopper
  • NVIDIA Turing

[Preferred/Supported] Operating System(s):

Linux

Model Version(s):

  • Model version: v1

Training, Testing, and Evaluation Datasets:

Total Number of Datasets:

3 Datasets:

  • ICON Cycle 3
  • input4MIPs
  • ERA5

Dataset Partition:

Training:
  • ICON: 2020-01-20 03:00:00 through 2024-03-06 12:00:00 (inclusive)
  • ERA5/input4MIPs: years 1980-2017 (inclusive)

Validation:
  • ICON: 2020-01-20 03:00:00 through 2024-03-06 12:00:00 (inclusive)
  • ERA5/input4MIPs: year 2018

Evaluation:
  • ICON: no independent data withheld for evaluation
  • ERA5/input4MIPs: all years before 1980 and after 2018
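The partition can be expressed as a small lookup. This is an illustrative sketch, not part of the cBottle code: `era5_split` and `icon_in_range` are hypothetical names, and the boundaries are copied from the partition listed above.

```python
from datetime import datetime

# Hypothetical helpers encoding the partition above; names are illustrative only.
ICON_START = datetime(2020, 1, 20, 3, 0, 0)
ICON_END = datetime(2024, 3, 6, 12, 0, 0)


def era5_split(year: int) -> str:
    """Return the ERA5/input4MIPs partition ("train", "validation", "evaluation") for a year."""
    if 1980 <= year <= 2017:
        return "train"
    if year == 2018:
        return "validation"
    return "evaluation"  # every year before 1980 or after 2018


def icon_in_range(t: datetime) -> bool:
    """True if an ICON timestamp lies in the inclusive training/validation window."""
    return ICON_START <= t <= ICON_END


print([era5_split(y) for y in (1979, 1980, 2017, 2018, 2019)])
# ['evaluation', 'train', 'train', 'validation', 'evaluation']
```

Because the ICON training and validation windows are identical, `icon_in_range` only checks membership in that window; it cannot separate the two splits, consistent with no ICON data being withheld.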

Data Processing Description:

ERA5: For pre-processing, we retrieved hourly ERA5 data for 1980-2018 (inclusive) at 0.25-degree resolution on the latitude-longitude grid via the data lake at the National Energy Research Scientific Computing Center (NERSC). The 3D atmospheric states are available on pressure levels. We then regridded the data to the HEALPix grid with Nside = 256 using bilinear interpolation. To train the coarse-resolution models, we coarsened ERA5 to Nside = 64 by pooling and converted it to Zarr format.
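The coarsening step from Nside = 256 to Nside = 64 can be sketched as average pooling over HEALPix children. This minimal NumPy version assumes the field is stored in NESTED pixel ordering, where the 16 fine cells inside each coarse cell (a factor of 4 in Nside) have consecutive indices; the pixel ordering used in the cBottle pipeline is not stated in this card, and `coarsen_nested` is a hypothetical name.

```python
import numpy as np


def coarsen_nested(field: np.ndarray, nside_in: int, nside_out: int) -> np.ndarray:
    """Average-pool a HEALPix field in NESTED ordering from nside_in to nside_out.

    In NESTED ordering the (nside_in // nside_out)**2 children of each coarse
    cell occupy consecutive indices, so pooling is a reshape plus a mean.
    """
    factor = nside_in // nside_out
    if factor < 1 or nside_out * factor != nside_in or factor & (factor - 1):
        raise ValueError("nside_in must be nside_out times a power of two")
    if field.shape[-1] != 12 * nside_in**2:
        raise ValueError("last axis must hold 12 * nside_in**2 cells")
    npix_out = 12 * nside_out**2
    return field.reshape(*field.shape[:-1], npix_out, factor**2).mean(axis=-1)


rng = np.random.default_rng(0)
fine = rng.standard_normal((2, 12 * 256**2))  # e.g. (channel, cell) at Nside = 256
coarse = coarsen_nested(fine, nside_in=256, nside_out=64)
print(coarse.shape)  # (2, 49152)
```

Because HEALPix cells are equal-area, an unweighted mean over children preserves the global mean of the field.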

ICON: We obtained petabyte-scale (O(PB)) ICON data in Zarr format from MPI-M. This dataset is stored on the HEALPix grid with Nside = 1024; before we received it, it had been interpolated from the native icosahedral grid using nearest-neighbor interpolation. Five years of data are available, at 3-hour (3D) and 30-minute (2D) temporal resolution. Conveniently, the dataset also includes pre-coarsened data (again produced by average pooling) at our coarse resolution of Nside = 64. We interpolated this coarse data to fixed pressure levels using linear interpolation in the vertical. To fill in values at pressure levels that lie below the surface, we extrapolate temperature and geopotential assuming hydrostatic balance and a constant temperature lapse rate of 6.5 K/km. For all other variables, we use constant extrapolation downward from the surface. This procedure approximately reproduces ERA5's undocumented procedure for filling in below-surface levels.
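The below-surface extrapolation for temperature and geopotential has a closed form. With temperature increasing downward at a constant lapse rate Γ and hydrostatic balance dΦ = -R_d T d(ln p), one obtains T(p) = T_s (p/p_s)^(R_d Γ / g) and Φ(p) = Φ_s + (g T_s / Γ)(1 - (p/p_s)^(R_d Γ / g)). The sketch below implements that standard formula, assuming the reference values (p_s, T_s, Φ_s) are taken at the surface; the exact reference level and physical constants used for cBottle are not given in this card, and `extrapolate_below_surface` is a hypothetical name.

```python
G = 9.80665          # gravitational acceleration, m s^-2
R_D = 287.05         # gas constant for dry air, J kg^-1 K^-1
LAPSE_RATE = 6.5e-3  # constant lapse rate from the card: 6.5 K/km, expressed in K/m


def extrapolate_below_surface(p: float, p_s: float, t_s: float, phi_s: float) -> tuple[float, float]:
    """Return (temperature, geopotential) at pressure p >= p_s, i.e. below the surface.

    Assumes hydrostatic balance and T decreasing with height at LAPSE_RATE:
      T(p)   = T_s * (p / p_s) ** (R_d * Gamma / g)
      Phi(p) = Phi_s + (g * T_s / Gamma) * (1 - (p / p_s) ** (R_d * Gamma / g))
    """
    ratio = (p / p_s) ** (R_D * LAPSE_RATE / G)
    temperature = t_s * ratio
    geopotential = phi_s + (G * t_s / LAPSE_RATE) * (1.0 - ratio)
    return temperature, geopotential


# Example: surface at 900 hPa, 280 K, 1000 m altitude (Phi = g * z); query the 1000 hPa level.
t, phi = extrapolate_below_surface(p=100000.0, p_s=90000.0, t_s=280.0, phi_s=G * 1000.0)
print(f"T = {t:.2f} K, z = {phi / G:.0f} m")
```

The 1000 hPa level comes out warmer than the surface and below it in height, with the height difference equal to the temperature difference divided by the lapse rate, as a constant-lapse-rate column requires.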

Public Datasets:

  • ERA5
  • ICON
  • input4MIPs (tosbcs)

Training, Testing, and Evaluation Datasets:

The following datasets were used for Training, Testing, and Evaluation:

Link:

  • ERA5: https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5
  • ICON Cycle 3: https://www.wdc-climate.de/ui/entry?acronym=nextGEMS_cyc3
  • input4MIPs: https://pcmdi.llnl.gov/mips/input4MIPs/

Data Collection Method by dataset

Automatic/sensors

Labeling Method by dataset

Automatic/sensors

Properties (Quantity, Dataset Descriptions, Sensor(s)):

ERA5

ERA5 is a widely used reanalysis dataset that represents the best estimate of the global atmospheric state at a relatively coarse resolution of about 25 km. Briefly, it is created by running a data assimilation algorithm that incorporates observations into the numerical model of the European Centre for Medium-Range Weather Forecasts (ECMWF). However, its spatial and temporal resolution is too coarse to resolve kilometer-scale mesoscale motions, in particular deep convection. The ICON simulation fills this gap: it is available globally at about 5 km resolution, but it is a free-running simulation that is not paired with the observed weather. See the dataset overview table in the paper for details. We retrieved ERA5 data for 1980-2018.

ICON Cycle 3: The ICON data come from a free-running simulation with the ICON atmospheric model coupled to a dynamic ocean and land surface. The atmospheric component solves the nonhydrostatic fluid equations on a global icosahedral mesh with a resolution of about 5 km. Unlike ERA5, the model explicitly resolves some convective motions, so no convection parameterization is used. Beyond the increase in resolution alone, we therefore expect higher fidelity in the precipitation and cloud fields of this dataset than in ERA5, where these processes are parameterized. Furthermore, because the ocean and land are dynamically coupled to the atmosphere, we expect this run to conserve heat, momentum, and moisture across these components, unlike ERA5, where fitting the observations takes priority over conservation.

Testing Dataset:

Same details as the Training Dataset. This period was reserved for tuning the model hyperparameters.

  • ICON: 2020-01-20 03:00:00 through 2024-03-06 12:00:00 (inclusive)
  • ERA5/input4MIPs: year 2018

Evaluation Dataset:

  • ICON: no independent data withheld for evaluation
  • ERA5/input4MIPs: all years before 1980 and after 2018

Inference:

Engine:

PyTorch

Test Hardware:

  • A100, H100

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns here.