cBottle
cBottle is a diffusion model that generates atmospheric states at kilometer resolution using a cascaded diffusion architecture.
This model is for research and development only.
Use of this model is governed by the NVIDIA Software and Model Evaluation License Agreement.
Global
Researchers and developers in the field of climate modeling and Earth system science would use this model to generate random images of the Earth's atmosphere, leveraging the techniques described in the paper 'Climate in a Bottle: A generative foundation model for the kilometer-scale atmosphere'.
NGC: 05/12/2025
Convolutional Neural Network (CNN)
Song-UNet on HEALPix geometry
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
PyTorch
Linux
3 Datasets:
ERA5: For pre-processing, we retrieved hourly ERA5 data for 1980-2018 (inclusive) at 0.25 degree resolution on the lat-lon grid via the data lake at the National Energy Research Scientific Computing Center (NERSC). The 3D atmospheric states are available on pressure levels. We then regridded the data to the HEALPix grid with Nside = 256 using bilinear interpolation. For training the coarse-resolution models, we coarsened ERA5 to Nside = 64 by pooling and converted it to zarr format.
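The coarsening step can be illustrated with a minimal sketch. It assumes the map is stored in HEALPix NESTED ordering (the text above does not state the ordering) and that pooling means a plain average. The function names and the Earth radius constant are hypothetical and chosen for illustration; this is not the project's actual pre-processing code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius used for a rough estimate


def healpix_npix(nside):
    """Number of pixels on a HEALPix grid: 12 * nside**2."""
    return 12 * nside * nside


def approx_resolution_km(nside):
    """Rough grid spacing: square root of the (equal) pixel area on the sphere."""
    area_km2 = 4.0 * math.pi * EARTH_RADIUS_KM ** 2 / healpix_npix(nside)
    return math.sqrt(area_km2)


def coarsen_nested(values, nside_in, nside_out):
    """Average-pool a NESTED-ordered HEALPix map from nside_in down to nside_out.

    In NESTED ordering, the children of a coarse pixel occupy a contiguous
    index range, so pooling reduces to averaging consecutive blocks.
    """
    if nside_in % nside_out != 0:
        raise ValueError("nside_in must be a multiple of nside_out")
    ratio = nside_in // nside_out
    if ratio & (ratio - 1) != 0:
        raise ValueError("nside_in / nside_out must be a power of two")
    if len(values) != healpix_npix(nside_in):
        raise ValueError("map length does not match nside_in")
    block = ratio * ratio  # fine pixels per coarse pixel
    return [
        sum(values[start:start + block]) / block
        for start in range(0, len(values), block)
    ]


# Tiny demonstration: Nside = 4 -> Nside = 1 (16 fine pixels per coarse pixel).
fine_map = [float(i // 16) for i in range(healpix_npix(4))]
coarse_map = coarsen_nested(fine_map, nside_in=4, nside_out=1)
```

Going from Nside = 256 to Nside = 64 averages 16 fine pixels into each coarse pixel. Under the equal-area estimate above, Nside = 256 corresponds to a spacing of roughly 25 km, and Nside = 64 to roughly 100 km.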
ICON: We obtained O(PB) of ICON data in zarr format from MPI-M. This dataset was stored on the HEALPix grid with Nside = 1024. Before we received the data, it had been interpolated from the native icosahedral grid using nearest-neighbor interpolation. Five years of this data are available at 3-hour (3D) and 30-minute (2D) resolution in time. Conveniently, this dataset included pre-coarsened data (again using average pooling) at our coarse resolution of Nside = 64. We interpolated this coarse data to fixed pressure levels using linear interpolation in the vertical direction. To fill in values at pressure levels that lie below the surface, we extrapolate temperature and geopotential using hydrostatic balance and an assumed constant temperature lapse rate of 6.5 K/km. For all other variables, we use constant extrapolation down from the surface. This procedure approximately reproduces ERA5’s undocumented procedure for filling in below-surface levels.
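One common way to write the below-surface fill for temperature and geopotential is sketched here. It integrates the hydrostatic equation for a dry ideal gas with a constant lapse rate, starting from surface values. The function name, the gas constant, and the gravity value are assumptions made for illustration; the exact formulation used for cBottle may differ.

```python
R_DRY = 287.05        # assumption: specific gas constant of dry air, J/(kg K)
GRAVITY = 9.80665     # assumption: standard gravity, m/s^2
LAPSE_RATE = 0.0065   # 6.5 K/km expressed in K/m, as stated in the text


def extrapolate_below_surface(p, p_sfc, t_sfc, phi_sfc):
    """Extrapolate temperature and geopotential to pressure p (Pa).

    p_sfc, t_sfc, phi_sfc are surface pressure (Pa), surface temperature (K),
    and surface geopotential (m^2/s^2). For p > p_sfc the target level lies
    below the surface: temperature increases and geopotential decreases.
    """
    if p <= 0.0 or p_sfc <= 0.0:
        raise ValueError("pressures must be positive")
    # Hydrostatic balance with T = T_sfc - LAPSE_RATE * (z - z_sfc) gives
    # T / T_sfc = (p / p_sfc) ** (R * LAPSE_RATE / g).
    ratio = (p / p_sfc) ** (R_DRY * LAPSE_RATE / GRAVITY)
    temperature = t_sfc * ratio
    # Height offset implied by the same temperature profile, times g.
    geopotential = phi_sfc + GRAVITY * (t_sfc / LAPSE_RATE) * (1.0 - ratio)
    return temperature, geopotential


def constant_extrapolation(value_sfc):
    """All other variables: carry the surface value down unchanged."""
    return value_sfc


# Example: fill the 1000 hPa level where the surface pressure is 900 hPa.
t_1000, phi_1000 = extrapolate_below_surface(
    p=100000.0, p_sfc=90000.0, t_sfc=280.0, phi_sfc=9806.65
)
```

The two returned fields are mutually consistent by construction: the temperature change equals the lapse rate times the height change implied by the geopotential difference.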
The following datasets were used for Training, Testing, and Evaluation:
Automatic/sensors
Automatic/sensors
ERA5
ERA5 is a commonly used dataset that represents the best guess of the global atmospheric state at a coarse resolution of 25 km. Briefly, it is created by running a data assimilation algorithm that incorporates observations into the numerical model of the European Centre for Medium-Range Weather Forecasts (ECMWF). However, it is not available at a high enough resolution in space or time to resolve mesoscale motions at kilometer scale, in particular deep convection. The ICON simulation fills this gap since it is available at a global 5 km resolution, but it is not paired with reality. See the datasets table in the accompanying paper for a detailed overview of the datasets. We retrieved data from 1980-2018.
ICON Cycle 3: The ICON data is a free-running simulation with the ICON atmospheric model coupled to a dynamic ocean and land. The atmospheric component solves the nonhydrostatic fluid mechanics equations on a global icosahedral mesh with a resolution of around 5 km. Unlike ERA5, the model explicitly resolves certain convective motions, so a convection parameterization is not used. Beyond the simple increase in resolution, we therefore expect higher fidelity in the precipitation and cloud fields of this dataset relative to ERA5, where these processes are parameterized. Furthermore, because the ocean and land are dynamically coupled to the atmosphere, we expect this run to obey conservation of heat, momentum, and moisture between these components, unlike ERA5, where observations (rather than conservation laws) are king.
Same details as Training Dataset. This period was reserved for tuning the model hyperparameters.
PyTorch
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.