How to explore netCDF datasets using xarray
By Deepnote team
Updated on November 23, 2023
This tutorial offers a deep dive into handling netCDF datasets with xarray, demonstrating techniques for efficient data manipulation, subsetting, and visualization in Python.
This tutorial will guide you through the basics of using the xarray package to work with netCDF datasets. xarray simplifies the handling of multidimensional datasets in Python, building on pandas for labeled data and using netCDF4 behind the scenes to read data from files or services like ERDDAP. If you're transitioning from netCDF4 to xarray, you'll find that xarray provides a higher-level, more Pythonic interface.
Overview
- xarray basics: Understanding the xarray package.
- Reading netCDF datasets: How to read netCDF datasets into xarray data structures.
- Exploring netCDF data: Discovering dataset dimensions, variables, and attributes.
- NumPy arrays and xarray: Manipulating netCDF variable data in NumPy array format.
Prerequisites
Before using xarray, ensure you have it installed. If not, you can use the following command to install it along with its dependencies:
$ conda install xarray netCDF4 bottleneck
For Python versions earlier than 3.5, you might also need to install cyordereddict for better performance; it is not needed on Python 3.5 and later.
Getting started with xarray
To use xarray, let's import it alongside NumPy:
import numpy as np
import xarray as xr
Reading datasets with xarray
Use xr.open_dataset() to load a netCDF dataset from a local file or a URL. For example:
ds = xr.open_dataset('https://salishsea.eos.ubc.ca/erddap/griddap/ubcSSnBathymetry2V1')
Or if you have the dataset locally:
lds = xr.open_dataset('../../NEMO-forcing/grid/bathy_meter_SalishSea2.nc')
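If you don't have either of these files handy, here is a self-contained sketch of the same workflow: build a small synthetic Dataset, write it to a netCDF file, and read it back with xr.open_dataset(). The file name and the Bathymetry variable are invented for illustration, and writing requires a netCDF backend such as netCDF4 or scipy.

```python
import numpy as np
import xarray as xr

# Build a tiny 2-D bathymetry-like dataset (synthetic values for illustration)
depth = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("y", "x"),
    name="Bathymetry",
)
ds_out = xr.Dataset({"Bathymetry": depth})

# Round-trip through a netCDF file, then read it back lazily
ds_out.to_netcdf("example_bathy.nc")
ds = xr.open_dataset("example_bathy.nc")
print(ds["Bathymetry"].shape)
```

The round trip preserves dimensions, values, and metadata, which is why open_dataset behaves the same on local files as on remote services.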
Exploring the dataset structure
An xarray Dataset is analogous to a dict of DataArray objects with aligned dimensions; it's like an in-memory representation of a netCDF file:
print(ds)
You will find detailed metadata (attributes) of the dataset and its variables, including dimensions (dims), data variables (data_vars), and coordinates (coords):
- dims: The names and lengths of the dataset's dimensions.
- data_vars: The variables held in the dataset, accessible as DataArrays.
- coords: Labels for points in the data variables, also stored as DataArrays.
For example, to see the dimensions:
ds.dims
To check the variables:
ds.data_vars
And to examine the coordinates:
ds.coords
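To make these three accessors concrete, here is a minimal sketch using a synthetic dataset; the temperature variable and the coordinate values are invented for illustration, not taken from the ERDDAP dataset above.

```python
import numpy as np
import xarray as xr

# Synthetic 2-D dataset with labeled coordinates
ds = xr.Dataset(
    data_vars={"temperature": (("y", "x"), np.zeros((2, 3)))},
    coords={"y": [10.0, 20.0], "x": [100.0, 110.0, 120.0]},
)

print(ds.dims)       # dimension names and lengths
print(ds.data_vars)  # the data variables
print(ds.coords)     # the coordinate labels
```

Because a Dataset behaves like a dict of DataArrays, ds["temperature"] and ds.temperature both return the same DataArray.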
Attributes: Metadata about your data
Both the dataset and variables have attributes, which are stored metadata describing the dataset. Let's look at the dataset's attributes:
ds.attrs
And the attributes of the longitude DataArray, via its attrs property:
ds.longitude.attrs
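As a quick self-contained sketch of how attrs works on both datasets and variables (the attribute names here are invented, mirroring common CF-style metadata):

```python
import numpy as np
import xarray as xr

# Attach metadata to a DataArray and a Dataset via their attrs dicts
lon = xr.DataArray(
    np.linspace(-126.0, -122.0, 4),
    dims="x",
    attrs={"units": "degrees_east", "long_name": "longitude"},
)
ds = xr.Dataset({"longitude": lon}, attrs={"title": "Example grid"})

print(ds.attrs["title"])            # dataset-level metadata
print(ds.longitude.attrs["units"])  # variable-level metadata
```

attrs is an ordinary dict, so you can read, add, or modify metadata with standard dict operations, and xarray will write it back out when you save to netCDF.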
Data variables and NumPy arrays
Data variable values in xarray are stored as NumPy arrays. This means you can use NumPy's indexing and slicing to work with them.
For example, to access the latitudes and longitudes at the corners of the domain:
# Shape of the latitude variable
ds.latitude.shape
# Latitudes and longitudes at domain corners
print('Latitudes and longitudes of domain corners:')
print(' 0, 0: ', ds.latitude.values[0, 0], ds.longitude.values[0, 0])
print(' 0, x-max: ', ds.latitude.values[0, -1], ds.longitude.values[0, -1])
print(' y-max, 0: ', ds.latitude.values[-1, 0], ds.longitude.values[-1, 0])
print(' y-max, x-max:', ds.latitude.values[-1, -1], ds.longitude.values[-1, -1])
Slicing for subsets of data
You can use slicing to pull out specific subsets of data. For example:
# First two values in both dimensions
ds.longitude.values[:2, :2]
# Last two values in both dimensions
ds.latitude.values[-2:, -2:]
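The same corner-access and slicing patterns can be tried on a synthetic grid, so the example runs without downloading anything; the latitude values below are invented for illustration.

```python
import numpy as np
import xarray as xr

# A 4x5 grid of synthetic "latitude" values: 0.0, 1.0, ..., 19.0
lat = xr.DataArray(np.arange(20.0).reshape(4, 5), dims=("y", "x"))
ds = xr.Dataset({"latitude": lat})

print(ds.latitude.values[0, 0])      # first corner
print(ds.latitude.values[-1, -1])    # opposite corner via negative indexing
print(ds.latitude.values[:2, :2])    # 2x2 block at the origin
print(ds.latitude.values[-2:, -2:])  # 2x2 block at the far corner
```

Since .values is a plain NumPy array, every NumPy indexing idiom (negative indices, slices, boolean masks) works unchanged.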
Conclusion
Xarray provides a powerful and convenient way to read, explore, and manipulate netCDF datasets. It uses an intuitive, pandas-like approach to work with labeled multidimensional data. With xarray, you can handle complex datasets in a more Pythonic, efficient manner.
Whether you're working on climate modeling, oceanographic data, or any other field that uses multidimensional datasets, learning xarray can significantly streamline your data analysis workflow. Happy data wrangling!