Reading and writing#

or more explicitly:

Reading and writing datasets and metadata#

This tutorial shows how xsnow handles data I/O and metadata:

  1. Read datasets

  2. Inspect dimensions, coordinates, and variables

  3. View and edit metadata

  4. Write datasets to file

xsnow focuses on parsing SNOWPACK (PRO/SMET) and CROCUS (NetCDF) and will provide basic readers for CAAML/JSON. It also supports an efficient xsnow NetCDF format for caching large datasets.

  • SNOWPACK: native PRO and SMET

  • CROCUS: NetCDF (planned)

  • CAAML: XML/JSON (planned)

  • xsnow NetCDF: optimized for xsnow’s data model xsnowDataset

Sample datasets

xsnow ships several sample datasets. Two lightweight ones are available via xsnow.single_profile() and xsnow.single_profile_timeseries(). More datasets live in xsnow.sample_data (see API docs).

Quick overview over relevant functions

Task

Function/Method

Read any file into xsnowDataset

read()

Write xsnowDataset to file

.to_netcdf(), .to_pro(), to_smet()

Append new timesteps from files

.append_latest()

Merge SMET files

.merge_smet()

1. Reading example SNOWPACK datasets#

For this SNOWPACK section, we use a sample dataset of gridded simulations shipped in native SNOWPACK formats. Get the sample data path:

import xsnow
datapath = xsnow.sample_data.snp_gridded_dir()

The function returns a directory path. Its structure looks like:

Data location: /root/.cache/xsnow-snp-gridded
xsnow-snp-gridded/
    smets/
        gridded/
            nowcast/
                VIR1A.smet
                VIR2A.smet
                VIR3A.smet
                VIR4A.smet
                VIR5A.smet
            forecast/
                VIR1A.smet
                VIR2A.smet
                VIR3A.smet
                VIR4A.smet
                VIR5A.smet
    pros/
        gridded/
            VIR1A.pro
            VIR1A.smet
            VIR1A1.pro
            ...
            VIR5A4.pro
            VIR5A4.smet

Reading a single file#

Read a single PRO file:

xs = xsnow.read(f"{datapath}/pros/gridded/VIR1A.pro")
print(xs)
[i] xsnow.xsnow_io: Loading 1 datasets eagerly with 1 workers...
<xsnowDataset>
  Locations:  1
  Timestamps: 416 (2024-01-16--2024-02-02)
  Profiles:   416 total | 416 valid | 373 with HS>0

    employing the <xarray.Dataset> Size: 545kB
    Dimensions:                   (location: 1, time: 416, slope: 1,
                                   realization: 1, layer: 12)
    Coordinates:
      * location                  (location) <U5 20B 'VIR1A'
      * time                      (time) datetime64[ns] 3kB 2024-01-16T05:00:00 ....
      * slope                     (slope) int64 8B 0
      * realization               (realization) int64 8B 0
      * layer                     (layer) int64 96B 0 1 2 3 4 5 6 7 8 9 10 11
        altitude                  (location) float64 8B 2.372e+03
        latitude                  (location) float64 8B 47.15
        longitude                 (location) float64 8B 11.19
        azimuth                   (slope) int64 8B 0
        inclination               (slope) int64 8B 0
        z                         (location, time, slope, realization, layer) float32 20kB ...
    Data variables: (12/28)
        air_content               (location, time, slope, realization, layer) float32 20kB ...
        bond_size                 (location, time, slope, realization, layer) float32 20kB ...
        coord_number              (location, time, slope, realization, layer) float32 20kB ...
        dendricity                (location, time, slope, realization, layer) float32 20kB ...
        density                   (location, time, slope, realization, layer) float32 20kB ...
        element_ID                (location, time, slope, realization, layer) float32 20kB ...
        ...                        ...
        stress                    (location, time, slope, realization, layer) float32 20kB ...
        temperature               (location, time, slope, realization, layer) float32 20kB ...
        temperature_gradient      (location, time, slope, realization, layer) float32 20kB ...
        viscosity                 (location, time, slope, realization, layer) float32 20kB ...
        viscous_deformation_rate  (location, time, slope, realization, layer) float32 20kB ...
        HS                        (location, time, slope, realization) float32 2kB ...
    Attributes:
        Conventions:  CF-1.8
        crs:          EPSG:4326

The summary shows 1 location and 373 timestamps. xsnowDataset wraps an underlying xarray.Dataset with additional dimensions (slope, realization, layer).

Read a single SMET file (no layer dimension):

# Read a single smet file
xs = xsnow.read(f"{datapath}/pros/gridded/VIR1A.smet")
print(xs)
<xsnowDataset>
  Locations:  1
  Timestamps: 416 (2024-01-16--2024-02-02)
  Profiles:   416 total | 0 valid | unavailable with HS>0

    employing the <xarray.Dataset> Size: 213kB
    Dimensions:               (location: 1, time: 416, slope: 1, realization: 1)
    Coordinates:
      * location              (location) object 8B 'VIR1A'
      * time                  (time) datetime64[ns] 3kB 2024-01-16T05:00:00 ... 2...
      * slope                 (slope) int64 8B 0
      * realization           (realization) int64 8B 0
        altitude              (location) float64 8B 2.372e+03
        latitude              (location) float64 8B 47.15
        longitude             (location) float64 8B 11.19
        azimuth               (slope) float64 8B 0.0
        inclination           (slope) float64 8B 0.0
    Data variables: (12/64)
        ColdContentSnow       (location, time, slope, realization) float64 3kB 0....
        DW                    (location, time, slope, realization) float64 3kB 29...
        HN12                  (location, time, slope, realization) float64 3kB 0....
        HN24                  (location, time, slope, realization) float64 3kB 0....
        HN3                   (location, time, slope, realization) float64 3kB 0....
        HN6                   (location, time, slope, realization) float64 3kB 0....
        ...                    ...
        zS4                   (location, time, slope, realization) float64 3kB 0....
        zS5                   (location, time, slope, realization) float64 3kB 0....
        zSd                   (location, time, slope, realization) float64 3kB 0....
        zSn                   (location, time, slope, realization) float64 3kB 0....
        zSs                   (location, time, slope, realization) float64 3kB 0....
        profile_status        (location, time, slope, realization) int8 416B 0 ... 0
    Attributes:
        Conventions:  CF-1.8
        crs:          EPSG:4326

Reading multiple files#

Read multiple files by (1) passing a list of paths or (2) pointing to a directory. Files are merged on shared coordinates (e.g., location, time). Note that for parsing SNOWPACK formats, the merging is based on identical StationNames and station_names (properties of PRO and SMET files). Therefore, please don’t use the same station names for different locations.

# Reading a list of filepaths
xs = xsnow.read([
    f"{datapath}/pros/gridded/VIR1A.pro",            # profile data
    f"{datapath}/pros/gridded/VIR1A.smet",           # scalar output data
    f"{datapath}/smets/gridded/nowcast/VIR1A.smet",  # weather input data
])
[i] xsnow.xsnow_io: Loading 1 datasets eagerly with 1 workers...
# Reading all files within one directory
xs = xsnow.read(f"{datapath}/pros/gridded/")
[i] xsnow.xsnow_io: Using lazy loading with Dask
[i] xsnow.xsnow_io: Creating lazy datasets backed by dask arrays...
[i] xsnow.xsnow_io: Data will NOT be computed until explicitly requested by user
[i] xsnow.xsnow_io: Created 25 lazy datasets (data NOT yet loaded into memory)
print(xs)
<xsnowDataset>
  Locations:  5
  Timestamps: 416 (2024-01-16--2024-02-02)
  Profiles:   10400 total | 10400 valid | 8861 with HS>0

    employing the <xarray.Dataset> Size: 42MB
    Dimensions:                   (location: 5, time: 416, slope: 5,
                                   realization: 1, layer: 33)
    Coordinates:
      * location                  (location) <U5 100B 'VIR1A' 'VIR2A' ... 'VIR5A'
      * time                      (time) datetime64[ns] 3kB 2024-01-16T05:00:00 ....
      * slope                     (slope) int64 40B 0 1 2 3 4
      * realization               (realization) int64 8B 0
      * layer                     (layer) int64 264B 0 1 2 3 4 5 ... 28 29 30 31 32
        altitude                  (location) float64 40B 2.372e+03 ... 2.066e+03
        latitude                  (location) float64 40B 47.15 47.44 ... 47.44 47.37
        longitude                 (location) float64 40B 11.19 11.29 11.3 11.39 11.5
        azimuth                   (slope) int64 40B 0 0 90 180 270
        inclination               (slope) int64 40B 0 38 38 38 38
        z                         (location, time, slope, realization, layer) float32 1MB dask.array<chunksize=(1, 100, 1, 1, 33), meta=np.ndarray>
    Data variables: (12/91)
        ColdContentSnow           (location, time, slope, realization) float64 83kB ...
        DW                        (location, time, slope, realization) float64 83kB ...
        HN12                      (location, time, slope, realization) float64 83kB ...
        HN24                      (location, time, slope, realization) float64 83kB ...
        HN3                       (location, time, slope, realization) float64 83kB ...
        HN6                       (location, time, slope, realization) float64 83kB ...
        ...                        ...
        zS4                       (location, time, slope, realization) float64 83kB ...
        zS5                       (location, time, slope, realization) float64 83kB ...
        zSd                       (location, time, slope, realization) float64 83kB ...
        zSn                       (location, time, slope, realization) float64 83kB ...
        zSs                       (location, time, slope, realization) float64 83kB ...
        HS                        (location, time, slope, realization) float32 42kB dask.array<chunksize=(1, 100, 1, 1), meta=np.ndarray>
    Attributes:
        Conventions:  CF-1.8
        crs:          EPSG:4326

Adding SMET files to an existing dataset#

Add .smet files to your xsnowDataset after reading it with xsnowDataset.merge_smet(). You can pass a single file, a list of files, or a directory.

xs = xsnow.read(f"{datapath}/pros/gridded/VIR1A.pro")
xs = xs.merge_smet(f"{datapath}/pros/gridded/VIR1A.smet")
[i] xsnow.xsnow_io: Loading 1 datasets eagerly with 1 workers...

Tip

Use var_prefix to avoid name collisions (e.g., input SMET variables as input_TA, input_PSUM, etc.). This keeps data organized and prevents overwriting variables with identical names across input/output SMETs.

Appending new timesteps from updated source files#

If your simulations run operationally during a season on a day-by-day basis, you may want to append new times to the existing dataset. It is much more efficient to cache the dataset in a binary format (e.g., NetCDF) and only read new timesteps from the PRO and SMET files. The method xsnowDataset.append_latest() allows you to do that conveniently. Make sure you also read the dedicated section on Updating xsnowDatasets with recent data in the Combining datasets tutorial.

nowcast = xsnow.read(f"{datapath}/smets/gridded/nowcast")
nowcast_forecast = nowcast.append_latest(f"{datapath}/smets/gridded/forecast")
[i] xsnow.xsnow_io: Appended 58 new timestamps: 2024-01-31T03:00:00--2024-02-02T12:00:00

2. Lazy versus eager loading#

When reading large datasets into the xsnow gridded data structure, it’s important to understand the difference between lazy and eager loading — and how this affects memory usage.

  • Eager loading means: as soon as you call the read function, the entire dataset is loaded into memory (i.e., into a concrete xr.Dataset). Further operations are applied on that in-memory object.

  • Lazy loading means: the read function returns a deferred object (or a data structure with deferred chunks) that doesn’t immediately load all data. You can continue chaining selections/transformations, and only when you trigger a final compute will the data actually be read from disk and loaded into memory.

For more information, see Handling large datasets.

3. Subsetting data while reading#

As a convenience, the dataset can already be subset during the call to read(). The entire dataset is being parsed first, and then subset to the desired coordinates. This is particularly powerful if you load your data lazily first—“load only what you need”. For example:

subset = xsnow.read(f"{datapath}/pros/gridded/", 
                location=['VIR1A', 'VIR3A'],
                slope={'inclination': 38, 'azimuth': 180},
                time=slice('2024-01-16T10:00', '2024-01-16T12:00'),
                lazy=True)
[i] xsnow.xsnow_io: Using lazy loading with Dask
[i] xsnow.xsnow_io: Creating lazy datasets backed by dask arrays...
[i] xsnow.xsnow_io: Data will NOT be computed until explicitly requested by user
[i] xsnow.xsnow_io: Created 25 lazy datasets (data NOT yet loaded into memory)

4. Inspecting structure and metadata#

You can use standard xarray accessors to explore shape and metadata:

# List dimensions and sizes
xs.sizes
Frozen({'location': 1, 'time': 416, 'slope': 1, 'realization': 1, 'layer': 12})
# List coordinates and their info
xs.coords
Coordinates:
  * location     (location) <U5 20B 'VIR1A'
  * time         (time) datetime64[ns] 3kB 2024-01-16T05:00:00 ... 2024-02-02...
  * slope        (slope) int64 8B 0
  * realization  (realization) int64 8B 0
  * layer        (layer) int64 96B 0 1 2 3 4 5 6 7 8 9 10 11
    altitude     (location) float64 8B 2.372e+03
    latitude     (location) float64 8B 47.15
    longitude    (location) float64 8B 11.19
    azimuth      (slope) float64 8B 0.0
    inclination  (slope) float64 8B 0.0
    z            (location, time, slope, realization, layer) float32 20kB 0.0...
# List all data variables
xs.data_vars
Data variables:
    air_content               (location, time, slope, realization, layer) float32 20kB ...
    bond_size                 (location, time, slope, realization, layer) float32 20kB ...
    coord_number              (location, time, slope, realization, layer) float32 20kB ...
    dendricity                (location, time, slope, realization, layer) float32 20kB ...
    density                   (location, time, slope, realization, layer) float32 20kB ...
    element_ID                (location, time, slope, realization, layer) float32 20kB ...
    grain_size                (location, time, slope, realization, layer) float32 20kB ...
    grain_type                (location, time, slope, realization, layer) float32 20kB ...
    gs_difference             (location, time, slope, realization, layer) float32 20kB ...
    hand_hardness             (location, time, slope, realization, layer) float32 20kB ...
    hardness_difference       (location, time, slope, realization, layer) float32 20kB ...
    height                    (location, time, slope, realization, layer) float32 20kB ...
    ice_content               (location, time, slope, realization, layer) float32 20kB ...
    lwc                       (location, time, slope, realization, layer) float32 20kB ...
    opt_equ_grain_size        (location, time, slope, realization, layer) float32 20kB ...
    profile_status            (location, time, slope, realization) float64 3kB ...
    shear_strength            (location, time, slope, realization, layer) float32 20kB ...
    sk38                      (location, time, slope, realization, layer) float32 20kB ...
    sn38                      (location, time, slope, realization, layer) float32 20kB ...
    sphericity                (location, time, slope, realization, layer) float32 20kB ...
    ssi                       (location, time, slope, realization, layer) float32 20kB ...
    stab_deformation_rate     (location, time, slope, realization, layer) float32 20kB ...
    stress                    (location, time, slope, realization, layer) float32 20kB ...
    temperature               (location, time, slope, realization, layer) float32 20kB ...
    temperature_gradient      (location, time, slope, realization, layer) float32 20kB ...
    viscosity                 (location, time, slope, realization, layer) float32 20kB ...
    viscous_deformation_rate  (location, time, slope, realization, layer) float32 20kB ...
    HS                        (location, time, slope, realization) float32 2kB ...
    ColdContentSnow           (location, time, slope, realization) float64 3kB ...
    DW                        (location, time, slope, realization) float64 3kB ...
    HN12                      (location, time, slope, realization) float64 3kB ...
    HN24                      (location, time, slope, realization) float64 3kB ...
    HN3                       (location, time, slope, realization) float64 3kB ...
    HN6                       (location, time, slope, realization) float64 3kB ...
    HN72_24                   (location, time, slope, realization) float64 3kB ...
    HS_meas                   (location, time, slope, realization) float64 3kB ...
    HS_mod                    (location, time, slope, realization) float64 3kB ...
    ILWR                      (location, time, slope, realization) float64 3kB ...
    ISWR                      (location, time, slope, realization) float64 3kB ...
    ISWR_diff                 (location, time, slope, realization) float64 3kB ...
    ISWR_dir                  (location, time, slope, realization) float64 3kB ...
    ISWR_h                    (location, time, slope, realization) float64 3kB ...
    LWR_net                   (location, time, slope, realization) float64 3kB ...
    MS_Evap                   (location, time, slope, realization) float64 3kB ...
    MS_Ice_Soil               (location, time, slope, realization) float64 3kB ...
    MS_Rain                   (location, time, slope, realization) float64 3kB ...
    MS_SN_Runoff              (location, time, slope, realization) float64 3kB ...
    MS_Snow                   (location, time, slope, realization) float64 3kB ...
    MS_Soil_Runoff            (location, time, slope, realization) float64 3kB ...
    MS_Sublimation            (location, time, slope, realization) float64 3kB ...
    MS_Surface_Mass_Flux      (location, time, slope, realization) float64 3kB ...
    MS_Water                  (location, time, slope, realization) float64 3kB ...
    MS_Water_Soil             (location, time, slope, realization) float64 3kB ...
    MS_Wind                   (location, time, slope, realization) float64 3kB ...
    OLWR                      (location, time, slope, realization) float64 3kB ...
    OSWR                      (location, time, slope, realization) float64 3kB ...
    PSUM24                    (location, time, slope, realization) float64 3kB ...
    Qg                        (location, time, slope, realization) float64 3kB ...
    Qg0                       (location, time, slope, realization) float64 3kB ...
    Ql                        (location, time, slope, realization) float64 3kB ...
    Qr                        (location, time, slope, realization) float64 3kB ...
    Qs                        (location, time, slope, realization) float64 3kB ...
    Qw                        (location, time, slope, realization) float64 3kB ...
    RH                        (location, time, slope, realization) float64 3kB ...
    S4                        (location, time, slope, realization) float64 3kB ...
    S5                        (location, time, slope, realization) float64 3kB ...
    SWE                       (location, time, slope, realization) float64 3kB ...
    Sclass1                   (location, time, slope, realization) float64 3kB ...
    Sclass2                   (location, time, slope, realization) float64 3kB ...
    Sd                        (location, time, slope, realization) float64 3kB ...
    Sn                        (location, time, slope, realization) float64 3kB ...
    Ss                        (location, time, slope, realization) float64 3kB ...
    TA                        (location, time, slope, realization) float64 3kB ...
    TSG                       (location, time, slope, realization) float64 3kB ...
    TSS_meas                  (location, time, slope, realization) float64 3kB ...
    TSS_mod                   (location, time, slope, realization) float64 3kB ...
    T_bottom                  (location, time, slope, realization) float64 3kB ...
    VW                        (location, time, slope, realization) float64 3kB ...
    VW_drift                  (location, time, slope, realization) float64 3kB ...
    dIntEnergySnow            (location, time, slope, realization) float64 3kB ...
    hoar_size                 (location, time, slope, realization) float64 3kB ...
    mAlbedo                   (location, time, slope, realization) float64 3kB ...
    meltFreezeEnergySnow      (location, time, slope, realization) float64 3kB ...
    pAlbedo                   (location, time, slope, realization) float64 3kB ...
    ski_pen                   (location, time, slope, realization) float64 3kB ...
    wind_trans24              (location, time, slope, realization) float64 3kB ...
    zS4                       (location, time, slope, realization) float64 3kB ...
    zS5                       (location, time, slope, realization) float64 3kB ...
    zSd                       (location, time, slope, realization) float64 3kB ...
    zSn                       (location, time, slope, realization) float64 3kB ...
    zSs                       (location, time, slope, realization) float64 3kB ...

Inspect individual variables (e.g., snow density):

print(xs['density'])
<xarray.DataArray 'density' (location: 1, time: 416, slope: 1, realization: 1,
                             layer: 12)> Size: 20kB
array([[[[[  nan,   nan,   nan, ...,   nan,   nan,   nan]]],


        [[[  nan,   nan,   nan, ...,   nan,   nan,   nan]]],


        [[[  nan,   nan,   nan, ...,   nan,   nan,   nan]]],


        ...,


        [[[213.9, 209. , 223.7, ..., 171.9, 142.4, 124.9]]],


        [[[214. , 209.2, 223.7, ..., 172.6, 143. , 125. ]]],


        [[[214.2, 209.3, 223.8, ..., 173.3, 143.6, 125. ]]]]],
      shape=(1, 416, 1, 1, 12), dtype=float32)
Coordinates:
  * location     (location) <U5 20B 'VIR1A'
  * time         (time) datetime64[ns] 3kB 2024-01-16T05:00:00 ... 2024-02-02...
  * slope        (slope) int64 8B 0
  * realization  (realization) int64 8B 0
  * layer        (layer) int64 96B 0 1 2 3 4 5 6 7 8 9 10 11
    altitude     (location) float64 8B 2.372e+03
    latitude     (location) float64 8B 47.15
    longitude    (location) float64 8B 11.19
    azimuth      (slope) float64 8B 0.0
    inclination  (slope) float64 8B 0.0
    z            (location, time, slope, realization, layer) float32 20kB 0.0...
Attributes:
    standard_name:  mass_density_of_snow
    units:          kg m-3
    long_name:      mass density of snow

Each variable carries attributes (metadata) like units and descriptions.

View dataset- and variable-level attributes:

# dataset attributes
xs.attrs
{'Conventions': 'CF-1.8', 'crs': 'EPSG:4326'}
# variable-level attributes
xs['density'].attrs
{'standard_name': 'mass_density_of_snow',
 'units': 'kg m-3',
 'long_name': 'mass density of snow'}

5. Adding or modifying metadata#

Metadata are stored under .attrs. You can freely edit or add entries.

For example, add a comment:

xs.attrs['comments'] = 'You can document your thoughts!'
xs.attrs
{'Conventions': 'CF-1.8',
 'crs': 'EPSG:4326',
 'comments': 'You can document your thoughts!'}

Modify variable-level attributes, e.g., change units or add remarks:

xs['density'].attrs['comment'] = "What's the density of ice cream?"
xs['density'].attrs
{'standard_name': 'mass_density_of_snow',
 'units': 'kg m-3',
 'long_name': 'mass density of snow',
 'comment': "What's the density of ice cream?"}

You could record processing steps in a history attribute:

from datetime import datetime
history_line = f"{datetime.now().isoformat(timespec='seconds')}: modified metadata interactively"
xs.attrs['history'] = xs.attrs.get('history', '') + '\n' + history_line
print(xs.attrs['history'])
2025-12-16T13:58:55: modified metadata interactively

6. Writing datasets to file#

You can save your xsnowDataset back to file(s). To learn about supported formats, explore the to_*() methods in the xsnow_io module API docs. For example,

  • to_netcdf(): An almost exact dump of the xsnowDataset that can be parsed again by read().

  • to_pro() and to_smet(): For every (location, time, slope, realization) you can write one .smet and .pro file.

First, create a temporary directory to write to:

import tempfile
tmpdir = tempfile.mkdtemp(prefix="xsnow-demo-")

Writing to .pro / .smet#

# Write a single profile timeseries to .pro/.smet
xs.to_pro(f"{tmpdir}/test.pro")
xs.to_smet(f"{tmpdir}/test.smet")

Dumping the entire dataset to NetCDF#

xs.to_netcdf(f"{tmpdir}/test.nc")
[i] xsnow.xsnow_io: Dataset successfully saved to /tmp/xsnow-demo-us1kehrs/test.nc