Combining datasets

Before we start, think of profiles as the core units of our datasets:
One profile consists of all layers and all variables at one coordinate combination of the core dimensions (location, time, slope, realization).
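To make the notion of a profile concrete, here is a minimal sketch in plain xarray. It assumes a toy dataset with the core dimensions listed above plus a layer dimension; the variables density and HS and all sizes are made up, and this is not how xsnow builds its datasets.

```python
import numpy as np
import xarray as xr

# Toy stand-in for an xsnowDataset: the four core dimensions plus "layer"
# (variable names and sizes are made up for illustration)
sizes = {"location": 2, "time": 3, "slope": 1, "realization": 1, "layer": 4}
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {
        # layer variable: one value per layer of every profile
        "density": (tuple(sizes), rng.uniform(100.0, 400.0, tuple(sizes.values()))),
        # non-layer (scalar) variable: one value per profile
        "HS": (("location", "time", "slope", "realization"), np.zeros((2, 3, 1, 1))),
    }
)

# Fixing one coordinate of each core dimension selects exactly one profile:
# all layers and all variables at that coordinate combination
profile = ds.isel(location=0, time=1, slope=0, realization=0)
print(dict(profile.sizes))  # {'layer': 4}
```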

You can choose to perform the following operations for “combining” datasets, which will be demonstrated in detail during this tutorial.

  1. xsnow.concat: Stack them!

    • Concatenate datasets along a single dimension that has no overlapping coordinates.

      • Separate locations or times

      • Different realizations

    • Profiles originate fully from one specific source.

    • Convenience method for special case:

      • .append_latest(): Append new timesteps from source files

  2. xsnow.combine: Choose one profile or the other!

    • Combine datasets with overlapping profiles while preferring profiles from one dataset.

      • Filling data gaps

      • Updating old datasets with new ones

      • Preferring one source on overlaps

      • Adding new coordinates (for non-overlapping coordinates, it behaves like xsnow.concat)

    • Never mixes sources or stratigraphies: the full profile always comes from one dataset and remains intact.

    • Convenience methods for special cases:

      • .fill_gaps_with(): Fill data gaps conveniently without adding new coordinates.

      • .overwrite_with(): Overwrite existing dataset with new values and coordinates.

  3. xarray.merge: Add new variables!

    • Merging datasets with different data variables

      • Both datasets are equally valid; ideally, their variables are either identical or mutually exclusive

      • Apply conservative conflict-resolution strategies to avoid silent, unexpected behavior

    • Data in resulting profiles originates from different datasets!

    • Convenience method for special case:

      • .merge_smet(): Merge SMET files from source into an existing xsnowDataset.

  4. Combining stratigraphies: Edge case for experts only!

    • Concatenate or combine different layers that originate from different datasets

      • Use xarray.merge with caution and test thoroughly.

xsnow versus xarray semantics

Compared to “standard”, truly gridded xarray datasets, xsnowDatasets have one peculiarity: snow layers exist on an irregular vertical grid. Therefore, xsnow provides its own functionality to concatenate and combine datasets safely. It ensures that individual layers are never added or removed, so each profile, the core unit of a dataset, always stays intact.

This tutorial focuses on the xsnow functionality for combining datasets safely and only provides minimal background and guidance on using xarray-native functionality.

Broadcasting & alignment

When working with datasets of different shapes, xsnow relies on xarray’s alignment by coordinate labels and on broadcasting to compatible shapes. This is powerful and convenient, but you need to be aware of what aligns with what: operations align on coordinate labels, not on array order. If labels do not match (e.g., differing timestamps), values are still paired by label, and missing pairs become NaN.
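The label-based pairing can be seen in a minimal plain-xarray sketch (toy arrays, no xsnow involved). Note that plain xarray arithmetic keeps only the shared labels by default, while an explicit outer alignment keeps the union and fills missing pairs with NaN.

```python
import xarray as xr

# Two series whose time labels only partly overlap (labels 1 and 2)
a = xr.DataArray([1.0, 2.0, 3.0], dims="time", coords={"time": [0, 1, 2]})
b = xr.DataArray([10.0, 20.0, 30.0], dims="time", coords={"time": [1, 2, 3]})

# Values are paired by label, not by position: at time=1 this is 2 + 10,
# not 1 + 10. By default, xarray arithmetic keeps only the shared labels.
total = a + b
print(total.values)  # [12. 23.]

# An explicit outer alignment keeps the union of labels;
# labels missing from one array are filled with NaN
a_out, b_out = xr.align(a, b, join="outer")
print(a_out.values)  # [ 1.  2.  3. nan]
print(b_out.values)  # [nan 10. 20. 30.]
```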

import xsnow

1. Concatenating datasets along a single dimension

If you want to combine multiple xsnowDatasets along a single dimension, use
xsnow.concat([...], dim=..., join=...) to stack them. Common use cases:

  • Locations: combine independent sites into a larger domain

  • Time: stack non-overlapping time stamps or different seasons

  • Realizations: stack several simulation variants

The special case of updating datasets with recent data is covered in its own section, “Updating xsnowDatasets with recent data” (Section 4).

Concatenate by location

To combine independent sites, concatenate along the location dimension. xsnow then aligns the remaining dimensions (e.g., time). If a coordinate of another dimension, such as a timestamp, exists in only one of the two datasets, the other location receives NaN entries at that coordinate.

# Read two datasets from independent sites
datapath = xsnow.sample_data.snp_gridded_dir()
xs1 = xsnow.read(f"{datapath}/pros/gridded/VIR1A.pro")
xs2 = xsnow.read(f"{datapath}/pros/gridded/VIR2A.pro")
print(xs1.sizes)
print(xs2.sizes)
[i] xsnow.xsnow_io: Loading 1 datasets eagerly with 1 workers...
[i] xsnow.xsnow_io: Loading 1 datasets eagerly with 1 workers...
Frozen({'location': 1, 'time': 416, 'slope': 1, 'realization': 1, 'layer': 12})
Frozen({'location': 1, 'time': 416, 'slope': 1, 'realization': 1, 'layer': 10})
# Concatenate VIR1A and VIR2A along the location dimension
ds_cat = xsnow.concat([xs1, xs2], dim="location")
xs_cat = xsnow.xsnowDataset(ds_cat)
print(xs_cat.sizes)
Frozen({'location': 2, 'time': 416, 'slope': 1, 'realization': 1, 'layer': 12})

As you can see, xs1 and xs2 come from different locations and therefore differ in their number of layers (12 versus 10); the concatenated dataset spans the larger layer dimension. xsnow.concat gives you finer control over which variables to concatenate and how to handle potentially conflicting variables between datasets (e.g., duplicates). With its default settings, the example above uses an “outer join”, i.e., the union of all dataset coordinates. Check out the function documentation for more details, or continue reading for more examples.
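Because alignment follows xarray’s label-based rules, the outer-join behavior can be reproduced with plain xarray.concat. The sketch below uses a made-up variable HS and toy values; it is not xsnow’s implementation, only an illustration of how a missing timestamp turns into NaN for one location.

```python
import xarray as xr

# Two toy "sites" whose time axes only partly overlap
site_a = xr.Dataset(
    {"HS": (("location", "time"), [[10.0, 12.0, 15.0]])},
    coords={"location": ["A"], "time": [0, 1, 2]},
)
site_b = xr.Dataset(
    {"HS": (("location", "time"), [[30.0, 31.0]])},
    coords={"location": ["B"], "time": [1, 2]},
)

# Stack along "location"; the outer join takes the union of the time labels
cat = xr.concat([site_a, site_b], dim="location", join="outer")
print(dict(cat.sizes))  # {'location': 2, 'time': 3}

# Site B has no value at time=0, so that entry is NaN
print(cat["HS"].sel(location="B", time=0).item())  # nan
```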

Lazy combinations

Note that with the default parameters, xarray loads some coordinate variables into memory to compare them between datasets. This may be prohibitively expensive if you are working with your dataset lazily!
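In plain xarray, the usual way to avoid these comparisons is to restrict what gets concatenated and to skip equality checks. The sketch below uses only arguments of xarray.concat (data_vars, coords, compat); whether xsnow.concat forwards them is an assumption you should check in its documentation. The data are small in-memory toys, so nothing is actually lazy here.

```python
import xarray as xr

# Two consecutive chunks of a toy series that share a scalar coordinate "elevation"
part1 = xr.Dataset(
    {"TA": ("time", [-5.0, -4.0])},
    coords={"time": [0, 1], "elevation": 2000.0},
)
part2 = xr.Dataset(
    {"TA": ("time", [-3.0, -2.0])},
    coords={"time": [2, 3], "elevation": 2000.0},
)

combined = xr.concat(
    [part1, part2],
    dim="time",
    data_vars="minimal",  # only concatenate data variables that already have "time"
    coords="minimal",     # same for coordinates; "elevation" is not concatenated
    compat="override",    # skip equality checks and take "elevation" from part1
    join="outer",
)
print(combined["TA"].values.tolist())  # [-5.0, -4.0, -3.0, -2.0]
```

The trade-off is that compat="override" no longer verifies that the non-concatenated variables agree, so it is only appropriate when you know they are identical.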

Concatenate by realization

At this stage, xsnow’s read function does not know how to assign data points to different realizations. It will therefore place all data in one single realization, and it is up to the user to read multiple realizations separately and then combine them into a single xsnowDataset.

# Read two datasets: manual and simulated profiles
datapath = xsnow.sample_data.snp_snowobs_dir()
xs_sim = xsnow.read(f"{datapath}/pros/", recursive=True)
xs_obs = xsnow.read(f"{datapath}/pits/")
print(xs_sim.sizes)
print(xs_obs.sizes)
[i] xsnow.xsnow_io: Loading 2 datasets eagerly with 1 workers...
[i] xsnow.utils: Slope coordinate 'inclination' varies by location. Preserving (location, slope) dimensions as allow_per_location=True.
[i] xsnow.utils: Slope coordinate 'azimuth' varies by location. Preserving (location, slope) dimensions as allow_per_location=True.
[i] xsnow.xsnow_io: Loading 2 datasets eagerly with 1 workers...
[i] xsnow.utils: Slope coordinate 'inclination' varies by location. Preserving (location, slope) dimensions as allow_per_location=True.
[i] xsnow.utils: Slope coordinate 'azimuth' varies by location. Preserving (location, slope) dimensions as allow_per_location=True.
Frozen({'location': 2, 'time': 381, 'slope': 1, 'realization': 1, 'layer': 59})
Frozen({'location': 2, 'slope': 1, 'time': 2, 'realization': 1, 'layer': 18})

The two datasets differ in the time dimension (381 versus 2 timestamps) and in the layer dimension (59 versus 18 layers).

The following concatenation uses an outer join and assigns a readable label to each realization by passing a pandas Index as the dim argument. This keeps the dimension readable and makes later selections by label easy.

import pandas as pd
# Concat into two different (renamed) realizations
xs_cat = xsnow.concat(
    [xs_sim, xs_obs], 
    dim=pd.Index(["simulated", "observed"], name="realization"),
    join="outer",
)
print(xs_cat.sizes)
Frozen({'location': 2, 'time': 382, 'slope': 1, 'realization': 2, 'layer': 59})
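The same labeling trick works in plain xarray.concat, which makes it easy to see what the named pandas Index does. In this toy sketch, HS and its values are made up: the Index creates the new realization dimension together with its labels, and the outer join fills the observed realization with NaN where it has no data.

```python
import pandas as pd
import xarray as xr

# Toy "simulated" series and a single "observed" value (made-up numbers)
sim = xr.Dataset({"HS": ("time", [50.0, 52.0, 55.0])}, coords={"time": [0, 1, 2]})
obs = xr.Dataset({"HS": ("time", [48.0])}, coords={"time": [1]})

# A named pandas Index as `dim` creates the new dimension and its labels
stacked = xr.concat(
    [sim, obs],
    dim=pd.Index(["simulated", "observed"], name="realization"),
    data_vars="all",
    join="outer",
)

# Select by readable label instead of by position
print(stacked["HS"].sel(realization="observed", time=1).item())  # 48.0
print(stacked["HS"].sel(realization="observed", time=0).item())  # nan
```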

Let’s also compute an inner join. This is where xsnow’s safety mechanism kicks in. For an inner join, xarray’s concat function would only return the 18 “common” layers, while we actually need all 59 layers in the concatenated dataset.

# Perform a different join mode: inner join
xs_cat = xsnow.concat(
    [xs_sim, xs_obs],
    dim=pd.Index(["simulated", "observed"], name="realization"),
    join="inner",
)
print(xs_cat.sizes)
Frozen({'location': 2, 'time': 1, 'slope': 1, 'realization': 2, 'layer': 59})

The inner join keeps only the single overlapping timestamp, while all 59 layers are preserved.
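For contrast, here is what plain xarray.concat does with an inner join when profiles have different numbers of layers. This toy sketch assumes the layer dimension carries integer labels (made-up density values); the longer profile silently loses its upper layers, which is the behavior xsnow.concat guards against.

```python
import xarray as xr

# A 4-layer and a 2-layer profile; the layer dimension carries integer labels
sim = xr.Dataset(
    {"density": ("layer", [100.0, 200.0, 300.0, 400.0])},
    coords={"layer": [0, 1, 2, 3]},
)
obs = xr.Dataset(
    {"density": ("layer", [150.0, 250.0])},
    coords={"layer": [0, 1]},
)

# Plain xarray: the inner join applies to every non-concatenated dimension,
# including "layer", so layers 2 and 3 of the simulated profile are dropped
truncated = xr.concat([sim, obs], dim="realization", data_vars="all", join="inner")
print(truncated.sizes["layer"])  # 2
```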

2. Combining datasets with overlapping coordinates

Use xsnow.combine([...], join=...) to combine overlapping datasets. The datasets are first aligned (and broadcast) according to the chosen join (e.g., 'outer', 'left', 'inner'). At each overlapping coordinate, the function then decides which dataset to use and takes all data points of all variables entirely from that dataset, so that physical consistency is ensured at the profile level.

For that decision, xsnow.combine chooses the leftmost dataset with profile_status > 0, which represents valid data (possibly without snow), as opposed to unavailable data (profile_status == 0) or erroneous data (profile_status < 0). As a result, for a given coordinate, all variables and all layers always originate either fully from the left or fully from the right dataset.
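The selection rule can be illustrated with a minimal two-dataset sketch in plain xarray. It is not xsnow’s implementation: it assumes both toy datasets are already aligned on the same time labels, ignores the other core dimensions, and uses a made-up density variable. The key point is that the decision is made once per profile and then applied to every variable and layer.

```python
import numpy as np
import xarray as xr

time = [0, 1, 2]
left = xr.Dataset(
    {
        "profile_status": ("time", [1, 0, 1]),  # profile at time=1 unavailable
        "density": (("time", "layer"), [[100.0, 200.0], [np.nan, np.nan], [110.0, 210.0]]),
    },
    coords={"time": time},
)
right = xr.Dataset(
    {
        "profile_status": ("time", [1, 1, 1]),
        "density": (("time", "layer"), [[900.0, 950.0], [120.0, 220.0], [900.0, 950.0]]),
    },
    coords={"time": time},
)

# One boolean per profile (not per layer): use the left profile wherever it is valid
use_left = left["profile_status"] > 0

# xr.where broadcasts the per-profile mask over "layer", so every profile
# (all variables, all layers) is copied whole from exactly one dataset
combined = xr.where(use_left, left, right)
print(combined["density"].values.tolist())
# [[100.0, 200.0], [120.0, 220.0], [110.0, 210.0]]
```

With more than two datasets, the same rule would pick, per profile, the first dataset in the list whose status is valid.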

A typical use case is an outdated dataset that should be updated with a newer simulation. To create illustrative examples, let’s first build two small demo datasets:

  • old: ends earlier and contains one missing profile (at 17:00).

  • new: overlaps with old, extends further in time, and modifies density values to mimic an updated simulation.

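import numpy as np
import pandas as pd

# Note: _format_time_summary() and _print_profile_source() are small display
# helpers of this tutorial whose definitions are not shown here.

xs = xsnow.single_profile_timeseries()

# old: first three timestamps, with the profile at the second timestamp masked out
old = xs.isel(time=slice(0, 3)).copy(deep=True)
gap_time = old['time'].values[1]
old = old.where(old['time'] != gap_time)
# where() masks data variables only; mask the layer coordinate z explicitly
old = old.assign_coords(z=old['z'].where(old['time'] != gap_time))

# new: overlaps with old, extends one timestamp further, with modified density
new = xs.isel(time=slice(1, 4)).copy(deep=True)
new["density"] = new["density"] + 100

print(f"old: {_format_time_summary(old)} (gap at {pd.to_datetime(gap_time).strftime('%H:%M')})")
print(f"new: {_format_time_summary(new)}")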
[i] xsnow.xsnow_io: Loading 2 datasets eagerly with 1 workers...
[i] xsnow.utils: Slope coordinate 'inclination' varies by location. Preserving (location, slope) dimensions as allow_per_location=True.
[i] xsnow.utils: Slope coordinate 'azimuth' varies by location. Preserving (location, slope) dimensions as allow_per_location=True.
old: 3 timestamps 16:00--18:00 (gap at 17:00)
new: 3 timestamps 17:00--19:00

Let’s first combine old and new on the domain of old (join='left') while preferring old on overlaps:

old_updated = xsnow.combine([old, new], join='left')

print(f"old_updated: {_format_time_summary(old_updated)}")
_print_profile_source(old, new, old_updated)
old_updated: 3 timestamps 16:00--18:00
Profile source after fill: ['old', 'new', 'old']

Next, let’s keep the domain of old but prefer new on overlaps. Since new now comes first in the list, keeping the domain of old requires join='right':

old_overwritten = xsnow.combine([new, old], join='right')

print(f"old_overwritten: {_format_time_summary(old_overwritten)}")
_print_profile_source(old, new, old_overwritten)
old_overwritten: 3 timestamps 16:00--18:00
Profile source after fill: ['old', 'new', 'new']

Other join modes are available, and join modes can even be set per core dimension (e.g., join_time). Consult the documentation for other arguments, such as compat or combine_attrs, which configure the behavior for conflicting coordinates or attributes.

To make users’ lives even more convenient, xsnow implements two wrappers around xsnow.combine for common tasks: xsnowDataset.fill_gaps_with(...) (left join, prefer the existing dataset) and xsnowDataset.overwrite_with(...) (outer join, prefer the other dataset):

old_updated2 = old.fill_gaps_with(new)

print(f"old_updated2: {_format_time_summary(old_updated2)}")
_print_profile_source(old, new, old_updated2)
print(f"old_updated2 is identical to old_updated: {old_updated.identical(old_updated2)}")
old_updated2: 3 timestamps 16:00--18:00
Profile source after fill: ['old', 'new', 'old']
old_updated2 is identical to old_updated: True
old_overwritten2 = old.overwrite_with(new)

print(f"old_overwritten2: {_format_time_summary(old_overwritten2)}")
_print_profile_source(old, new, old_overwritten2)
old_overwritten2: 4 timestamps 16:00--19:00
Profile source after fill: ['old', 'new', 'new', 'new']

As we see, old_overwritten2 differs from old_overwritten because overwrite_with() performs an outer join, which in this case adds the fourth timestamp from new.

3. Merging new variables or coordinates into an xsnowDataset

Now imagine two datasets that are not old and new versions of each other but equally valid, each with different meteorological and snow layer variables. Both datasets may well have valid profiles at a given coordinate, and you want to combine the variables from the two. As we saw, xsnow.combine is great at merging new coordinates, but because it keeps profiles intact and never mixes variables from different datasets, it is not the right tool for merging new data variables: wherever the left dataset has a valid profile, the variables that exist only in the right dataset would end up as NaN. Luckily, this is what xarray.merge is designed for.

In contrast to xsnow.combine, xarray.merge takes all data variables from the left dataset and adds any new variables from the right dataset. To ensure that the layer coordinates are really consistent between the two datasets (e.g., layer x sits at the same height/depth z in both), we recommend setting strict conflict rules that raise exceptions early, so you won’t suffer silent surprises (see example below).

xarray.merge does not fill data gaps

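Note that xarray.merge is meant to add new variables (or coordinates), not to fill data gaps. xarray does offer value-wise gap filling (e.g., Dataset.combine_first, or merging with compat="no_conflicts", which combines the non-null values of a shared variable), but it can mix layers from different profiles. We therefore strongly advise against using xarray’s tools for gap filling and recommend xsnow.combine with its convenience wrappers, as demonstrated in the previous section.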
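Why value-wise gap filling is risky for snow profiles can be sketched with xarray’s Dataset.combine_first on a toy example. The densities are made up, and the sketch assumes, for illustration, that a shorter profile is NaN-padded along a layer dimension it shares with a longer one.

```python
import numpy as np
import xarray as xr

# One profile at time 0 from two sources. The left profile has a single layer,
# so its second layer slot is NaN padding; the right profile has two layers.
left = xr.Dataset(
    {"density": (("time", "layer"), [[100.0, np.nan]])},
    coords={"time": [0]},
)
right = xr.Dataset(
    {"density": (("time", "layer"), [[300.0, 350.0]])},
    coords={"time": [0]},
)

# combine_first fills NaNs value by value, preferring `left` where it has data
mixed = left.combine_first(right)
print(mixed["density"].values)  # [[100. 350.]]
```

The result is a two-layer profile whose first layer comes from left and whose second layer comes from right, a stratigraphy that neither source contains. A profile-level choice, as in xsnow.combine, avoids this.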

In the following example, note that I extract the underlying xarray.Dataset from each xsnowDataset when calling xarray.merge (it doesn’t know our data class!).

import xarray as xr

# Two overlapping products with different variables
# Product A: keep density for the earliest timestamps
xs_A = xs.isel(time=slice(0, 3)).copy(deep=True)

# Product B: overlapping time range, but a different density model
xs_B = xs.isel(time=slice(1, 4)).copy(deep=True)
xs_B["density_B"] = xs_B["density"] + 100

# Same as xs_B, but with the layer coordinate z shifted
xs_B_conflict = xs_B.copy(deep=True)
xs_B_conflict["z"] = xs_B_conflict["z"] - 2  #  <-- 2 cm layer offset
# Strict merge will raise because the layer coordinate 'z'
# differs between the two datasets on the overlapping timestamps
try:
    xr.merge([xs_A.to_xarray(), xs_B_conflict.to_xarray()],
             join='inner',
             compat="no_conflicts", combine_attrs="no_conflicts")
except xr.MergeError as exc:
    print("Conflict caught (as intended):", type(exc).__name__)
Conflict caught (as intended): MergeError
# Without the z offset, 'density' agrees on the overlap and 'density_B'
# is a new variable, so the strict merge succeeds
merged = xr.merge([xs_A.to_xarray(), xs_B.to_xarray()],
                  join='inner',
                  compat="no_conflicts", combine_attrs="no_conflicts")

# Keep only the two density variables for this demo
merged = merged.drop_vars(
    [var for var in merged.data_vars if var not in ["density", "density_B"]])

print(f"merged: {_format_time_summary(merged)}")
print("  contains ", list(merged.data_vars))
overlap_time = merged.time.values[1]
has_density = merged['density'].sel(time=overlap_time).notnull().any().values
has_density_B = merged['density_B'].sel(time=overlap_time).notnull().any().values
print("Overlap time now carries both sources:")
print(f"  non-null values in density:   {has_density}")
print(f"  non-null values in density_B: {has_density_B}")
merged: 2 timestamps 17:00--18:00
  contains  ['density', 'density_B']
Overlap time now carries both sources:
  non-null values in density:   True
  non-null values in density_B: True

Note that the final object merged is an xarray.Dataset. You can easily convert it back (without cost) via xs_merged = xsnow.xsnowDataset(merged).

Merging of new non-layer variables

Merging new meteorological or other scalar variables into an existing xsnowDataset is primarily done with xarray.merge, analogous to the previous example. Since this is such a common case, however, we implemented the merge_smet() method introduced in Reading and writing, Adding SMET files…. merge_smet() is convenient because you can merge directly while reading from source files, and you have control over which realization to merge into.

For completeness, we also show an example for this task, without relying on merge_smet(). Let’s read stratigraphy and meteorological data into two separate datasets, then compute a moving average of air temperature and merge the meteorological dataset back into the stratigraphy dataset:

# Read smet and pro into separate datasets
datapath = xsnow.sample_data.snp_gridded_dir()
xs = xsnow.read(f"{datapath}/pros/gridded/VIR1A.pro")
xs_smet = xsnow.read(f"{datapath}/pros/gridded/VIR1A.smet")  # for demo we don't use merge_smet here

# Create a moving average for a smet variable
xs_smet['TA_ma'] = xs_smet['TA'].rolling(time=6, min_periods=1).mean()

# Merge both datasets: join='left' keeps the time axis of the stratigraphy dataset
ds_merged = xr.merge([xs.to_xarray(),
                      xs_smet.data.drop_vars("profile_status")],
                     join='left', compat="no_conflicts", combine_attrs="no_conflicts")
xs_merged = xsnow.xsnowDataset(ds_merged)

# Brief sanity check
print(f"'TA_ma' in xs_merged: {'TA_ma' in xs_merged}")
ta_smet, ta_merged = xr.align(xs_smet['TA_ma'], xs_merged['TA_ma'], join="inner")
try:
    xr.testing.assert_allclose(ta_smet, ta_merged)
    values_as_expected = True
except AssertionError:  # assert_allclose signals a mismatch with AssertionError
    values_as_expected = False
print(f"Values as expected: {values_as_expected}")
[i] xsnow.xsnow_io: Loading 1 datasets eagerly with 1 workers...
'TA_ma' in xs_merged: True
Values as expected: True

As in the previous example, xarray.merge accepts xarray.Datasets, which we can easily convert back after the merge. Also note that I used two different ways to access the underlying xarray.Dataset (.to_xarray() as a method, or .data as an attribute). I also dropped the profile_status variable from the meteo dataset, since it would raise a conflict exception and I want to keep the status from the stratigraphy dataset anyway.
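The reasoning behind dropping profile_status can be reproduced with plain xarray. In this toy sketch, both datasets and all values are made up: a strict merge fails while both datasets carry a conflicting profile_status, and succeeds once it is dropped from the meteo side, with join="left" keeping only the stratigraphy timestamps.

```python
import xarray as xr

# Toy stratigraphy and meteo datasets that share the variable "profile_status"
strat = xr.Dataset(
    {"HS": ("time", [50.0, 51.0, 53.0]), "profile_status": ("time", [1, 1, 1])},
    coords={"time": [0, 1, 2]},
)
meteo = xr.Dataset(
    {"TA": ("time", [-6.0, -5.0, -4.0, -3.0]), "profile_status": ("time", [0, 0, 0, 0])},
    coords={"time": [0, 1, 2, 3]},
)
# Trailing moving average over two timestamps
meteo["TA_ma"] = meteo["TA"].rolling(time=2, min_periods=1).mean()

# Keeping both profile_status variables makes the strict merge fail ...
try:
    xr.merge([strat, meteo], join="left", compat="no_conflicts")
    conflict = False
except xr.MergeError:
    conflict = True
print("conflict without dropping:", conflict)  # True

# ... so drop it from the meteo side and keep the stratigraphy's status
merged_demo = xr.merge(
    [strat, meteo.drop_vars("profile_status")],
    join="left", compat="no_conflicts", combine_attrs="no_conflicts",
)
print(merged_demo["TA_ma"].values.tolist())  # TA_ma at times 0, 1, 2
```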

4. Updating xsnowDatasets with recent data

Most functionality related to updating xsnowDatasets with recent data has already been explained. Since it represents a common operation, we use this section to summarize the different options and introduce one new method.

  1. Convenient, most powerful, but compute-intensive: xsnow.combine or the convenience methods .fill_gaps_with() or .overwrite_with().

  2. Convenient, still versatile, computationally cheaper: the convenience method .append_latest(), which wraps xsnow.concat.

When you want to update an existing dataset with recent data, ask yourself whether you can pick one global cutoff timestamp: before it, you keep the old data; from it onward, you discard the old data and accept the new data. If so, concatenating the old dataset (prior to the cutoff) with the new dataset (starting at the cutoff) is the cheapest way to get an updated dataset. If, however, you need a more fine-grained update, for example because different locations have different timestamps that you want to keep or update, use xsnow.combine instead.
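The cutoff approach can be sketched in plain xarray. The helper update_from_cutoff below is hypothetical (it is not part of xsnow) and the data are toys; it keeps the old data strictly before the cutoff and the new data from the cutoff onward, so the two parts never share a time label.

```python
import pandas as pd
import xarray as xr


def update_from_cutoff(old: xr.Dataset, new: xr.Dataset, cutoff) -> xr.Dataset:
    """Hypothetical helper: keep `old` before `cutoff` and `new` from `cutoff` onward."""
    cutoff = pd.Timestamp(cutoff).to_datetime64()
    before = old.isel(time=(old["time"] < cutoff).values)
    after = new.isel(time=(new["time"] >= cutoff).values)
    # The two parts share no time labels, so concatenation never has to pick a source
    return xr.concat([before, after], dim="time", join="outer")


times_old = pd.to_datetime(["2024-01-01 16:00", "2024-01-01 17:00", "2024-01-01 18:00"])
times_new = pd.to_datetime(["2024-01-01 17:00", "2024-01-01 18:00", "2024-01-01 19:00"])
old_demo = xr.Dataset({"HS": ("time", [50.0, 52.0, 54.0])}, coords={"time": times_old})
new_demo = xr.Dataset({"HS": ("time", [62.0, 64.0, 66.0])}, coords={"time": times_new})

updated = update_from_cutoff(old_demo, new_demo, "2024-01-01 17:00")
print(updated["HS"].values)  # [50. 62. 64. 66.]
```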

xsnow.combine and its convenience methods can be applied as demonstrated earlier. .append_latest() is a wrapper that reads new data from source files and concatenates them with the existing old dataset. It could be applied as updated = old.append_latest('path/to/new/files'), in which case it reads only those timestamps that are later than the ones in old. If you want to update from an earlier timestamp, you can provide that earlier timestamp as an argument. Check out the function documentation for specifics. Instead of using the convenience method, you can also assemble an equivalent procedure manually, for example if both datasets have already been read:

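# Keep old up to its second timestamp (17:00, the gap profile, which
# concatenation does not fill) ...
old_part1 = old.isel(time=slice(0, 2))
# ... and take only those new timestamps after the last kept old timestamp
new_part2 = new.isel(time=new.time > old_part1.time.max())

old_updated3 = xsnow.concat([old_part1, new_part2], dim='time')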

print(f"old_updated3: {_format_time_summary(old_updated3)}")
_print_profile_source(old, new, old_updated3)
old_updated3: 4 timestamps 16:00--19:00
Profile source after fill: ['old', 'old', 'new', 'new']