Scenario 3: Altering the core dimensions#

Sometimes the native xsnowDataset geometry (location, time, slope, realization, layer) is not the shape you want to analyze. This topic is about rewriting those core dimensions before and/or after you start adding new variables (scenario 1) or chaining extension decorators (scenario 2).

Typical motivations#

  • You want to aggregate single points into regions/spatial polygons.

  • You want to resample time (e.g., daily summary statistics, storm windows).

  • You want to simplify raw model layers into “packages of layers” with similar properties.

  • You want to combine different location geometries onto a common grid.

Short-lived vs. long-lived and recurrent dimension modifications#

Some operations are intentionally short-lived: aggregate points into regions, maybe add an elevation-band coordinate, grab a few summary statistics, and then continue with the detailed dataset. For that flow it makes sense to stay lightweight: attach labels or new coordinates (e.g., using xsnow.extensions.classification.classify or classify_spatially) and regroup with a couple of lines of groupby/resample, without defining a new wrapper.
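For instance, a throwaway regional/daily summary can stay as a few plain groupby/resample calls. The sketch below is illustrative only: it assumes a region coordinate has already been attached along the location dimension (e.g., via classify_spatially) and that time holds a datetime index; neither is guaranteed for your dataset.

import xsnow

xs = xsnow.sample_data.snp_gridded_ds().compute()
# Assumption: a 'region' coordinate was attached to the location dimension
# beforehand (e.g., with xsnow.extensions.classification.classify_spatially).

# points --> regions: collapse the location dimension within each region
regional = xs.groupby("region").mean("location")  # typically a plain xr.Dataset

# regions --> daily summaries: standard xarray resampling on the result
daily_regional = regional.resample(time="1D").mean()

# Grab the statistics you need, then continue with the detailed `xs`.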

If you find yourself repeating the same reshape and even caching the result to file, consider formalizing it as a DatasetDecorator (scenario 2): lock in the new dimensions or coordinates, record the metadata that explains the transformation, and make your modifications predictable, shareable, and compatible with other decorator chains. A lightweight precursor to that is sketched below.
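Until you reach for the full DatasetDecorator machinery, even a small named helper already buys predictability. The function below is a hedged stand-in for that idea, not xsnow's documented decorator interface; the 'region' coordinate and the attribute text are illustrative assumptions.

import xsnow

def aggregate_to_regions(xs):
    """Recurrent reshape: collapse point locations into regional medians.

    Stand-in sketch for a formal DatasetDecorator; assumes a 'region'
    coordinate exists on the location dimension.
    """
    ds = xs.groupby("region").median("location")
    # Record metadata that explains the transformation.
    ds.attrs["history"] = "aggregated: location -> region (median)"
    # Wrap back into an xsnowDataset so other extensions keep working.
    return xsnow.xsnowDataset(ds)

Once this reshape keeps recurring, porting the same body into a DatasetDecorator adds the metadata handling and chain compatibility described in scenario 2.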

Tip

After reshaping, call xsnowDataset.reorder_dims(...) to keep the dataset’s dimension order tidy and predictable when printing, exploring, or writing to file.

Pattern: Aggregating statistics grouped by category#

An exact recipe for altering core dimensions is not very useful here: too much depends on your data and context. A universal pattern, often called “split–apply–combine” (or, more intuitively for this application, “categorize–group–aggregate”), is a good mental model:

  • categorize your data (e.g., via xsnow’s classification extension),

  • group by one or more categories (e.g., .groupby()),

  • aggregate/apply per group with the reduction you need.

A simple illustration with an xsnowDataset:

import numpy as np
import xsnow
import xsnow.extensions.classification as xsclass

# 1) Create elevation bands as categories
xs = xsnow.sample_data.snp_gridded_ds().compute()
band_edges = np.array([1500, 1800, 2100])
xs_band = xs.classify(
    xsclass.classify_by_bins,  # "categorize"
    output_name="elev_band",
    output_kind="coord",
    input_var="altitude",
    attrs={
        "mapping": {
            0: "<1500 m",
            1: "1500–1800 m",
            2: "1800–2100 m",
            3: ">=2100 m",
        },
        "bin_edges": band_edges.tolist(),
    },
    bins=band_edges,
    right=True,
)

# 2) Group and aggregate (here: median by band and slope)
ds_stats = xs_band.groupby(["elev_band", "slope"]).median()  # "group --> aggregate"
xs_stats = xsnow.xsnowDataset(ds_stats)
xs_stats.sizes
[i] xsnow.xsnow_io: Using lazy loading with Dask
[i] xsnow.xsnow_io: Analyzing file sizes for 25 files...
[i] xsnow.xsnow_io: File size range: 223.9KB - 1620.6KB
[i] xsnow.xsnow_io: Scanning 10 largest files to determine max layer count...
[i] xsnow.xsnow_io: ✅ Smart max_layers determination complete:
[i] xsnow.xsnow_io:    - Checked: 25 file sizes
[i] xsnow.xsnow_io:    - Scanned: 10 largest files
[i] xsnow.xsnow_io:    - Max layers found: 33
[i] xsnow.xsnow_io:    - Using max_layers: 36 (with 10% buffer - partial scan)
[i] xsnow.xsnow_io:    - Time elapsed: 0.10 seconds
[i] xsnow.xsnow_io: Creating TRULY LAZY datasets using dask.delayed...
[i] xsnow.xsnow_io: Using max_layers dimension: 36
[i] xsnow.xsnow_io: Profile data will NOT be read until .compute() or .load() is called
[i] xsnow.xsnow_io: Building delayed loading graph for 25 files...
[i] xsnow.xsnow_io: ✅ Lazy structure created successfully!
[i] xsnow.xsnow_io:    - Files: 25
[i] xsnow.xsnow_io:    - Layer dimension: 36
[i] xsnow.xsnow_io:    - Data loaded: NONE (truly lazy!)
[i] xsnow.xsnow_io:    - Call .compute() or .load() to read files
[i] xsnow.xsnow_io: Computing delayed datasets sequentially...
[i] xsnow.xsnow_io: Created 25 datasets
Frozen({'elev_band': 3, 'slope': 5, 'time': 416, 'realization': 1, 'layer': 36})
  • The location dimension disappears because each point was categorized into a band and then aggregated; the grouped labels (elev_band, slope) now appear first in the dimension order.

  • If you want an ordering of the dimensions that is closer to the familiar base order:

xs_stats = xs_stats.reorder_dims(
    ("elev_band", "time", "slope", "realization", "layer"),
)
xs_stats.sizes
Frozen({'elev_band': 3, 'time': 416, 'slope': 5, 'realization': 1, 'layer': 36})

How this fits with the extension framework#

  • Scenario 1 methods still work on the rewritten dataset (e.g., compute new stability indices per region).

  • Scenario 2 decorators can be chained before or after you modify the core dimensions.

  • Ensure you still have an xsnowDataset when applying other extensions; operations like xarray’s .groupby() often return an xr.Dataset, which you may need to wrap back into an xsnowDataset first (see the example above).

The more you change the core dimensions, the more likely it is that the resulting dataset becomes incompatible with other extension workflows; keep that in mind when designing reusable pipelines. And reach out if you think a specific constraint in a built-in extension should be improved.