{ "cells": [ { "cell_type": "markdown", "id": "108f4a89", "metadata": {}, "source": [ "# Scenario 3: Altering the core dimensions\n", "\n", "Sometimes the native `xsnowDataset` geometry (location, time, slope, realization, layer) is not the shape you want to analyze. This topic is about **rewriting those core dimensions** *before* and/or *after* you start adding new variables (scenario 1) or chaining extension decorators (scenario 2).\n", "\n", "## Typical motivations\n", "- You want to aggregate single points into regions/spatial polygons.\n", "- You want to resample time (e.g., daily summary statistics, storm windows).\n", "- You want to simplify raw model layers into \"packages of layers\" with similar properties.\n", "- You want to combine different location geometries onto a common grid.\n", "\n", "## Short-lived vs. long-lived and recurrent dimension modifications\n", "Some operations are intentionally short-lived: aggregate points into regions, maybe add an elevation-band coordinate, grab several summary statistics, and continue with the detailed dataset. For that flow it makes sense to stay lightweight---attach labels or new coordinates (e.g., using `xsnow.extensions.classification.classify` or `classify_spatially`) and regroup with a few lines of `groupby`/`resample`, without defining a new wrapper.\n", "\n", "If you find yourself repeating the same reshape, and even caching the result to file, consider formalizing it as a `DatasetDecorator` (scenario 2): lock in the new dimensions or coordinates, record the metadata that explains the transformation, and make your modifications predictable, shareable, and compatible with other decorator chains."
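, "\n",
"A minimal sketch of this lightweight flow in plain `xarray`, without any xsnow wrapper (the toy dataset, band edges, and variable names are purely illustrative):\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"import xarray as xr\n",
"\n",
"# Toy stand-in for the xr.Dataset behind an xsnowDataset:\n",
"# hourly snow depth at three point locations.\n",
"time = pd.date_range(\"2024-01-01\", periods=48, freq=\"h\")\n",
"ds = xr.Dataset(\n",
"    {\"snow_depth\": ((\"time\", \"location\"), np.random.default_rng(0).random((48, 3)))},\n",
"    coords={\n",
"        \"time\": time,\n",
"        \"location\": [0, 1, 2],\n",
"        \"altitude\": (\"location\", [1400.0, 1700.0, 2300.0]),\n",
"    },\n",
")\n",
"\n",
"# Short-lived reshape 1: daily summary statistics via resample\n",
"daily_max = ds.resample(time=\"1D\").max()\n",
"\n",
"# Short-lived reshape 2: attach an elevation-band coordinate and regroup\n",
"bands = np.digitize(ds[\"altitude\"].values, bins=[1500, 2100])\n",
"by_band = ds.assign_coords(elev_band=(\"location\", bands)).groupby(\"elev_band\").mean()\n",
"```\n",
"\n",
"If this kind of regrouping starts to recur, the same labeling can be locked into a `DatasetDecorator` as described above."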
] }, { "cell_type": "markdown", "id": "1eb09c65", "metadata": {}, "source": [ "```{admonition} Tip\n", ":class: tip\n", "\n", "After reshaping, call `xsnowDataset.reorder_dims(...)` to keep the dataset’s dimension order tidy and predictable when printing, exploring, or writing to file.\n", "```" ] }, { "cell_type": "markdown", "id": "112d6b8e", "metadata": {}, "source": [ "## Pattern: Aggregating statistics grouped by category\n", "An exact recipe for altering core dimensions is not very useful here---too much depends on your data and context. A universal pattern, often called \"split--apply--combine\" (or, more intuitively for this application, \"categorize--group--aggregate\"), is a good mental model:\n", "\n", "- **categorize** your data (e.g., via xsnow’s classification extension),\n", "- **group** by one or more categories (e.g., `.groupby()`),\n", "- **aggregate/apply** per group with the reduction you need.\n", "\n", "One simple illustration with `xsnowDataset`:" ] }, { "cell_type": "code", "execution_count": null, "id": "238e4eda", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[i] xsnow.xsnow_io: Using lazy loading with Dask\n", "[i] xsnow.xsnow_io: Creating lazy datasets backed by dask arrays...\n", "[i] xsnow.xsnow_io: Data will NOT be computed until explicitly requested by user\n", "[i] xsnow.xsnow_io: Created 25 lazy datasets (data NOT yet loaded into memory)\n" ] }, { "data": { "text/plain": [ "Frozen({'elev_band': 3, 'slope': 5, 'time': 416, 'realization': 1, 'layer': 33})" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import xsnow\n", "import xsnow.extensions.classification as xsclass\n", "\n", "# 1) Create elevation bands as categories\n", "xs = xsnow.sample_data.snp_gridded_ds().compute()\n", "band_edges = np.array([1500, 1800, 2100])\n", "xs_band = xs.classify(\n", " xsclass.classify_by_bins, # \"categorize\"\n", " 
output_name=\"elev_band\",\n", " output_kind=\"coord\",\n", " input_var=\"altitude\",\n", " attrs={\n", " \"mapping\": {\n", " 0: \"<=1500 m\",\n", " 1: \"1500–1800 m\",\n", " 2: \"1800–2100 m\",\n", " 3: \">2100 m\",\n", " },\n", " \"bin_edges\": band_edges.tolist(),\n", " },\n", " bins=band_edges,\n", " right=True,\n", ")\n", "\n", "# 2) Group and aggregate (here: median by band and slope)\n", "ds_stats = xs_band.groupby([\"elev_band\", \"slope\"]).median() # \"group --> aggregate\"\n", "xs_stats = xsnow.xsnowDataset(ds_stats)\n", "xs_stats.sizes" ] }, { "cell_type": "markdown", "id": "e9891b23", "metadata": {}, "source": [ "- The `location` dimension disappears because each point was categorized into a band and then aggregated; the grouped labels (`elev_band`, `slope`) now appear first in the dimension order.\n", "- If you want a dimension ordering closer to the familiar base order:" ] }, { "cell_type": "code", "execution_count": 4, "id": "1c77f699", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Frozen({'elev_band': 3, 'time': 416, 'slope': 5, 'realization': 1, 'layer': 33})" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xs_stats = xs_stats.reorder_dims(\n", " (\"elev_band\", \"time\", \"slope\", \"realization\", \"layer\"),\n", ")\n", "xs_stats.sizes" ] }, { "cell_type": "markdown", "id": "659e0b12", "metadata": {}, "source": [ "## How this fits with the extension framework\n", "- Scenario 1 methods still work on the rewritten dataset (e.g., compute new stability indices per region).\n", "- Scenario 2 decorators can be chained before or after you modify the core dimensions.\n", "- Ensure you still have an `xsnowDataset` when applying other extensions; operations like xarray’s `.groupby()` often return an `xr.Dataset`, which you may need to wrap back into an `xsnowDataset` first (see the example above).\n", "\n", "The more you change the core dimensions, the more likely the resulting dataset becomes incompatible with other extension workflows---keep that in mind when designing reusable pipelines. And reach out if you think a specific constraint in a built-in extension should be improved." ] } ], "metadata": { "kernelspec": { "display_name": "xsnow-dev", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.5" } }, "nbformat": 4, "nbformat_minor": 5 }