classification module#

Summary#

Classify profiles, scalars, or layers in `xsnowDataset`s

This is a very generic extension that allows masking or classifying any fields in an xsnowDataset. The module also serves as a template that demonstrates how to develop a very simple extension that adds a few additional xsnowDataset methods. This corresponds to application scenario 1 of the xsnow extension framework.

In brief: To develop an additional xsnowDataset method, write a function that takes an xsnowDataset as its first argument and add the decorator @register_xsnowDataset_method on the line above the function signature. See this module’s source code for a working example.
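The registration pattern can be sketched in plain Python. The snippet below only illustrates how such a decorator might work: `ToyDataset` and `register_toy_method` are hypothetical stand-ins, not xsnow classes, and the real @register_xsnowDataset_method may be implemented differently.

```python
class ToyDataset:
    """Hypothetical stand-in for xsnowDataset (illustration only)."""

    def __init__(self, data):
        self.data = data


def register_toy_method(func):
    """Attach `func` to ToyDataset under its own name and return it unchanged."""
    setattr(ToyDataset, func.__name__, func)
    return func


@register_toy_method
def count_above(self, threshold):
    """Count the values that exceed `threshold`; `self` is the dataset."""
    return sum(1 for value in self.data if value > threshold)


ds = ToyDataset([0.2, 0.8, 1.7])
print(ds.count_above(0.5))  # 2
```

Because the decorator returns the function unchanged, `count_above(ds, 0.5)` still works as a plain function call as well.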

Note

The extension includes a basic spatial classification helper that can attach a polygon/region label per location (or per point grid) based on dataset latitude/longitude coordinates.

xsnow.extensions.classification.classify_by_bins(da, *, bins, right=True)#

Bin a DataArray into integer classes using numpy.digitize.

This helper is designed to work hand-in-hand with xsnowDataset.classify() and to act as a template for custom user-defined classification functions.

Parameters:
  • da (xr.DataArray) – Input values to bin. Output preserves dims/coords and aligns with da.

  • bins (array-like) – Bin edges (length B). Output classes are integers in [0, B] (B+1 classes).

  • right (bool, default True) – If True, intervals are (-inf, b0], (b0, b1], …, (b_{B-1}, +inf). If False, intervals are (-inf, b0), [b0, b1), …, [b_{B-1}, +inf).

Returns:

Integer class labels with the same shape as da.

Return type:

xr.DataArray
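The effect of right on values that fall exactly on a bin edge can be checked directly with numpy.digitize. This is a small sketch on a plain NumPy array; the sample values in `hs` are made up for illustration, and the xarray wrapping done by classify_by_bins is not reproduced here.

```python
import numpy as np

hs_bins = np.array([0.25, 0.50, 1.00, 1.50, 2.00])   # B = 5 edges -> classes 0..5
hs = np.array([0.10, 0.25, 0.50, 0.75, 2.00, 2.50])  # made-up sample values

# right=True: a value equal to an edge belongs to the lower class, e.g. 0.25 -> 0
classes_right = np.digitize(hs, hs_bins, right=True)
# right=False: a value equal to an edge belongs to the upper class, e.g. 0.25 -> 1
classes_left = np.digitize(hs, hs_bins, right=False)

print(classes_right.tolist())  # [0, 0, 1, 2, 4, 5]
print(classes_left.tolist())   # [0, 1, 2, 2, 5, 5]
```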

See also

classify

Attach a classifier result to an xsnowDataset.

Examples

>>> hs_bins = np.array([0.25, 0.50, 1.00, 1.50, 2.00])
>>> class_attrs = {
...   "long_name": "Snow depth class (6 bins)",
...   "method": "np.digitize",
...   "bin_edges": hs_bins.tolist(),
...   "right_closed": True,
... }
>>> ds2 = ds.classify(
...   classify_by_bins,
...   output_name="HS_class",
...   output_kind="coord",
...   input_var="HS",
...   attrs=class_attrs,
...   bins=hs_bins,
...   right=True,
... )
>>> ds_deep = ds2.where(ds2.HS_class == 5, drop=True)

xsnow.extensions.classification.classify(self, func, *, output_name='class_id', output_kind='coord', input_var=None, attrs=None, **func_kwargs)#

General classification: run a user function to compute a classification array, then attach it as a coordinate or data variable.

Parameters:
  • func (callable) – A custom classification function: Either f(xr.Dataset) -> xr.DataArray or f(xr.DataArray) -> xr.DataArray. The returned DataArray must align/broadcast within the dataset.

  • output_name (str, default "class_id") – Name of the resulting coord or data var.

  • output_kind ({"coord","data"}, default "coord") –

    How to store the classification result.

    • “coord”: attach as a coordinate (good for use with where(…), e.g. ds.where(ds.hs_class == 2, drop=True), or for grouping).

    • “data”: attach as a data variable (good if you’ll plot it, compute stats on it, or don’t need to use it as a selector).

  • input_var (str, optional) – If set, call func(self.data[input_var]); else func(self.data).

  • attrs (dict, optional) – Metadata to attach to the resulting DataArray.

  • **func_kwargs – Extra keyword arguments forwarded to func.

Returns:

A new dataset wrapper with the classification attached.

Return type:

xsnowDataset

See also

classify_criteria

A generic, boolean classification function.

classify_by_bins

Bin a variable into integer classes (template classifier).

Examples

>>> def by_hs_tertiles(hs: xr.DataArray) -> xr.DataArray:
...     c = xr.zeros_like(hs, dtype=np.int64)
...     c = c.where(hs <= 0.5, 1)     # > 0.5 --> at least 1
...     c = c.where(hs <= 1.5, 2)     # > 1.5 --> 2
...     c.attrs["mapping"] = {0: "shallow", 1: "medium", 2: "deep"}
...     return c
>>> ds2 = ds.classify(by_hs_tertiles, output_name="hs_class", output_kind="coord", input_var="HS")
>>> ds2.where(ds2.hs_class == 2, drop=True)

>>> from xsnow.extensions.classification import classify_by_bins
>>> hs_bins = np.array([0.25, 0.50, 1.00, 1.50, 2.00])
>>> hs_mapping = {
...     0: "(-inf, 0.25]",
...     1: "(0.25, 0.50]",
...     2: "(0.50, 1.00]",
...     3: "(1.00, 1.50]",
...     4: "(1.50, 2.00]",
...     5: "(2.00, +inf)",
... }
>>> class_attrs = {
...     "mapping": hs_mapping,
...     "long_name": "Snow depth class (6 bins)",
...     "method": "np.digitize",
...     "bin_edges": hs_bins.tolist(),
...     "right_closed": True,
... }
>>> ds2 = ds.classify(
...     classify_by_bins,
...     output_name="HS_class",
...     output_kind="coord",
...     input_var="HS",          # classifier gets ds["HS"]
...     attrs=class_attrs,
...     bins=hs_bins,            # forwarded to classify_by_bins()
...     right=True,
... )
>>> ds_extremely_deep = ds2.where(ds2.HS_class == 5, drop=True)

xsnow.extensions.classification.classify_criteria(self, criteria, *, name='classification_mask', kind='mask', attrs=None)#

Criteria-based boolean classification: build a boolean mask from a string expression. The mask can be returned, or attached as a coordinate/data variable.

Parameters:
  • criteria (str) – Boolean conditions chained with ‘&’ and/or ‘|’, e.g. “density > 300 & grain_size < 0.5”. (Minimal parser: left-to-right; no parentheses/precedence.)

  • name (str, default "classification_mask") – Name for the attached output.

  • kind ({"mask","coord","data"}, default "mask") –

    • “mask”: return the mask (xr.DataArray)

    • “coord”: attach as a coordinate and return xsnowDataset

    • “data”: attach as a data var and return xsnowDataset

  • attrs (dict, optional) – Attributes to add to the mask array.

Return type:

xr.DataArray | xsnowDataset
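Because the parser is described as left-to-right without precedence, mixing ‘&’ and ‘|’ can give a different result than Python's own operator precedence would. The sketch below shows one way such an evaluation could behave on a single record of scalar values. It is not xsnow's parser: `evaluate_left_to_right` and the `record` dictionary are hypothetical, and the supported comparison operators are an assumption.

```python
import operator
import re

# Assumed set of comparison operators (illustration only).
_COMPARATORS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}


def evaluate_left_to_right(criteria, record):
    """Evaluate 'a > 1 & b < 2 | c == 3' strictly left to right on one record."""
    # Split on the connectives but keep them: [cond, op, cond, op, cond, ...]
    tokens = [token.strip() for token in re.split(r"([&|])", criteria)]

    def check(condition):
        match = re.fullmatch(r"(\w+)\s*(>=|<=|==|!=|>|<)\s*(\S+)", condition)
        name, op, value = match.groups()
        return _COMPARATORS[op](record[name], float(value))

    result = check(tokens[0])
    for connective, condition in zip(tokens[1::2], tokens[2::2]):
        nxt = check(condition)
        result = (result and nxt) if connective == "&" else (result or nxt)
    return result


record = {"HS": 2.0, "density": 350.0, "grain_size": 1.0}
# Left to right: (HS > 1.5 | density > 300) & grain_size < 0.5
print(evaluate_left_to_right("HS > 1.5 | density > 300 & grain_size < 0.5", record))  # False
```

With Python's precedence, where `&` binds tighter than `|`, the same expression would read as `HS > 1.5 | (density > 300 & grain_size < 0.5)` and evaluate to True. Under a left-to-right parser, the order in which conditions are written therefore matters.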

See also

classify

Generic, customizable classification into multiple classes

Examples

>>> mask = ds.classify_criteria("density < 200 & grain_size > 0.8", kind="mask")
>>> ds_sel = ds.where(mask, drop=True)
>>> ds2 = ds.classify_criteria("HS > 1.5", name="deep_snow", kind="coord")
>>> ds2.where(ds2.deep_snow, drop=True)

xsnow.extensions.classification.mask_by_criteria(self, criteria)#

Convenience wrapper around classify_criteria(…, kind='mask') that sets all non-matching layers to NaN while keeping the shape/dimensions identical. If the criteria are layer-based, scalars are left untouched, and vice versa.

Returns:

Masked xsnowDataset with the same shape/dimensions as the input; values are NaN wherever the criteria are not True.

Return type:

xsnowDataset
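The difference between masking and dropping can be illustrated on a plain NumPy array. This is only a sketch of the idea with made-up density values; it does not use xsnow or reproduce how mask_by_criteria treats individual variables.

```python
import numpy as np

# Made-up layer densities for two profiles with three layers each.
density = np.array([[120.0, 250.0, 310.0],
                    [180.0, 90.0, 400.0]])

mask = density < 200                      # boolean mask, same shape as the data
masked = np.where(mask, density, np.nan)  # keep matches, set the rest to NaN

print(masked.shape == density.shape)  # True
print(int(np.isnan(masked).sum()))    # 3
```

The masked array keeps its 2 x 3 shape, which is the behavior described above, whereas a selection with drop=True may shrink dimensions.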

xsnow.extensions.classification.classify_spatially(self, polygons, *, name='polygon', kind='coord', attrs=None, outside_label='outside', polygons_crs=None, feature_name_field='id')#

Spatial classification of points into named polygons/regions.

This attaches a region label (or returns a per-polygon mask) based on the dataset’s horizontal coordinates. Typical use-cases:

  • select all data from a specific region

  • aggregate over regions via groupby

The dataset CRS is taken from self.attrs['crs']. If it is EPSG:4326, this method uses the longitude/latitude coordinates. Otherwise it expects projected coordinates easting/northing.

If the polygon CRS differs from the dataset CRS, polygons are transformed to the dataset CRS using pyproj (optional dependency).

Parameters:
  • polygons (Any) –

    One of:
    • path to a GeoJSON file (.geojson/.json)

    • GeoJSON mapping (FeatureCollection/Feature/Polygon/MultiPolygon)

    • WKT string (POLYGON/MULTIPOLYGON)

    • mapping of {name: wkt}

    • sequence of WKT strings (names auto: poly_0, poly_1, …)

    • sequence of (name, wkt) pairs

    Coordinate order is assumed to be (x, y). For EPSG:4326 this means (longitude, latitude); for projected CRSs, (easting, northing).

  • name (str, default "polygon") – Name for the attached label coordinate/data var (or the returned mask name).

  • kind ({"mask","coord","data"}, default "coord") –

    • “mask”: return a boolean mask with dims ('polygon', <point-dims...>)

    • “coord”: attach a string label per point as a coordinate and return xsnowDataset

    • “data”: attach a string label per point as a data variable and return xsnowDataset

  • attrs (dict, optional) – Metadata to attach to the output.

  • outside_label (str, default "outside") – Label assigned to points not contained in any provided polygon.

  • polygons_crs (str, optional) – CRS of the input polygons (e.g. "EPSG:4326"). If not provided, the CRS is only auto-detected for GeoJSON that contains an explicit (deprecated) top-level "crs" member. If the polygon CRS cannot be identified unambiguously, this method raises.

  • feature_name_field (str, default "id") – For GeoJSON inputs, this field is read from each feature’s properties to determine the polygon/region name. If absent, it falls back to common alternatives (name, id, title, label) and finally poly_<i>.

Return type:

xr.DataArray | xsnowDataset

Examples

>>> import xarray as xr
>>> import xsnow
>>> xs = xsnow.sample_data.snp_gridded_ds().compute()
>>> xs.longitude.values
array([11.191828, 11.286182, 11.2973  , 11.392236, 11.5008  ])
>>> xs.latitude.values
array([47.146392, 47.436152, 47.148326, 47.437999, 47.367788])
>>> xs.altitude.values
array([2372., 1749., 1860., 1801., 2066.])

Create two regions (rectangles) and leave one location outside:

>>> regions = {
...   "valley": "POLYGON ((11.15 47.12, 11.15 47.17, 11.33 47.17, 11.33 47.12, 11.15 47.12))",
...   "ridge":  "POLYGON ((11.25 47.42, 11.25 47.46, 11.43 47.46, 11.43 47.42, 11.25 47.42))",
... }
>>> xs2 = xs.classify_spatially(regions, name="region", kind="coord", polygons_crs="EPSG:4326")
>>> xs_valley = xs2.where(xs2.region == "valley", drop=True)

Group by region and slope (slope is already an existing coordinate):

>>> xs_stats = xs2.groupby(["region", "slope"]).mean()

Group by region, slope, and an elevation band (>=2000 m) created via classify_criteria:

>>> xs3 = xs2.classify_criteria(
...   "altitude >= 2000",
...   name="band",
...   kind="coord",
...   attrs={"mapping": {False: "<2000m", True: ">=2000m"}},
... )
>>> xs_stats2 = xs3.groupby(["region", "slope", "band"]).mean()
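
The region labelling in the examples above can be reproduced without xsnow by a point-in-polygon test. The sketch below uses a simple ray-casting test on the sample coordinates and the two rectangles from the example. `point_in_ring` and `label_points` are hypothetical helpers; pairing longitude[i] with latitude[i] per location is an assumption, and how xsnow itself treats boundary points or overlapping polygons is not covered here.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count edge crossings of a horizontal ray starting at (x, y)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:       # crossing lies to the right of the point
                inside = not inside
    return inside


def label_points(points, regions, outside_label="outside"):
    """Return the name of the first region containing each point."""
    labels = []
    for x, y in points:
        label = outside_label
        for name, ring in regions.items():
            if point_in_ring(x, y, ring):
                label = name
                break
        labels.append(label)
    return labels


# Closed rings in (x, y) = (longitude, latitude) order, as in the WKT above.
regions = {
    "valley": [(11.15, 47.12), (11.15, 47.17), (11.33, 47.17), (11.33, 47.12), (11.15, 47.12)],
    "ridge": [(11.25, 47.42), (11.25, 47.46), (11.43, 47.46), (11.43, 47.42), (11.25, 47.42)],
}
longitude = [11.191828, 11.286182, 11.2973, 11.392236, 11.5008]
latitude = [47.146392, 47.436152, 47.148326, 47.437999, 47.367788]

labels = label_points(list(zip(longitude, latitude)), regions)
print(labels)  # ['valley', 'ridge', 'valley', 'ridge', 'outside']
```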