classification module

Summary

Classify profiles, scalars, or layers in `xsnowDataset`s

This is a very generic extension that allows masking or classifying any field in an xsnowDataset. The module also serves as a template that demonstrates how to develop a simple extension that adds a few xsnowDataset methods. This corresponds to application scenario 1 of the xsnow extension framework.

In brief: To develop an additional xsnowDataset method, write a function that takes an xsnowDataset as its first argument and place the decorator @register_xsnowDataset_method on the line above the function signature. See this module’s source code for a working example.
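The registration pattern described above can be sketched without xsnow installed. Everything in this block is a hypothetical stand-in: `StandInDataset` plays the role of xsnowDataset and `register_method` plays the role of @register_xsnowDataset_method. The real decorator's import path and internals are not shown on this page, so treat this only as an illustration of the general idea.

```python
from typing import Callable


class StandInDataset:
    """Hypothetical stand-in for xsnowDataset so the sketch runs on its own."""

    def __init__(self, data: dict):
        self.data = data


def register_method(func: Callable) -> Callable:
    """Hypothetical decorator: attach `func` to StandInDataset under its own name."""
    setattr(StandInDataset, func.__name__, func)
    return func


@register_method
def count_deep(self, threshold: float = 1.5) -> int:
    """Count snow-depth values above `threshold`; the dataset is the first argument."""
    return sum(1 for hs in self.data["HS"] if hs > threshold)


ds = StandInDataset({"HS": [0.3, 1.2, 1.8, 2.4]})
n_deep = ds.count_deep()  # the plain function is now callable as a method
```

Because the decorator returns the function unchanged, it stays importable and testable as an ordinary function as well.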

Note

The extension currently lacks a convenient method for classifying by spatial criteria, for example through a GeoJSON file or a well-known text (WKT) filter. This feature will be added soon.

xsnow.extensions.classification.classify(self, func, *, output_name='class_id', output_kind='coord', input_var=None, attrs=None, **func_kwargs)

General classification: run a user function to compute a classification array, then attach it as a coordinate or data variable.

Parameters:
  • func (callable) – A custom classification function: Either f(xr.Dataset) -> xr.DataArray or f(xr.DataArray) -> xr.DataArray. The returned DataArray must align/broadcast within the dataset.

  • output_name (str, default "class_id") – Name of the resulting coord or data var.

  • output_kind ({"coord","data"}, default "coord") – How to store the classification result.

    • "coord": attach as a coordinate (good for fast selection, e.g. ds.sel(hs_class=2), or boolean masks).

    • "data": attach as a data variable (good if you’ll plot it, compute stats on it, or don’t need to use it as a selector).

  • input_var (str, optional) – If set, call func(self.data[input_var]); else func(self.data).

  • attrs (dict, optional) – Metadata to attach to the resulting DataArray.

  • **func_kwargs – Extra keyword arguments forwarded to func.

Returns:

A new dataset wrapper with the classification attached.

Return type:

xsnowDataset
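To make the parameters above concrete, here is a minimal plain-xarray sketch of the same flow: call the user function on either the whole dataset or one variable, then attach the result as a coordinate or a data variable. The helper `attach_classification` and the classifier `deeper_than` are hypothetical names used for illustration. This is not the xsnow implementation, and it works on a bare `xr.Dataset` rather than an xsnowDataset.

```python
import numpy as np
import xarray as xr


def attach_classification(ds, func, *, output_name="class_id", output_kind="coord",
                          input_var=None, attrs=None, **func_kwargs):
    """Hypothetical helper that mirrors the documented parameters on an xr.Dataset."""
    # Pass one variable if input_var is set, otherwise the whole dataset.
    target = ds[input_var] if input_var is not None else ds
    result = func(target, **func_kwargs).rename(output_name)
    if attrs:
        result.attrs.update(attrs)
    if output_kind == "coord":
        return ds.assign_coords({output_name: result})
    if output_kind == "data":
        return ds.assign({output_name: result})
    raise ValueError("output_kind must be 'coord' or 'data'")


def deeper_than(hs: xr.DataArray, *, threshold: float) -> xr.DataArray:
    """Hypothetical classifier: 1 where hs exceeds threshold, else 0."""
    return (hs > threshold).astype(np.int64)


ds = xr.Dataset({"HS": ("time", [0.2, 0.9, 1.7])}, coords={"time": [0, 1, 2]})
as_coord = attach_classification(ds, deeper_than, output_name="hs_class",
                                 input_var="HS", threshold=0.5)
as_data = attach_classification(ds, deeper_than, output_name="hs_class",
                                output_kind="data", input_var="HS", threshold=0.5)

# Plain xarray cannot .sel() on a non-index coordinate, so filter with where().
deep = as_coord.where(as_coord["hs_class"] == 1, drop=True)
```

In this sketch the extra keyword `threshold` travels through `**func_kwargs` to the classifier, which is the role the documented `**func_kwargs` parameter plays.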

See also

classify_criteria

A generic, boolean classification function.

Examples

>>> def by_hs_tertiles(hs: xr.DataArray) -> xr.DataArray:
...     c = xr.zeros_like(hs, dtype=np.int64)
...     c = c.where(hs <= 0.5, 1)     # > 0.5 --> at least 1
...     c = c.where(hs <= 1.5, 2)     # > 1.5 --> 2
...     c.attrs["mapping"] = {0: "shallow", 1: "medium", 2: "deep"}
...     return c
>>> ds2 = ds.classify(by_hs_tertiles, output_name="hs_class", output_kind="coord", input_var="HS")
>>> ds2.sel(hs_class=2)
>>> def classify_by_bins(da: xr.DataArray, *, bins: np.ndarray, right: bool = True) -> xr.DataArray:
...     '''
...     Bin `da` into integer classes using np.digitize.
...     - `bins` are bin *edges* (length B) → classes are 0..B (B+1 classes).
...     - If `right=True`, intervals are (-inf, b0], (b0, b1], ..., (b_{B-1}, +inf).
...     Returns an int64 DataArray with same shape/coords as `da`.
...     '''
...     def _digitize(x, *, bins, right=False):
...         return np.digitize(x, bins, right=right)
...     classes = xr.apply_ufunc(
...         _digitize,
...         da,
...         dask="allowed",
...         kwargs={"bins": np.asarray(bins), "right": right},
...         output_dtypes=[np.int64],
...     )
...     return classes
>>> hs_bins = np.array([0.25, 0.50, 1.00, 1.50, 2.00])
>>> hs_mapping = {
...     0: "(-inf, 0.25]",
...     1: "(0.25, 0.50]",
...     2: "(0.50, 1.00]",
...     3: "(1.00, 1.50]",
...     4: "(1.50, 2.00]",
...     5: "(2.00, +inf)",
... }
>>> class_attrs = {
...     "mapping": hs_mapping,
...     "long_name": "Snow depth class (6 bins)",
...     "method": "np.digitize",
...     "bin_edges": hs_bins.tolist(),
...     "right_closed": True,
... }
>>> ds2 = ds.classify(
...     classify_by_bins,
...     output_name="HS_class",
...     output_kind="coord",
...     input_var="HS",          # classifier gets ds["HS"]
...     attrs=class_attrs,
...     bins=hs_bins,            # forwarded to classify_by_bins()
...     right=True,
... )
>>> ds_extremely_deep = ds2.sel(HS_class=5)
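
The class numbering used in the binning example can be checked directly with NumPy. This short sketch uses the same bin edges; the sample values are made up for illustration. It shows that with `right=True` a value sitting exactly on an edge falls into the lower class, while `right=False` puts it into the upper one.

```python
import numpy as np

hs_bins = np.array([0.25, 0.50, 1.00, 1.50, 2.00])
# Illustrative snow depths, including one value exactly on the first edge.
samples = np.array([0.10, 0.25, 0.30, 0.75, 1.20, 1.80, 2.50])

# right=True: intervals are (-inf, 0.25], (0.25, 0.50], ..., (2.00, +inf)
right_closed = np.digitize(samples, hs_bins, right=True)
# -> [0, 0, 1, 2, 3, 4, 5]

# right=False: intervals are (-inf, 0.25), [0.25, 0.50), ..., [2.00, +inf)
left_closed = np.digitize(samples, hs_bins, right=False)
# -> [0, 1, 1, 2, 3, 4, 5]
```

Five edges therefore yield six classes (0 to 5), which matches the six entries of the mapping in the example.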

xsnow.extensions.classification.classify_criteria(self, criteria, *, name='classification_mask', kind='mask', attrs=None)

Criteria-based boolean classification: build a boolean mask from a string expression. The mask can be returned, or attached as a coordinate/data variable.

Parameters:
  • criteria (str) – Boolean conditions chained with ‘&’ and/or ‘|’, e.g. “density > 300 & grain_size < 0.5”. (Minimal parser: conditions are combined strictly left to right; parentheses and operator precedence are not supported.)

  • name (str, default "classification_mask") – Name for the attached output.

  • kind ({"mask","coord","data"}, default "mask") –

    • "mask": return the mask (xr.DataArray)

    • "coord": attach as a coordinate and return xsnowDataset

    • "data": attach as a data var and return xsnowDataset

  • attrs (dict, optional) – Attributes to add to the mask array.
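
The left-to-right rule matters as soon as ‘&’ and ‘|’ are mixed. The sketch below is a hypothetical, simplified evaluator written only to illustrate that rule; it is not the parser used by xsnow, and the function name `evaluate_left_to_right`, the field `hardness`, and the sample values are assumptions. It handles only terms of the form `name <op> number`.

```python
import operator
import re

import numpy as np

_COMPARE = {"<=": operator.le, ">=": operator.ge, "==": operator.eq,
            "!=": operator.ne, "<": operator.lt, ">": operator.gt}
_TERM = re.compile(r"^\s*(\w+)\s*(<=|>=|==|!=|<|>)\s*([-+.\deE]+)\s*$")


def evaluate_left_to_right(criteria: str, fields: dict) -> np.ndarray:
    """Hypothetical evaluator: combine comparison terms strictly left to right."""

    def term(text: str) -> np.ndarray:
        match = _TERM.match(text)
        if match is None:
            raise ValueError(f"cannot parse term: {text!r}")
        name, op, value = match.groups()
        return _COMPARE[op](fields[name], float(value))

    # The capturing group keeps the '&' / '|' separators in the token list.
    tokens = re.split(r"\s*([&|])\s*", criteria)
    mask = term(tokens[0])
    for op, text in zip(tokens[1::2], tokens[2::2]):
        rhs = term(text)
        mask = (mask & rhs) if op == "&" else (mask | rhs)
    return mask


fields = {
    "density": np.array([350.0, 350.0, 100.0]),
    "grain_size": np.array([1.0, 1.0, 0.3]),
    "hardness": np.array([5.0, 1.0, 1.0]),
}
expr = "density > 300 | grain_size < 0.5 & hardness > 3"

# Left to right: ((density > 300) | (grain_size < 0.5)) & (hardness > 3)
left_to_right = evaluate_left_to_right(expr, fields)

# Conventional precedence ('&' binds tighter than '|'), written out explicitly.
with_precedence = (fields["density"] > 300) | (
    (fields["grain_size"] < 0.5) & (fields["hardness"] > 3)
)
```

The two results differ for the second element, so when mixing operators it is safer to order the conditions deliberately or to build the mask in several steps.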

Return type:

xr.DataArray | xsnowDataset

See also

classify

Generic, customizable classification into multiple classes.

Examples

>>> mask = ds.classify_criteria("density < 200 & grain_size > 0.8", kind="mask")
>>> ds_sel = ds.sel(layer=mask)
>>> ds2 = ds.classify_criteria("HS > 1.5", name="deep_snow", kind="coord")
>>> ds2.sel(deep_snow=True)

xsnow.extensions.classification.mask_by_criteria(self, criteria)

Convenience wrapper around classify_criteria(…, kind='mask'): sets non-matching layers to NaN while keeping shape/dimensions identical to the input. If the criteria are layer-based, scalars are left untouched, and vice versa.

Return type:

xsnowDataset

Returns:

Masked xsnowDataset with identical shape/dimensions as the input, but NaNs occur where the criteria are not True.
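
The shape-preserving behaviour can be illustrated with plain xarray. The sketch below is an assumption about how such masking might look, not the xsnow implementation. The variable names and values are made up, and it masks only variables that carry the `layer` dimension so that the scalar `HS` stays untouched.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        # Layer-based variable: one value per (time, layer).
        "density": (("time", "layer"), np.array([[150.0, 250.0], [180.0, 320.0]])),
        # Scalar (per-profile) variable: one value per time step.
        "HS": ("time", np.array([0.8, 1.6])),
    }
)

mask = ds["density"] < 200  # layer-based criterion

masked = ds.copy()
for name, var in ds.data_vars.items():
    if "layer" in var.dims:
        # where() keeps the shape and fills non-matching entries with NaN.
        masked[name] = var.where(mask)
```

Unlike a selection, this keeps every layer slot in place, so downstream code can rely on unchanged dimensions and only needs to handle NaNs.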