classification module
Summary
Classify profiles, scalars, or layers in `xsnowDataset`s
This is a generic extension for masking or classifying any field in an xsnowDataset.
The module also serves as a template that demonstrates how to develop a simple extension
that adds a few xsnowDataset methods. This corresponds to application scenario 1 of
the xsnow extension framework.
In brief: To develop an additional xsnowDataset method, write a function that takes an
xsnowDataset as its first argument and place the decorator @register_xsnowDataset_method
directly above the function signature. See this module's source code for a working example.
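The registration pattern described above can be illustrated without xsnow installed. The following is only a sketch of the general idea: `DemoDataset` and `register_demo_method` are hypothetical stand-ins invented for this example, and the actual `@register_xsnowDataset_method` decorator may work differently internally.

```python
from typing import Callable


class DemoDataset:
    """Hypothetical stand-in for xsnowDataset (not the real class)."""

    def __init__(self, heights):
        self.heights = list(heights)


def register_demo_method(func: Callable) -> Callable:
    """Hypothetical stand-in decorator: attach `func` to DemoDataset as a method."""
    setattr(DemoDataset, func.__name__, func)
    return func


@register_demo_method
def count_deep(self, threshold: float = 1.5) -> int:
    """Count entries above `threshold`; the dataset arrives as the first argument."""
    return sum(1 for h in self.heights if h > threshold)


ds = DemoDataset([0.3, 1.2, 1.8, 2.4])
print(ds.count_deep())               # the function is now callable as a method
print(ds.count_deep(threshold=1.0))  # extra arguments are passed through as usual
```

Because the function receives the dataset as its first argument, it reads like an ordinary method once registered.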
Note
The extension currently lacks a convenient method for classifying based on spatial criteria, for example through a GeoJSON file or a well-known text (WKT) filter. This feature will be added soon.
- xsnow.extensions.classification.classify(self, func, *, output_name='class_id', output_kind='coord', input_var=None, attrs=None, **func_kwargs)
General classification: run a user function to compute a classification array, then attach it as a coordinate or data variable.
- Parameters:
func (callable) – A custom classification function: Either f(xr.Dataset) -> xr.DataArray or f(xr.DataArray) -> xr.DataArray. The returned DataArray must align/broadcast within the dataset.
output_name (str, default "class_id") – Name of the resulting coord or data var.
output_kind ({"coord","data"}, default "coord") – How to store the classification result.
- "coord": attach as a coordinate (good for fast selection, e.g. ds.sel(hs_class=2), or boolean masks).
- "data": attach as a data variable (good if you'll plot it, compute stats on it, or don't need to use it as a selector).
input_var (str, optional) – If set, call func(self.data[input_var]); else func(self.data).
attrs (dict, optional) – Metadata to attach to the resulting DataArray.
**func_kwargs – Extra keyword arguments forwarded to func.
- Returns:
A new dataset wrapper with the classification attached.
- Return type:
See also
classify_criteria: A generic, boolean classification function.
Examples
>>> import numpy as np
>>> import xarray as xr
>>> def by_hs_tertiles(hs: xr.DataArray) -> xr.DataArray:
...     c = xr.zeros_like(hs, dtype=np.int64)
...     c = c.where(hs <= 0.5, 1)  # > 0.5 --> at least 1
...     c = c.where(hs <= 1.5, 2)  # > 1.5 --> 2
...     c.attrs["mapping"] = {0: "shallow", 1: "medium", 2: "deep"}
...     return c
>>> ds2 = ds.classify(by_hs_tertiles, output_name="hs_class", output_kind="coord", input_var="HS")
>>> ds2.sel(hs_class=2)
>>> def classify_by_bins(da: xr.DataArray, *, bins: np.ndarray, right: bool = True) -> xr.DataArray:
...     '''
...     Bin `da` into integer classes using np.digitize.
...     - `bins` are bin *edges* (length B) → classes are 0..B (B+1 classes).
...     - If `right=True`, intervals are (-inf, b0], (b0, b1], ..., (b_{B-1}, +inf).
...     Returns an int64 DataArray with same shape/coords as `da`.
...     '''
...     def _digitize(x, *, bins, right=False):
...         return np.digitize(x, bins, right=right)
...     classes = xr.apply_ufunc(
...         _digitize,
...         da,
...         dask="allowed",
...         kwargs={"bins": np.asarray(bins), "right": right},
...         output_dtypes=[np.int64],
...     )
...     return classes
>>> hs_bins = np.array([0.25, 0.50, 1.00, 1.50, 2.00])
>>> hs_mapping = {
...     0: "(-inf, 0.25]",
...     1: "(0.25, 0.50]",
...     2: "(0.50, 1.00]",
...     3: "(1.00, 1.50]",
...     4: "(1.50, 2.00]",
...     5: "(2.00, +inf)",
... }
>>> class_attrs = {
...     "mapping": hs_mapping,
...     "long_name": "Snow depth class (6 bins)",
...     "method": "np.digitize",
...     "bin_edges": hs_bins.tolist(),
...     "right_closed": True,
... }
>>> ds2 = ds.classify(
...     classify_by_bins,
...     output_name="HS_class",
...     output_kind="coord",
...     input_var="HS",   # classifier gets ds["HS"]
...     attrs=class_attrs,
...     bins=hs_bins,     # forwarded to classify_by_bins()
...     right=True,
... )
>>> ds_extremely_deep = ds2.sel(HS_class=5)
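The right-closed intervals in the mapping above are easy to get wrong at the bin edges, so a quick dependency-free check may help. This sketch uses the standard library's `bisect` instead of NumPy: for increasing edges, `bisect_left` yields the same class index as `np.digitize(..., right=True)` and `bisect_right` the same as `right=False`. The helper name `bin_index` is made up for this illustration.

```python
from bisect import bisect_left, bisect_right

hs_bins = [0.25, 0.50, 1.00, 1.50, 2.00]  # same bin edges as in the example above


def bin_index(value, edges, right=True):
    """Class index of `value` for increasing `edges`.

    right=True  -> intervals (lower, upper], like np.digitize(..., right=True)
    right=False -> intervals [lower, upper), like np.digitize(..., right=False)
    """
    return bisect_left(edges, value) if right else bisect_right(edges, value)


# Print the class index under both conventions; they differ only on exact edges.
for hs in (0.10, 0.25, 0.30, 2.00, 2.50):
    print(hs, bin_index(hs, hs_bins, right=True), bin_index(hs, hs_bins, right=False))
```

A value that sits exactly on an edge, such as 0.25 or 2.00, falls into the lower class with `right=True` and into the upper class with `right=False`.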
- xsnow.extensions.classification.classify_criteria(self, criteria, *, name='classification_mask', kind='mask', attrs=None)
Criteria-based boolean classification: build a boolean mask from a string expression. The mask can be returned, or attached as a coordinate/data variable.
- Parameters:
criteria (str) – Boolean conditions chained with '&' and/or '|', e.g. "density > 300 & grain_size < 0.5". (Minimal parser: conditions are evaluated left to right; parentheses and operator precedence are not supported.)
name (str, default "classification_mask") – Name for the attached output.
kind ({"mask","coord","data"}, default "mask") –
- "mask": return the mask (xr.DataArray)
- "coord": attach as a coordinate and return xsnowDataset
- "data": attach as a data var and return xsnowDataset
attrs (dict, optional) – Attributes to add to the mask array.
- Return type:
xr.DataArray | xsnowDataset
See also
classify: Generic, customizable classification into multiple classes.
Examples
>>> mask = ds.classify_criteria("density < 200 & grain_size > 0.8", kind="mask")
>>> ds_sel = ds.sel(layer=mask)
>>> ds2 = ds.classify_criteria("HS > 1.5", name="deep_snow", kind="coord")
>>> ds2.sel(deep_snow=True)
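Since the criteria string is parsed left to right without precedence, mixing '&' and '|' can give a different result than the equivalent Python expression. The sketch below illustrates that behavior on a plain dictionary of scalar values. It is an assumption about what "left-to-right" implies, not xsnow's actual parser: the function `evaluate_criteria`, the supported comparison operators, and the numeric-literal format are all invented for this example.

```python
import operator
import re

# Comparison operators assumed for this sketch (xsnow's parser may support others).
_COMPARATORS = {
    "<=": operator.le, ">=": operator.ge, "==": operator.eq,
    "!=": operator.ne, "<": operator.lt, ">": operator.gt,
}
_CONDITION = re.compile(r"^\s*(\w+)\s*(<=|>=|==|!=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$")


def _check(condition, record):
    """Evaluate a single 'name op number' condition against `record`."""
    match = _CONDITION.match(condition)
    if match is None:
        raise ValueError(f"Cannot parse condition: {condition!r}")
    name, op, number = match.groups()
    return _COMPARATORS[op](record[name], float(number))


def evaluate_criteria(criteria, record):
    """Combine conditions strictly left to right, ignoring operator precedence."""
    tokens = re.split(r"([&|])", criteria)  # condition, connector, condition, ...
    result = _check(tokens[0], record)
    for connector, condition in zip(tokens[1::2], tokens[2::2]):
        value = _check(condition, record)
        result = (result and value) if connector == "&" else (result or value)
    return result


layer = {"density": 150.0, "grain_size": 1.0}
print(evaluate_criteria("density < 200 & grain_size > 0.8", layer))

# Left to right: (a > 0 | b > 0) & c > 0, whereas Python would group b and c first.
flags = {"a": 1, "b": 0, "c": 0}
print(evaluate_criteria("a > 0 | b > 0 & c > 0", flags))
```

With such a parser, the order in which conditions are written matters whenever both connectors appear in one string, so complex selections are safer to build from several simple masks.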
- xsnow.extensions.classification.mask_by_criteria(self, criteria)
Convenience wrapper around classify_criteria(..., kind='mask'): sets non-matching layers to NaN while keeping shape/dimensions identical. If the criteria are layer-based, scalars are left untouched, and vice versa.
- Returns:
Masked xsnowDataset with the same shape/dimensions as the input, with NaNs wherever
the criteria are not True.
- Return type:
xsnowDataset