classification module#
Summary#
Classify profiles, scalars, or layers in `xsnowDataset`s
This is a very generic extension that allows masking or classifying any field in an xsnowDataset.
The module also acts as a template demonstrating how to develop a very simple extension
that adds a few additional xsnowDataset methods. This corresponds to application scenario 1 of
the xsnow extension framework.
In brief: to develop an additional xsnowDataset method, write a function that takes an
xsnowDataset as its first argument and add the following line directly above the function signature:
@register_xsnowDataset_method. Check out this module’s source code.
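For illustration, a minimal sketch of such an extension method (the import path of register_xsnowDataset_method, the method name, and the 'layer' dimension are assumptions for this example; check the module source for the actual import location):
>>> from xsnow import register_xsnowDataset_method  # import path assumed
>>> @register_xsnowDataset_method
... def n_layers(self):
...     # hypothetical example method: count layers in the wrapped dataset
...     return self.data.sizes.get("layer", 0)  # assumes a 'layer' dimension
>>> ds.n_layers()  # afterwards available on any xsnowDataset `ds`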
Note
The extension includes a basic spatial classification helper that can attach a polygon/region label
per location (or per point grid) based on dataset latitude/longitude coordinates.
- xsnow.extensions.classification.classify_by_bins(da, *, bins, right=True)#
Bin a DataArray into integer classes using numpy.digitize. This helper is designed to work hand-in-hand with xsnowDataset.classify() and to act as a template for custom user-defined classification functions.
- Parameters:
da (xr.DataArray) – Input values to bin. Output preserves dims/coords and aligns with da.
bins (array-like) – Bin edges (length B). Output classes are integers in [0, B] (B+1 classes).
right (bool, default True) – If True, intervals are (-inf, b0], (b0, b1], …, (b_{B-1}, +inf). If False, intervals are (-inf, b0), [b0, b1), …, [b_{B-1}, +inf).
- Returns:
Integer class labels with the same shape as da.
- Return type:
xr.DataArray
See also
classify – Attach a classifier result to an xsnowDataset.
Examples
>>> hs_bins = np.array([0.25, 0.50, 1.00, 1.50, 2.00])
>>> class_attrs = {
...     "long_name": "Snow depth class (6 bins)",
...     "method": "np.digitize",
...     "bin_edges": hs_bins.tolist(),
...     "right_closed": True,
... }
>>> ds2 = ds.classify(
...     classify_by_bins,
...     output_name="HS_class",
...     output_kind="coord",
...     input_var="HS",
...     attrs=class_attrs,
...     bins=hs_bins,
...     right=True,
... )
>>> ds_deep = ds2.where(ds2.HS_class == 5, drop=True)
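The helper can also be called directly on a DataArray, outside of classify() (a sketch reusing hs_bins above and assuming a dataset ds with an HS variable):
>>> from xsnow.extensions.classification import classify_by_bins
>>> hs_classes = classify_by_bins(ds["HS"], bins=hs_bins, right=True)
>>> hs_classes.max()  # integer class labels in [0, 5]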
- xsnow.extensions.classification.classify(self, func, *, output_name='class_id', output_kind='coord', input_var=None, attrs=None, **func_kwargs)#
General classification: run a user function to compute a classification array, then attach it as a coordinate or data variable.
- Parameters:
func (callable) – A custom classification function: Either f(xr.Dataset) -> xr.DataArray or f(xr.DataArray) -> xr.DataArray. The returned DataArray must align/broadcast within the dataset.
output_name (str, default "class_id") – Name of the resulting coord or data var.
output_kind ({"coord","data"}, default "coord") – How to store the classification result.
- "coord": attach as a coordinate (good for use with where(…), e.g. ds.where(ds.hs_class == 2, drop=True), or for grouping).
- "data": attach as a data variable (good if you’ll plot it, compute stats on it, or don’t need to use it as a selector).
input_var (str, optional) – If set, call func(self.data[input_var]); else func(self.data).
attrs (dict, optional) – Metadata to attach to the resulting DataArray.
**func_kwargs – Extra keyword arguments forwarded to func.
- Returns:
A new dataset wrapper with the classification attached.
- Return type:
xsnowDataset
See also
classify_criteria – A generic, boolean classification function.
classify_by_bins – Bin a variable into integer classes (template classifier).
Examples
>>> def by_hs_tertiles(hs: xr.DataArray) -> xr.DataArray:
...     c = xr.zeros_like(hs, dtype=np.int64)
...     c = c.where(hs <= 0.5, 1)  # > 0.5 --> at least 1
...     c = c.where(hs <= 1.5, 2)  # > 1.5 --> 2
...     c.attrs["mapping"] = {0: "shallow", 1: "medium", 2: "deep"}
...     return c
>>> ds2 = ds.classify(by_hs_tertiles, output_name="hs_class", output_kind="coord", input_var="HS")
>>> ds2.where(ds2.hs_class == 2, drop=True)
>>> from xsnow.extensions.classification import classify_by_bins
>>> hs_bins = np.array([0.25, 0.50, 1.00, 1.50, 2.00])
>>> hs_mapping = {
...     0: "(-inf, 0.25]",
...     1: "(0.25, 0.50]",
...     2: "(0.50, 1.00]",
...     3: "(1.00, 1.50]",
...     4: "(1.50, 2.00]",
...     5: "(2.00, +inf)",
... }
>>> class_attrs = {
...     "mapping": hs_mapping,
...     "long_name": "Snow depth class (6 bins)",
...     "method": "np.digitize",
...     "bin_edges": hs_bins.tolist(),
...     "right_closed": True,
... }
>>> ds2 = ds.classify(
...     classify_by_bins,
...     output_name="HS_class",
...     output_kind="coord",
...     input_var="HS",   # classifier gets ds["HS"]
...     attrs=class_attrs,
...     bins=hs_bins,     # forwarded to classify_by_bins()
...     right=True,
... )
>>> ds_extremely_deep = ds2.where(ds2.HS_class == 5, drop=True)
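To store the result as a data variable instead, e.g. for plotting or statistics, only output_kind changes (a sketch based on the same setup):
>>> ds3 = ds.classify(
...     classify_by_bins,
...     output_name="HS_class",
...     output_kind="data",  # attach as a data variable rather than a coordinate
...     input_var="HS",
...     bins=hs_bins,
... )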
- xsnow.extensions.classification.classify_criteria(self, criteria, *, name='classification_mask', kind='mask', attrs=None)#
Criteria-based boolean classification: build a boolean mask from a string expression. The mask can be returned, or attached as a coordinate/data variable.
- Parameters:
criteria (str) – Boolean conditions chained with ‘&’ and/or ‘|’, e.g. “density > 300 & grain_size < 0.5”. (Minimal parser: left-to-right; no parentheses/precedence.)
name (str, default "classification_mask") – Name for the attached output.
kind ({"mask","coord","data"}, default "mask") –
- "mask": return the mask (xr.DataArray)
- "coord": attach as a coordinate and return xsnowDataset
- "data": attach as a data variable and return xsnowDataset
attrs (dict, optional) – Attributes to add to the mask array.
- Return type:
xr.DataArray | xsnowDataset
See also
classify – Generic, customizable classification into multiple classes.
Examples
>>> mask = ds.classify_criteria("density < 200 & grain_size > 0.8", kind="mask")
>>> ds_sel = ds.where(mask, drop=True)
>>> ds2 = ds.classify_criteria("HS > 1.5", name="deep_snow", kind="coord")
>>> ds2.where(ds2.deep_snow, drop=True)
- xsnow.extensions.classification.mask_by_criteria(self, criteria)#
Convenience wrapper around classify_criteria(…, kind='mask'): masks non-matching layers (i.e., sets them to NaN) while keeping shape/dimensions identical. If the criteria are layer-based, scalars are untouched, and vice versa.
- Returns:
Masked xsnowDataset with identical shape/dimensions as the input, but NaNs occur where the criteria are not True.
- Return type:
xsnowDataset
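Examples
A minimal usage sketch, assuming layer variables density and grain_size as in the classify_criteria examples:
>>> ds_masked = ds.mask_by_criteria("density > 300 & grain_size < 0.5")
>>> ds_masked.density.notnull().sum()  # matching layers keep their values; all others are NaN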
- xsnow.extensions.classification.classify_spatially(self, polygons, *, name='polygon', kind='coord', attrs=None, outside_label='outside', polygons_crs=None, feature_name_field='id')#
Spatial classification of points into named polygons/regions.
This attaches a region label (or returns a per-polygon mask) based on the dataset’s horizontal coordinates. Typical use-cases:
- select all data from a specific region
- aggregate over regions via groupby
The dataset CRS is taken from self.attrs['crs']. If it is EPSG:4326, this method uses the longitude/latitude coordinates; otherwise it expects projected coordinates easting/northing. If the polygon CRS differs from the dataset CRS, polygons are transformed to the dataset CRS using pyproj (optional dependency).
- Parameters:
polygons (Any) – One of:
- path to a GeoJSON file (.geojson/.json)
- GeoJSON mapping (FeatureCollection/Feature/Polygon/MultiPolygon)
- WKT string (POLYGON/MULTIPOLYGON)
- mapping of {name: wkt}
- sequence of WKT strings (names auto: poly_0, poly_1, …)
- sequence of (name, wkt) pairs
Coordinate order is assumed to be (x, y). For EPSG:4326 this means (longitude, latitude); for projected CRSs, (easting, northing).
name (str, default "polygon") – Name for the attached label coordinate/data var (or the returned mask name).
kind ({"mask","coord","data"}, default "coord") –
- "mask": return a boolean mask with dims ('polygon', <point-dims...>)
- "coord": attach a string label per point as a coordinate and return xsnowDataset
- "data": attach a string label per point as a data variable and return xsnowDataset
attrs (dict, optional) – Metadata to attach to the output.
outside_label (str, default "outside") – Label assigned to points not contained in any provided polygon.
polygons_crs (str, optional) – CRS of the input polygons (e.g. "EPSG:4326"). If not provided, the CRS is only auto-detected for GeoJSON that contains an explicit (deprecated) top-level "crs" member. If the polygon CRS cannot be identified unambiguously, this method raises.
feature_name_field (str, default "id") – For GeoJSON inputs, this field is read from each feature’s properties to determine the polygon/region name. If absent, it falls back to common alternatives (name, id, title, label) and finally poly_<i>.
- Return type:
xr.DataArray | xsnowDataset
Examples
>>> import xarray as xr
>>> import xsnow
>>> xs = xsnow.sample_data.snp_gridded_ds().compute()
>>> xs.longitude.values
array([11.191828, 11.286182, 11.2973 , 11.392236, 11.5008 ])
>>> xs.latitude.values
array([47.146392, 47.436152, 47.148326, 47.437999, 47.367788])
>>> xs.altitude.values
array([2372., 1749., 1860., 1801., 2066.])
Create two regions (rectangles) and leave one location outside:
>>> regions = {
...     "valley": "POLYGON ((11.15 47.12, 11.15 47.17, 11.33 47.17, 11.33 47.12, 11.15 47.12))",
...     "ridge": "POLYGON ((11.25 47.42, 11.25 47.46, 11.43 47.46, 11.43 47.42, 11.25 47.42))",
... }
>>> xs2 = xs.classify_spatially(regions, name="region", kind="coord", polygons_crs="EPSG:4326")
>>> xs_valley = xs2.where(xs2.region == "valley", drop=True)
Group by region and slope (slope is already an existing coordinate):
>>> xs_stats = xs2.groupby(["region", "slope"]).mean()
Group by region, slope, and an elevation band (>=2000 m) created via classify_criteria:
>>> xs3 = xs2.classify_criteria(
...     "altitude >= 2000",
...     name="band",
...     kind="coord",
...     attrs={"mapping": {False: "<2000m", True: ">=2000m"}},
... )
>>> xs_stats2 = xs3.groupby(["region", "slope", "band"]).mean()
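A per-polygon boolean mask can also be requested instead of a label coordinate (a sketch reusing the regions mapping above; the mask has dims ('polygon', <point-dims...>)):
>>> region_mask = xs.classify_spatially(regions, kind="mask", polygons_crs="EPSG:4326")
>>> xs_any = xs.where(region_mask.any("polygon"), drop=True)  # keep points inside any region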