Grain types and other labeled data

Grain types and other labeled data#

For efficiency reasons, data variables across a large dataset are best handled as numeric codes. To ease interpreting numeric codes, xsnow implements strategies for translating these codes into human-readable labels.

Grain types are a stereotypical example for this concept. While the International Classification for Seasonal Snow on the Ground (ICSSG) lists grain type names (and acronyms) like Precipitation Particles (PP), Faceted Crystals (FC), and more, SNOWPACK uses a one-to-three-digit code to describe the (primary grain shape, secondary grain shape, grain shape cycle).

Accessors translate those codes to human-readable labels on demand without mutating the stored data:

  • .icssg.*—accessor for translating grain codes to ICSSG acronyms.

  • .labels.get()—generic accessor for custom codes whose mappings are stored in the metadata.

import xsnow
xs = xsnow.single_profile_timeseries()
[i] xsnow.xsnow_io: Loading 2 datasets eagerly with 1 workers...
[i] xsnow.utils: Slope coordinate 'inclination' varies by location. Preserving (location, slope) dimensions as allow_per_location=True.
[i] xsnow.utils: Slope coordinate 'azimuth' varies by location. Preserving (location, slope) dimensions as allow_per_location=True.

Grain codes vs labels#

The icssg accessor can be applied to all variables with grain type units known to xsnow. For example, in our sample dataset, this would be the data variable grain_type:

xs["grain_type"].attrs["units"]
'Swiss Code F1F2F3'

The icssg accessor has two methods, primary() and secondary(), that extract the two labels from one grain code:

# Subset to one profile
profile = xs.isel(time=-1, slope=0, realization=0)

grain_codes     = profile.grain_type
grain_types_pri = grain_codes.icssg.primary()
grain_types_sec = grain_codes.icssg.secondary()

print("Grain codes:", grain_codes.to_numpy())
print("Primary labels:", grain_types_pri.to_numpy())
print("Secondary labels:", grain_types_sec.to_numpy())
Grain codes: [[770. 770. 772. 772. 330. 330. 330. 330. 330. 330. 330. 330. 330. 330.
  330. 330. 330. 330. 772. 440. 440. 440. 440. 440. 772. 772. 450. 450.
  450. 450. 450. 440. 440. 440. 440. 440. 440. 440. 440. 660. 450. 431.
  440. 430. 330. 330. 330. 330. 341. 341. 990. 990. 330. 772. 772. 341.
  230. 230. 110.]]
Primary labels: [['MF' 'MF' 'MFcr' 'MFcr' 'RG' 'RG' 'RG' 'RG' 'RG' 'RG' 'RG' 'RG' 'RG'
  'RG' 'RG' 'RG' 'RG' 'RG' 'MFcr' 'FC' 'FC' 'FC' 'FC' 'FC' 'MFcr' 'MFcr'
  'FC' 'FC' 'FC' 'FC' 'FC' 'FC' 'FC' 'FC' 'FC' 'FC' 'FC' 'FC' 'FC' 'SH'
  'FC' 'FC' 'FC' 'FC' 'RG' 'RG' 'RG' 'RG' 'RG' 'RG' 'FCxr' 'FCxr' 'RG'
  'MFcr' 'MFcr' 'RG' 'DF' 'DF' 'PP']]
Secondary labels: [['MF' 'MF' 'MF' 'MF' 'RG' 'RG' 'RG' 'RG' 'RG' 'RG' 'RG' 'RG' 'RG' 'RG'
  'RG' 'RG' 'RG' 'RG' 'MF' 'FC' 'FC' 'FC' 'FC' 'FC' 'MF' 'MF' 'DH' 'DH'
  'DH' 'DH' 'DH' 'FC' 'FC' 'FC' 'FC' 'FC' 'FC' 'FC' 'FC' 'SH' 'DH' 'RG'
  'FC' 'RG' 'RG' 'RG' 'RG' 'RG' 'FC' 'FC' 'FCxr' 'FCxr' 'RG' 'MF' 'MF'
  'FC' 'RG' 'RG' 'PP']]

You can create new data variables that store the labels:

xs['gtype_prim'] = xs['grain_type'].icssg.primary()
print(xs['gtype_prim'])
<xarray.DataArray 'gtype_prim' (location: 1, time: 381, slope: 1,
                                realization: 1, layer: 59)> Size: 180kB
array([[[[['MF', 'MF', 'MFcr', ..., None, None, None]]],


        [[['MF', 'MF', 'MFcr', ..., None, None, None]]],


        [[['MF', 'MF', 'MFcr', ..., None, None, None]]],


        ...,


        [[['MF', 'MF', 'MFcr', ..., 'DF', 'DF', 'PP']]],


        [[['MF', 'MF', 'MFcr', ..., 'DF', 'DF', 'PP']]],


        [[['MF', 'MF', 'MFcr', ..., 'DF', 'DF', 'PP']]]]],
      shape=(1, 381, 1, 1, 59), dtype=object)
Coordinates:
  * location     (location) <U29 116B 'Kasererwinkl_C6'
  * time         (time) datetime64[ns] 3kB 2024-01-17T16:00:00 ... 2024-02-02...
  * slope        (slope) int64 8B 0
  * realization  (realization) int64 8B 0
  * layer        (layer) int64 472B 0 1 2 3 4 5 6 7 ... 51 52 53 54 55 56 57 58
    altitude     (location) float64 8B 1.681e+03
    azimuth      (location, slope) int64 8B 270
    inclination  (location, slope) int64 8B 16
    latitude     (location) float64 8B 47.1
    longitude    (location) float64 8B 11.62
    z            (location, time, slope, realization, layer) float32 90kB -75...
Attributes:
    standard_name:  snow_grain_type
    units:          Swiss Code F1F2F3
    long_name:      snow grain type

If you want to convert other grain codes than the Swiss Code F1F2F3, get in touch to add your routine to the graincode_converter module.

Generic labels for codes#

You can use the generic labels accessor for your own custom mappings between codes and labels.

Let’s illustrate this concept using the profile_status as an example. Inspect the mapping of codes-to-labels in the metadata:

xs['profile_status'].attrs
{'long_name': 'profile status (snow layer data)',
 'units': '1',
 'standard_name': 'profile_status',
 'mapping': {-1: 'error during parsing',
  0: 'no layer data',
  1: 'existing layer data'}}
xs['profile_status'].labels.get(as_dict=True)
{-1: 'error during parsing', 0: 'no layer data', 1: 'existing layer data'}
xs['profile_status'].labels.get()
<xarray.DataArray 'profile_status' (location: 1, time: 381, slope: 1,
                                    realization: 1)> Size: 3kB
array([[[['existing layer data']],

        [['existing layer data']],

        [['existing layer data']],

        [['existing layer data']],

        [['existing layer data']],

        [['existing layer data']],

        [['existing layer data']],

        [['existing layer data']],

        [['existing layer data']],

        [['existing layer data']],

...

        [['existing layer data']],

        [['existing layer data']],

        [['existing layer data']],

        [['existing layer data']],

        [['existing layer data']],

        [['existing layer data']],

        [['existing layer data']],

        [['existing layer data']],

        [['existing layer data']],

        [['existing layer data']]]], dtype=object)
Coordinates:
  * location     (location) <U29 116B 'Kasererwinkl_C6'
  * time         (time) datetime64[ns] 3kB 2024-01-17T16:00:00 ... 2024-02-02...
  * slope        (slope) int64 8B 0
  * realization  (realization) int64 8B 0
    altitude     (location) float64 8B 1.681e+03
    azimuth      (location, slope) int64 8B 270
    inclination  (location, slope) int64 8B 16
    latitude     (location) float64 8B 47.1
    longitude    (location) float64 8B 11.62
Attributes:
    long_name:      profile status (snow layer data)
    units:          1
    standard_name:  profile_status
    mapping:        {-1: 'error during parsing', 0: 'no layer data', 1: 'exis...

All profiles in our timeseries are getting labeled with ‘existing layer data’. This is not surprising given that the profile_status_summary shows only valid profiles:

xs.profile_status_summary()
profile_status
1    381
Name: profile_status, dtype: int64