xsnow_io module#

Handles all data parsing and I/O for the xsnow package, enabling scalable and memory-efficient processing of large datasets using Dask.

This module is the primary engine for reading and writing snowpack data in various formats (e.g., SNOWPACK .pro, .smet, NetCDF).

Core Dask Integration#

To handle datasets that are larger than memory, this module employs a lazy, out-of-core loading strategy for .pro files using Dask. The process is:

  1. Pre-Scan for Metadata: A quick, parallel pre-scan of all .pro files is performed to determine essential metadata, such as the maximum number of snow layers (max_layers), without loading the full data into memory.

  2. Lazy Graph Construction: A Dask computation graph is built where each node represents the task of reading and processing a single .pro file into an xarray.Dataset. These tasks are “lazy” and are not executed immediately.

  3. Parallel Computation: When the data is needed, Dask’s scheduler executes the graph in parallel, reading files and creating datasets in chunks. This ensures that only a fraction of the total data resides in memory at any given time, enabling the processing of vast amounts of data on a single machine.
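
As a rough sketch of this pattern (not the module’s internal code; parse_pro_file and the file names here are hypothetical stand-ins for the real .pro parser):

import dask
import numpy as np
import xarray as xr

@dask.delayed
def parse_pro_file(path, max_layers):
    # Stand-in for the real parser: each task yields one per-file
    # xarray.Dataset, padded to the max_layers found in the pre-scan.
    data = np.full((1, max_layers), np.nan)
    return xr.Dataset({"density": (("time", "layer"), data)})

paths = ["a.pro", "b.pro"]        # hypothetical inputs
max_layers = 150                  # step 1: result of the metadata pre-scan
tasks = [parse_pro_file(p, max_layers) for p in paths]   # step 2: lazy graph
datasets = dask.compute(*tasks)   # step 3: parallel, chunked execution
combined = xr.concat(datasets, dim="time")               # merge per-file results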

xsnow.xsnow_io.read(source, recursive=False, time=None, location=None, slope=None, realization=None, lazy=None, lazy_threshold=30, n_cpus_use=None, logger=None, chunks=None, **parser_kwargs)#

Reads, parses, and combines snowpack data into a unified xsnowDataset.

This is the primary user-facing function for loading data. It orchestrates the discovery, parallel processing, and merging of snow profile (.pro) and time series (.smet) files into a single, coherent dataset.

Parameters:
  • source (Union[str, List[str], Path, List[Path]]) – The data source. Can be a path to a single file (e.g., ‘.pro’, ‘.smet’, ‘.nc’), a directory containing data files, or a list of file paths.

  • recursive (bool) – If True, search subdirectories recursively for .pro and .smet files when source is a directory. Default is False for backward compatibility and to avoid accidentally including unwanted files.

  • time (Optional[TimeSelector]) – A TimeSelector object for filtering data by time.

  • location (Optional[LocationSelector]) – A LocationSelector object for filtering by station ID.

  • slope (Optional[SlopeSelector]) – A SlopeSelector for filtering by slope angle.

  • realization (Optional[RealizationSelector]) – A RealizationSelector for filtering by model realization.

  • lazy (Optional[bool]) – If True, use Dask for lazy, memory-efficient loading (good for large datasets). If False, load all data eagerly into memory with parallel processing (good for small to medium datasets). If None (default), auto-detect based on the file count (see lazy_threshold).

  • lazy_threshold (int) – Number of files above which lazy loading is enabled when lazy=None (auto-detect). Default: 30 files.

  • n_cpus_use (Optional[int]) – The number of CPU cores to use. If not provided, defaults to min(32, total_cores - 1), with a minimum of 1.

  • logger (Optional[Logger]) – An optional, pre-configured logger instance.

  • chunks (Optional[Dict[str, int]]) – Chunk sizes for Dask arrays when lazy=True. Dict with keys ‘time’ and ‘layer’. Default: {‘time’: 100, ‘layer’: -1}. Example: {‘time’: 50, ‘layer’: 100}.

  • **parser_kwargs (Any) – Additional keyword arguments passed to the underlying parser functions:

    • max_layers: Max layer dimension; ‘auto’ (default) scans files to pick the maximum, or set an int to force a fixed size.

    • remove_soil: If True (default), drops soil layers during parsing.

    • add_surface_sh_as_layer: If True (default), inserts a surface hoar layer explicitly as a layer when the PRO file’s special code indicates SH at the surface.

    • norm_slopes: Controls slope normalization in SNOWPACK parsing / merging. Options:

      • ’auto’ (default): normalize when slopes are consistent across locations; preserve per-location slopes when inconsistent.

      • True: force normalization (collapse to (slope,) even if slopes vary by location).

      • False: skip normalization entirely.

Return type:

Optional[xsnowDataset]

Returns:

A unified xsnowDataset object containing the data from all sources, or None if no valid data files are found.

Examples

>>> # Auto-detect (lazy for many files, eager for few)
>>> ds = read("data/")
>>> # Force lazy loading for large datasets
>>> ds = read("large_archive/", lazy=True, n_cpus_use=8)
>>> # Force eager loading for quick access
>>> ds = read("small_dataset/", lazy=False, n_cpus_use=4)

xsnow.xsnow_io.read_smet(filepath, datetime_start=None, datetime_end=None, logger=None)#

Parses a single SMET file into an xarray.Dataset.

Reads a SMET-formatted text file, parsing the header for metadata and the data section into a pandas DataFrame, which is then converted into an xarray.Dataset. The data is filtered by the specified time range.

Parameters:
  • filepath (Union[str, Path]) – Path to the SMET file.

  • datetime_start (Optional[str]) – The start of the time range to read. Data before this time is dropped.

  • datetime_end (Optional[str]) – The end of the time range to read. Data after this time is dropped.

  • logger (Optional[logging.Logger]) – An optional, pre-configured logger instance. If None, the default module logger is used.

Return type:

Optional[Dataset]

Returns:

An xarray.Dataset containing the SMET data, or None if the file cannot be parsed or contains no data in the specified time range.
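
A minimal usage sketch (the file name is hypothetical):

>>> ds = read_smet("station.smet", datetime_start="2023-11-01", datetime_end="2024-05-31")
>>> if ds is not None:
...     print(list(ds.data_vars))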

xsnow.xsnow_io.append_latest(existing_ds, source, from_time=None, join='left', logger=None, **kwargs)#

Incrementally extend a dataset by appending newer timestamps from files.

This method is a convenience wrapper for time selection, read, and concat. Semantics:

  1. Determine a start time:
    • If from_time is None, use (max(existing time) + 1s).

    • If from_time is provided, drop all times >= from_time from existing_ds (keeping only times strictly before it) and start reading from from_time (inclusive).

  2. Read source with that time filter.

  3. Concatenate along time using xsnow.concat.
    • join controls non-time dims (default: ‘left’ keeps existing domains).

    • Time is always the union along the concat axis (no extra reindexing).

Parameters:
  • existing_ds (xsnowDataset) – The dataset instance to extend.

  • source (Union[str, List[str], Path]) – Path(s) to files or directories to read new data from (calls read upon it).

  • from_time (Union[str, datetime, Timestamp, datetime64, None]) – Subset existing data to times < from_time and read new data starting at from_time (inclusive), so everything read is new (see Semantics above). Accepts a str, pandas.Timestamp, numpy.datetime64, or datetime.datetime.

  • join (str) – Join mode for non-time dimensions forwarded to xsnow.concat.

  • logger (Optional[Logger]) – Optional logger to use for status/warning messages. If None, a logger is created.

  • **kwargs – Additional keyword args forwarded to xsnow.concat, such as compat or combine_attrs.

Return type:

xsnowDataset

Returns:

xsnowDataset with the appended data (or the trimmed existing dataset if no new data is found).
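
A typical incremental-update sketch (paths are hypothetical):

>>> ds = read("cache/season.nc")
>>> # Append any timestamps newer than what the cached dataset already holds
>>> ds = append_latest(ds, "operational_output/", join='left')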

xsnow.xsnow_io.to_netcdf(ds, path, logger=None, **kwargs)#

Saves the dataset to a NetCDF file.

This method provides a convenient wrapper around the underlying xarray.Dataset.to_netcdf() method for easy caching and interoperability.

The location mapping dictionary is converted to NetCDF-compatible format by encoding it as JSON in a string attribute.

Parameters:
  • ds (xsnowDataset) – The dataset instance to save.

  • path (Union[str, Path]) – The destination file path for the .nc file.

  • logger (Optional[logging.Logger]) – An optional, pre-configured logger instance.

  • **kwargs – Additional keyword arguments passed to xarray.Dataset.to_netcdf().
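
For example, to cache a dataset and reload it later (the path is hypothetical, and ds is a previously loaded xsnowDataset):

>>> to_netcdf(ds, "cache/season.nc")
>>> ds_cached = read("cache/season.nc")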

xsnow.xsnow_io.to_smet(ds, path, max_files=None, **kwargs)#

Saves time-series data from the dataset to a SMET file.

This function extracts data variables that do not have a ‘layer’ dimension (e.g., meteorological data, total snow height) and writes them into the SMET format. It only supports writing data for a single location.

Parameters:
  • ds (xsnowDataset) – The dataset instance to save.

  • path (Union[str, Path]) – The destination file path for the .smet file.

  • max_files (int, optional) – The maximum number of locations allowed. If the dataset contains more locations, a ValueError is raised. Defaults to None (no limit).

  • **kwargs – Reserved for future filtering options.
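
For example (the path is hypothetical):

>>> # Write the layer-free time series of a single-location dataset
>>> to_smet(ds, "out/station.smet")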

xsnow.xsnow_io.to_pro(ds, path, max_files=None, **kwargs)#

Saves snow profile data for a single location to a SNOWPACK .pro file.

This function iterates through each timestamp in the dataset and writes the vertical profile data (variables with a ‘layer’ dimension) into the .pro format. It only supports writing data for a single location.

Parameters:
  • ds (xsnowDataset) – The dataset instance to save.

  • path (Union[str, Path]) – The destination file path for the .pro file.

  • max_files (int, optional) – The maximum number of profiles (timestamps) allowed. If the dataset contains more profiles, a ValueError is raised. Defaults to None (no limit).

  • **kwargs – Reserved for future filtering options.

Raises:

ValueError – If the dataset is empty, contains more than one location, or if the number of profiles exceeds max_files.
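
For example (the path is hypothetical):

>>> # Write all profiles of a single-location dataset to one .pro file
>>> to_pro(ds, "out/station.pro")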

xsnow.xsnow_io.to_json(ds, path, **kwargs)#

Saves the dataset to a structured JSON file. (Not Implemented)

xsnow.xsnow_io.to_caaml(ds, path, **kwargs)#

Saves snow profile data to a CAAML V6.0 XML file. (Not Implemented)

xsnow.xsnow_io.to_crocus(ds, path, **kwargs)#

Saves snow profile data to a Crocus model input file. (Not Implemented)