xsnow_io module#
Handles all data parsing and I/O for the xsnow package, enabling scalable and memory-efficient processing of large datasets using Dask.
This module is the primary engine for reading and writing snowpack data in various formats (e.g., SNOWPACK .pro, .smet, NetCDF).
Core Dask Integration#
To handle datasets that are larger than memory, this module employs a lazy, out-of-core loading strategy for .pro files using Dask. The process is:
1. Pre-Scan for Metadata: A quick, parallel pre-scan of all .pro files is performed to determine essential metadata, such as the maximum number of snow layers (max_layers), without loading the full data into memory.
2. Lazy Graph Construction: A Dask computation graph is built where each node represents the task of reading and processing a single .pro file into an xarray.Dataset. These tasks are “lazy” and are not executed immediately.
3. Parallel Computation: When the data is needed, Dask’s scheduler executes the graph in parallel, reading files and creating datasets in chunks. This ensures that only a fraction of the total data resides in memory at any given time, enabling the processing of vast amounts of data on a single machine.
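A minimal sketch of this pattern using dask.delayed. parse_pro below is a hypothetical stand-in for the module’s real .pro parser, and unlike the real implementation the sketch materializes results on compute rather than keeping them as chunked dask arrays:

from pathlib import Path

import dask
import numpy as np
import xarray as xr


@dask.delayed
def parse_pro(path: Path, max_layers: int) -> xr.Dataset:
    # Stand-in stub: a real parser would read the .pro file at `path` and
    # pad its profile to `max_layers` along the 'layer' dimension.
    return xr.Dataset(
        {"density": ("layer", np.full(max_layers, np.nan))},
        coords={"layer": np.arange(max_layers)},
    )


def lazy_read(directory: str, max_layers: int) -> xr.Dataset:
    files = sorted(Path(directory).glob("*.pro"))
    # Step 2: build the graph -- one lazy task per file; nothing is read yet.
    tasks = [parse_pro(f, max_layers) for f in files]
    # Step 3: the scheduler executes the graph in parallel.
    datasets = dask.compute(*tasks)
    return xr.concat(datasets, dim="time")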
- xsnow.xsnow_io.read(source, recursive=False, time=None, location=None, slope=None, realization=None, lazy=None, lazy_threshold=30, n_cpus_use=None, logger=None, chunks=None, **parser_kwargs)#
Reads, parses, and combines snowpack data into a unified xsnowDataset.
This is the primary user-facing function for loading data. It orchestrates the discovery, parallel processing, and merging of snow profile (.pro) and time series (.smet) files into a single, coherent dataset.
- Parameters:
source (Union[str, List[str], Path, List[Path]]) – The data source. Can be a path to a single file (e.g., ‘.pro’, ‘.smet’, ‘.nc’), a directory containing data files, or a list of file paths.
recursive (bool) – If True, search subdirectories recursively for .pro and .smet files when source is a directory. Default is False for backward compatibility and to avoid accidentally including unwanted files.
time (Optional[TimeSelector]) – A TimeSelector object for filtering data by time.
location (Optional[LocationSelector]) – A LocationSelector object for filtering by station ID.
slope (Optional[SlopeSelector]) – A SlopeSelector object for filtering by slope angle.
realization (Optional[RealizationSelector]) – A RealizationSelector object for filtering by model realization.
lazy (Optional[bool]) – If True, use Dask for lazy, memory-efficient loading (good for large datasets). If False, load all data eagerly into memory with parallel processing (good for small to medium datasets). If None (default), auto-detect based on file count (sweet spot to be determined).
lazy_threshold (int) – Number of files above which lazy loading is enabled when lazy=None (auto-detect). Default: 30 files.
n_cpus_use (Optional[int]) – The number of CPU cores to use. If not provided, defaults to min(32, total_cores - 1), with a minimum of 1.
logger (Optional[Logger]) – An optional, pre-configured logger instance.
chunks (Optional[Dict[str, int]]) – Chunk sizes for dask arrays when lazy=True. A dict with keys ‘time’ and ‘layer’. Default: {‘time’: 100, ‘layer’: -1}. Example: {‘time’: 50, ‘layer’: 100}.
**parser_kwargs (Any) – Additional keyword arguments passed to the underlying parser functions:
- max_layers: Max layer dimension; ‘auto’ (default) scans files to pick the maximum, or set an int to force a fixed size.
- remove_soil: If True (default), drops soil layers during parsing.
- add_surface_sh_as_layer: If True (default), inserts surface hoar explicitly as a layer when the PRO file’s special code indicates SH at the surface.
- norm_slopes: Controls slope normalization in SNOWPACK parsing/merging. Options:
  - ‘auto’ (default): normalize when slopes are consistent across locations; preserve per-location slopes when inconsistent.
  - True: force normalization (collapse to (slope,) even if slopes vary by location).
  - False: skip normalization entirely.
- Return type:
Optional[xsnowDataset]
- Returns:
A unified xsnowDataset object containing the data from all sources, or None if no valid data files are found.
Examples
>>> # Auto-detect (lazy for many files, eager for few)
>>> ds = read("data/")
>>> # Force lazy loading for large datasets
>>> ds = read("large_archive/", lazy=True, n_cpus_use=8)
>>> # Force eager loading for quick access
>>> ds = read("small_dataset/", lazy=False, n_cpus_use=4)
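The chunks and parser keyword arguments can be combined in the same call; the values below are illustrative only:

>>> # Tune dask chunking and parser behaviour (illustrative values)
>>> ds = read("large_archive/", lazy=True,
...           chunks={"time": 50, "layer": -1},
...           max_layers=200, norm_slopes="auto")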
- xsnow.xsnow_io.read_smet(filepath, datetime_start=None, datetime_end=None, logger=None)#
Parses a single SMET file into an xarray.Dataset.
Reads a SMET-formatted text file, parsing the header for metadata and the data section into a pandas DataFrame, which is then converted into an xarray.Dataset. The data is filtered by the specified time range.
- Parameters:
filepath (Union[str, Path]) – Path to the SMET file.
datetime_start (Optional[str]) – The start of the time range to read. Data before this time is dropped.
datetime_end (Optional[str]) – The end of the time range to read. Data after this time is dropped.
logger (Optional[logging.Logger]) – An optional, pre-configured logger instance. If None, the default module logger is used.
- Return type:
Optional[Dataset]
- Returns:
An xarray.Dataset containing the SMET data, or None if the file cannot be parsed or contains no data in the specified time range.
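A minimal usage sketch (the file name is illustrative):

>>> from xsnow import xsnow_io
>>> ds = xsnow_io.read_smet("WFJ2.smet",
...                         datetime_start="2023-12-01",
...                         datetime_end="2024-03-31")
>>> if ds is not None:
...     print(list(ds.data_vars))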
- xsnow.xsnow_io.append_latest(existing_ds, source, from_time=None, join='left', logger=None, **kwargs)#
Incrementally extend a dataset by appending newer timestamps from files.
This function is a convenience wrapper around time selection, read, and concat. Semantics:
- Determine a start time:
  - If from_time is None, use max(existing time) + 1 s.
  - If from_time is provided, drop all times >= from_time from existing_ds and read new data starting from from_time (inclusive).
- Read source with that time filter.
- Concatenate along time using xsnow.concat:
  - join controls non-time dimensions (default: ‘left’ keeps existing domains).
  - Time is always the union along the concat axis (no extra reindexing).
- Parameters:
existing_ds (xsnowDataset) – The dataset instance to extend.
source (Union[str, List[str], Path]) – Path(s) to files or directories to read new data from (passed to read).
from_time (Union[str, datetime, Timestamp, datetime64, None]) – Subset existing data to times < from_time and read new data starting with from_time, so everything appended is new (see Semantics above). Accepts str, datetime.datetime, pandas.Timestamp, or numpy.datetime64.
join (str) – Join mode for non-time dimensions, forwarded to xsnow.concat.
logger (Optional[Logger]) – Optional logger to use for status/warning messages. If None, a logger is created.
**kwargs – Additional keyword arguments forwarded to xsnow.concat, such as compat or combine_attrs.
- Return type:
xsnowDataset
- Returns:
The xsnowDataset with the appended data (or just the trimmed existing data if nothing new was found).
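A minimal usage sketch (paths are illustrative):

>>> from xsnow import xsnow_io
>>> ds = xsnow_io.read("archive/")
>>> # Later, after new model output has been written to the same directory:
>>> ds = xsnow_io.append_latest(ds, "archive/", join="left")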
- xsnow.xsnow_io.to_netcdf(ds, path, logger=None, **kwargs)#
Saves the dataset to a NetCDF file.
This function provides a convenient wrapper around the underlying xarray.Dataset.to_netcdf() method for easy caching and interoperability.
The location mapping dictionary is converted to NetCDF-compatible format by encoding it as JSON in a string attribute.
- Parameters:
ds (xsnowDataset) – The dataset instance to save.
path (Union[str, Path]) – The destination file path for the .nc file.
logger (logging.Logger) – An optional preconfigured logger instance.
**kwargs – Additional keyword arguments passed to xarray.Dataset.to_netcdf().
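A minimal usage sketch; extra keyword arguments pass straight through to xarray (e.g., engine):

>>> from xsnow import xsnow_io
>>> ds = xsnow_io.read("data/")
>>> xsnow_io.to_netcdf(ds, "cache.nc", engine="netcdf4")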
- xsnow.xsnow_io.to_smet(ds, path, max_files=None, **kwargs)#
Saves time-series data from the dataset to a SMET file.
This function extracts data variables that do not have a ‘layer’ dimension (e.g., meteorological data, total snow height) and writes them into the SMET format. It only supports writing data for a single location.
- Parameters:
ds (xsnowDataset) – The dataset instance to save.
path (Union[str, Path]) – The destination file path for the .smet file.
max_files (int, optional) – The maximum number of locations allowed. If the dataset contains more locations, a ValueError is raised. Defaults to None (no limit).
**kwargs – Reserved for future filtering options.
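A minimal usage sketch (the dataset is assumed to hold a single location):

>>> from xsnow import xsnow_io
>>> xsnow_io.to_smet(ds, "station.smet", max_files=1)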
- xsnow.xsnow_io.to_pro(ds, path, max_files=None, **kwargs)#
Saves snow profile data for a single location to a SNOWPACK .pro file.
This function iterates through each timestamp in the dataset and writes the vertical profile data (variables with a ‘layer’ dimension) into the .pro format. It only supports writing data for a single location.
- Parameters:
ds (xsnowDataset) – The dataset instance to save.
path (Union[str, Path]) – The destination file path for the .pro file.
max_files (int, optional) – The maximum number of profiles (timestamps) allowed. If the dataset contains more profiles, a ValueError is raised. Defaults to None (no limit).
**kwargs – Reserved for future filtering options.
- Raises:
ValueError – If the dataset is empty, contains more than one location, or if the number of profiles exceeds max_files.
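A minimal usage sketch (the dataset is assumed to hold a single location; the max_files value is illustrative):

>>> from xsnow import xsnow_io
>>> xsnow_io.to_pro(ds, "station.pro", max_files=365)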
- xsnow.xsnow_io.to_json(ds, path, **kwargs)#
Saves the dataset to a structured JSON file. (Not Implemented)
- xsnow.xsnow_io.to_caaml(ds, path, **kwargs)#
Saves snow profile data to a CAAML V6.0 XML file. (Not Implemented)
- xsnow.xsnow_io.to_crocus(ds, path, **kwargs)#
Saves snow profile data to a Crocus model input file. (Not Implemented)