{ "cells": [ { "cell_type": "markdown", "id": "9110dbc0", "metadata": {}, "source": [ "# Adding new functionality\n", "or more verbosely:\n", "# Scenario 2: Adding new dimensions from alternate data streams and providing entirely new functionality\n", "\n", "If you want to add new dimensions to an `xsnowDataset` or, more generally, extend the `xsnow` functionality beyond a single method, write your own extension class. `xsnow` has everything prepared to make this straightforward. As with a *scenario-1-extension*, write your class in a Python module and decide whether you want to keep this module private or host it in a public repository. \n", "\n", "Here is a cheat sheet for the steps you have to take. You will find more detailed explanations and a demonstration further below:\n" ] }, { "cell_type": "markdown", "id": "6ffd7671", "metadata": {}, "source": [ "```{admonition} Recipe\n", ":class: note\n", "\n", " 1. Import: `from xsnow import DatasetDecorator`\n", " 2. Define your extension class: e.g., `class EnsembleFX(DatasetDecorator):`\n", " 3. Define your class methods (and possibly generic functions)\n", " 4. Whenever a function returns an object of your new class, `_rewrap()` the newly generated dataset\n", "\n", "```" ] }, { "cell_type": "markdown", "id": "27c932fc", "metadata": {}, "source": [ "Regarding 1. and 2.)\n", "\n", " * It is important that you define your class as a *subclass* of the [`DatasetDecorator`](../../api/_generated/xsnow.core). This allows `xsnow` to *configure* your class to feel and behave like an `xsnowDataset`, while allowing multiple extensions to be chained in custom order.\n", "\n", "Regarding 3.)\n", "\n", " * Code all functionality you need and want. Prefix the names of private methods or helper functions with an underscore (e.g., `_my_private_helper`).\n", "\n", "Regarding 4.)\n", "\n", " * Rewrapping is important to ensure different extensions can be chained. 
Use the pattern `xs_out = self._rewrap(xs_modified)`.\n", "\n", "*Scenario-2-extensions* can look quite different. Therefore, the next two sections demonstrate two extensions that extend the `xsnow` functionality in their own ways." ] }, { "cell_type": "markdown", "id": "5e653a55", "metadata": {}, "source": [ "## Example: Ensemble-forecast extension---new data streams and dimensions\n", "\n", "The [ensemble forecasts extension](../../api/_generated/xsnow.extensions.ensemble_forecasts) aims to facilitate research on the performance of forecasts with different lead times and from different model realizations, such as deterministic or ensemble members. To that end, it provides a special read routine that parses a defined directory structure into the dimensions `realization` and model `run`. This read routine is the heart of the extension, while the `EnsembleFX` class does nothing except label the resulting dataset for consistent naming.\n", "\n", "### Excerpt from implementation" ] }, { "cell_type": "code", "execution_count": 1, "id": "aaed3661", "metadata": { "tags": [ "skip-execution", "remove-output" ] }, "outputs": [], "source": [ "from pathlib import Path\n", "from typing import Optional, Union\n", "\n", "from xsnow import DatasetDecorator\n", "\n", "class EnsembleFX(DatasetDecorator):\n", " \"\"\"\n", " Decorator that enriches an xsnowDataset with run context and leadtimes.\n", "\n", " Dimensions/coordinates guaranteed after ``read_ensemble_fx``:\n", " - ``run`` (string) with attrs including optional ``timezone``.\n", " - ``realization`` (string) describing the ensemble member label.\n", " - ``run_start`` coordinate on ``run`` (datetime64[ns], tz in attrs; NaT if unknown).\n", " - ``leadtime`` coordinate on ``time`` (float hours from run_start to valid time).\n", "\n", " All existing xsnowDataset API 
remains available via inheritance.\n", " \"\"\"\n", " # note that the class is basically empty (no methods, etc)\n", "\n", "\n", "\n", "# note that this is not a class method, but a module-level function\n", "def read_ensemble_fx(\n", " source: Union[str, Path],\n", " # < more parameters >\n", ") -> Optional[EnsembleFX]:\n", " \"\"\"\n", " Read a forecast collection into an ``EnsembleFX`` dataset.\n", "\n", " Layout: ``source/{run}/{member}/{station}.{smet|pro|nc}``. Runs and members are\n", " derived from folder names; station IDs from filenames. Only requested runs,\n", " members, and filename bases are read to keep I/O minimal.\n", "\n", " < ... >\n", "\n", " Parameters\n", " ----------\n", " < ... >\n", "\n", " Returns\n", " -------\n", " EnsembleFX or None\n", " Decorated dataset when data were found; otherwise ``None``.\n", " \"\"\"\n", " \n", " # < iterate through directory tree and read >\n", "\n", " # < concatenate individual datasets >\n", "\n", " xs_out = EnsembleFX(xs_combined)\n", "\n", " return xs_out\n" ] }, { "cell_type": "markdown", "id": "e547957e", "metadata": {}, "source": [ "### Demo application" ] }, { "cell_type": "code", "execution_count": 2, "id": "8c08efb3", "metadata": {}, "outputs": [], "source": [ "import xsnow\n", "from xsnow.extensions.ensemble_forecasts import read_ensemble_fx\n", "\n", "datapath = xsnow.sample_data.snp_ensfx_dir()" ] }, { "cell_type": "code", "execution_count": 3, "id": "d3f83b41", "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Data location: /home/flo/.cache/xsnow-snp-ensfx\n", "xsnow-snp-ensfx/\n", " smets/\n", " ens-fx/\n", " analysis/\n", " det/\n", " VIR1A.smet\n", " VIR2A.smet\n", " 2024-01-17T09Z/\n", " det/\n", " VIR1A.smet\n", " VIR2A.smet\n", " p01/\n", " VIR1A.smet\n", " VIR2A.smet\n", " 2024-01-16T09Z/\n", " det/\n", " VIR1A.smet\n", " VIR2A.smet\n" ] } ], "source": [ "# cell hidden through metadata\n", "import os\n", "print(f\"Data 
location: {datapath}\")\n", "for root, dirs, files in os.walk(datapath):\n", " level = root.replace(datapath, \"\").count(os.sep)\n", " indent = \" \" * 4 * level\n", " print(f\"{indent}{os.path.basename(root)}/\")\n", " subindent = \" \" * 4 * (level + 1)\n", " fcounter = 0\n", " for f in sorted(files):\n", " if fcounter < 3 or fcounter > len(files)-3:\n", " print(f\"{subindent}{f}\")\n", " elif fcounter == 3:\n", " print(f\"{subindent}...\")\n", " fcounter += 1\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f2148bee", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " Locations: 2\n", " Timestamps: 358 (2024-01-16--2024-01-31)\n", " Profiles: 4296 total | 0 valid | unavailable with HS>0\n", "\n", " employing the Size: 304kB\n", " Dimensions: (location: 2, time: 358, slope: 1, realization: 2, run: 3)\n", " Coordinates:\n", " altitude (location) float64 16B 2.372e+03 1.749e+03\n", " latitude (location) float64 16B 47.15 47.44\n", " * location (location) object 16B 'VIR1A' 'VIR2A'\n", " longitude (location) float64 16B 11.19 11.29\n", " leadtime (time, run) float64 9kB nan nan nan nan ... nan nan nan nan\n", " * time (time) datetime64[ns] 3kB 2024-01-16T03:00:00 ... 2024-01-31\n", " azimuth (slope) float64 8B nan\n", " inclination (slope) float64 8B nan\n", " * slope (slope) int64 8B 0\n", " * realization (realization) object 16B 'det' 'p01'\n", " * run (run) object 24B '2024-01-16T09Z' ... 'analysis'\n", " run_start (run) datetime64[ns] 24B 2024-01-16T09:00:00 ... 
NaT\n", " Data variables:\n", " DW (location, time, slope, realization, run) float64 34kB na...\n", " ISWR (location, time, slope, realization, run) float64 34kB na...\n", " PSUM (location, time, slope, realization, run) float64 34kB na...\n", " RH (location, time, slope, realization, run) float64 34kB na...\n", " TA (location, time, slope, realization, run) float64 34kB na...\n", " TAU_CLD (location, time, slope, realization, run) float64 34kB na...\n", " VW (location, time, slope, realization, run) float64 34kB na...\n", " VW_MAX (location, time, slope, realization, run) float64 34kB na...\n", " profile_status (location, time, slope, realization, run) float32 17kB na...\n", " Attributes:\n", " Conventions: CF-1.8\n", " crs: EPSG:4326\n", "Frozen({'location': 2, 'time': 358, 'slope': 1, 'realization': 2, 'run': 3})\n" ] } ], "source": [ "xs = read_ensemble_fx(f\"{datapath}/smets/ens-fx/\")\n", "print(xs)" ] }, { "cell_type": "markdown", "id": "ab76cf2d", "metadata": {}, "source": [ "The resulting dataset is of class `EnsembleFX`. It has an additional dimension `run` with 3 entries and two additional coordinates: `run_start` on dimension `(run)` and `leadtime` on dimensions `(time, run)`. You can now work with the dataset just as you would with an `xsnowDataset`. 
\n", "\n", "For example, we could look into the first 100 values of `'TA'` and `'leadtime'` for the *deterministic* member of the *2024-01-16T09Z* run at the first location:" ] }, { "cell_type": "code", "execution_count": 28, "id": "72ddedb5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ nan nan nan nan nan nan 264.28 265.39 266.14 266.7\n", " 267.11 267.55 267.83 267.93 267.78 267.61 267.79 267.89 268.16 268.61\n", " 269.12 269.48 269.72 269.99 270.52 270.58 270.1 269.25 270.03 272.68\n", " 273.89 274.46 274.91 275.49 275.3 274.4 273.82 273.22 272.24 271.08\n", " 271.16 271.88 273.1 272.81 271.62 271.51 271.72 271.81 271.49 271.55\n", " 271.54 271.7 272.01 272.42 272.83 272.95 273.3 273.98 274.13 273.91\n", " 273.35 272.19 271.77 271.35 270.84 268.75 267.23 265.45 263.89 262.66\n", " 261.81 261.06 260.68 260.13 259.72 259.38 259.1 258.85 nan nan\n", " nan nan nan nan nan nan nan nan nan nan\n", " nan nan nan nan nan nan nan nan nan nan]\n", "[nan nan nan nan nan nan 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.\n", " 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29.\n", " 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47.\n", " 48. 49. 50. 51. 52. 53. 54. 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 65.\n", " 66. 67. 68. 69. 70. 71. nan nan nan nan nan nan nan nan nan nan nan nan\n", " nan nan nan nan nan nan nan nan nan nan]\n" ] } ], "source": [ "sub = xs.sel(run=\"2024-01-16T09Z\", realization='det').\\\n", " isel(location=0, time=slice(100)).squeeze()\n", "\n", "print(sub['TA'].values)\n", "print(sub['leadtime'].values)" ] }, { "cell_type": "markdown", "id": "5f522e84", "metadata": {}, "source": [ "## Example: Hazard chart extension---entirely new functionality\n", "\n", "```{warning}\n", "\n", "Coming soon. In the meantime, you can check out the source code of the [Hazard chart extension](../../api/_generated/xsnow.extensions.hazard_chart) directly. 
\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "xsnow-dev", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.5" } }, "nbformat": 4, "nbformat_minor": 5 }