{ "cells": [ { "cell_type": "markdown", "id": "b8cde7d2", "metadata": {}, "source": [ "# Handling timezones\n", "\n", "Dealing with timezones is not rocket science, but it can still be nerve-racking and time-consuming. `xsnow` therefore ships a set of built-in tools that make the most common timezone tasks convenient. This tutorial demonstrates several common use cases." ] }, { "cell_type": "markdown", "id": "56ae6fa7", "metadata": {}, "source": [ "```{admonition} The time coordinate and TimezoneAccessor\n", ":class: note\n", "\n", "The time coordinate of an `xsnowDataset` is of data type `datetime64`, which is timezone-naive. To keep track of the timezone, `xsnow` maintains an attribute under the time coordinate (`.attrs['timezone']`). \n", "\n", "Every coordinate or data variable of data type `datetime64` can use the functionality provided by `xsnow`'s `TimezoneAccessor` (`tz`). Call it like `xsnowDataset.time.tz.*`. See the cheat sheet and detailed examples below.\n", "```" ] }, { "cell_type": "markdown", "id": "d738b691", "metadata": {}, "source": [ "| Task | Method |\n", "|---------|---------|\n", "| Inspect and change timezone metadata | `.tz.get_attr()`, `.tz.set_attr()` |\n", "| Convert timezone (naive data type) | `.tz.to_naive_in()` |\n", "| Select a time window in a different timezone | `.tz.between()` |\n", "| Get a timezone-aware Pandas object | `.tz.localize()`, `.tz.to_datetimeindex_in()` |" ] }, { "cell_type": "code", "execution_count": 1, "id": "014580b0", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[i] xsnow.xsnow_io: Loading 2 datasets eagerly with 13 workers...\n", "[i] xsnow.utils: Slope coordinate 'inclination' varies by location. Preserving (location, slope) dimensions as allow_per_location=True.\n", "[i] xsnow.utils: Slope coordinate 'azimuth' varies by location. 
Preserving (location, slope) dimensions as allow_per_location=True.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "<xarray.DataArray 'time' (time: 381)> Size: 3kB\n", "array(['2024-01-17T16:00:00.000000000', '2024-01-17T17:00:00.000000000',\n", " '2024-01-17T18:00:00.000000000', ..., '2024-02-02T10:00:00.000000000',\n", " '2024-02-02T11:00:00.000000000', '2024-02-02T12:00:00.000000000'],\n", " shape=(381,), dtype='datetime64[ns]')\n", "Coordinates:\n", " * time (time) datetime64[ns] 3kB 2024-01-17T16:00:00 ... 2024-02-02T12:...\n", "Attributes:\n", " timezone: UTC+01:00\n" ] } ], "source": [ "import xsnow\n", "import xarray # deliberately no alias to emphasize later on\n", "import pandas as pd\n", "\n", "# Demo dataset: 1 location, 1 slope, 1 realization, many timesteps\n", "xs = xsnow.single_profile_timeseries()\n", "print(xs.time)" ] }, { "cell_type": "markdown", "id": "a8d74732", "metadata": {}, "source": [ "## Inspect and set timezone metadata\n", "The timezone is shown in the printed summary of the time coordinate (see above). To retrieve the timezone string explicitly, do the following:" ] }, { "cell_type": "code", "execution_count": 2, "id": "b2accbc2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'UTC+01:00'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xs.time.tz.get_attr() # preferred to xs.time.attrs['timezone']" ] }, { "cell_type": "markdown", "id": "dfa5ccae", "metadata": {}, "source": [ "Usually, the timezone is set automatically when your data is read. To change the timezone metadata without changing the values of the time coordinate, use `tz.set_attr`." 
] }, { "cell_type": "code", "execution_count": 3, "id": "f79641f3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<xarray.DataArray 'time' (time: 381)> Size: 3kB\n", "array(['2024-01-17T16:00:00.000000000', '2024-01-17T17:00:00.000000000',\n", " '2024-01-17T18:00:00.000000000', ..., '2024-02-02T10:00:00.000000000',\n", " '2024-02-02T11:00:00.000000000', '2024-02-02T12:00:00.000000000'],\n", " shape=(381,), dtype='datetime64[ns]')\n", "Coordinates:\n", " * time (time) datetime64[ns] 3kB 2024-01-17T16:00:00 ... 2024-02-02T12:...\n", "Attributes:\n", " timezone: Europe/Vienna\n" ] } ], "source": [ "xs.time.tz.set_attr(\"Europe/Vienna\") # this does not modify the values in xs.time!\n", "print(xs.time)\n" ] }, { "cell_type": "markdown", "id": "69d96017", "metadata": {}, "source": [ "Do not alter `xs.time.attrs['timezone']` directly: `tz.set_attr` runs several checks to ensure that the timezone you assign is valid, and direct assignment bypasses them." ] }, { "cell_type": "markdown", "id": "509caa97", "metadata": {}, "source": [ "## Convert to another timezone (still naive)\n", "The time coordinate of our sample dataset from Europe is one hour ahead of UTC. Let's create additional data variables in our dataset that refer to UTC and Pacific Time. Both will also be timezone-naive, but keep their timezone as an attribute. 
Use the convenience method `tz.to_naive_in`:" ] }, { "cell_type": "code", "execution_count": 4, "id": "027262ff", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "datetime64[ns]\n", "UTC\n", "America/Vancouver\n" ] } ], "source": [ "# Create a UTC-naive mirror of the time coord (still an xarray.DataArray):\n", "xs['time_utc'] = xs.time.tz.to_naive_in(\"UTC\")\n", "xs['time_pt'] = xs.time.tz.to_naive_in(\"America/Vancouver\") # also try: -8 or \"PT\"\n", "\n", "print(xs.time_utc.dtype) # datetime64[ns]\n", "print(xs.time_utc.tz.get_attr()) # \"UTC\"\n", "print(xs.time_pt.tz.get_attr()) # \"America/Vancouver\"" ] }, { "cell_type": "markdown", "id": "4190fe1c", "metadata": {}, "source": [ "Both new time variables can be stored conveniently in the dataset. By default, they are `xarray.DataArray`s of the data type `datetime64`. Their timezone attributes are stored automatically and can be retrieved in the same way as for the time coordinate. \n", "\n", "If you also tried `tz.to_naive_in(-8)` or `tz.to_naive_in('PT')`, you will have noticed that the former is converted to `UTC-08:00`, while the latter raises an error because `'PT'` is not a known timezone. \n", "\n", "Note that the *values* of the new times are in their respective timezones, while their *time coordinates* still point to the common dataset coordinate `time`. Hence, they can be displayed side by side:" ] }, { "cell_type": "code", "execution_count": 5, "id": "6fa9366e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\"><th></th><th>time_utc</th><th>time_pt</th></tr>\n", " <tr><th>time</th><th></th><th></th></tr>\n", " </thead>\n", " <tbody>\n", " <tr><th>2024-01-17 16:00:00</th><td>2024-01-17 15:00:00</td><td>2024-01-17 07:00:00</td></tr>\n", " <tr><th>2024-01-17 17:00:00</th><td>2024-01-17 16:00:00</td><td>2024-01-17 08:00:00</td></tr>\n", " <tr><th>2024-01-17 18:00:00</th><td>2024-01-17 17:00:00</td><td>2024-01-17 09:00:00</td></tr>\n", " <tr><th>2024-01-17 19:00:00</th><td>2024-01-17 18:00:00</td><td>2024-01-17 10:00:00</td></tr>\n", " <tr><th>2024-01-17 20:00:00</th><td>2024-01-17 19:00:00</td><td>2024-01-17 11:00:00</td></tr>\n", " <tr><th>...</th><td>...</td><td>...</td></tr>\n", " <tr><th>2024-02-02 08:00:00</th><td>2024-02-02 07:00:00</td><td>2024-02-01 23:00:00</td></tr>\n", " <tr><th>2024-02-02 09:00:00</th><td>2024-02-02 08:00:00</td><td>2024-02-02 00:00:00</td></tr>\n", " <tr><th>2024-02-02 10:00:00</th><td>2024-02-02 09:00:00</td><td>2024-02-02 01:00:00</td></tr>\n", " <tr><th>2024-02-02 11:00:00</th><td>2024-02-02 10:00:00</td><td>2024-02-02 02:00:00</td></tr>\n", " <tr><th>2024-02-02 12:00:00</th><td>2024-02-02 11:00:00</td><td>2024-02-02 03:00:00</td></tr>\n", " </tbody>\n", "</table>\n", "<p>381 rows × 2 columns</p>\n", "</div>
" ], "text/plain": [ " time_utc time_pt\n", "time \n", "2024-01-17 16:00:00 2024-01-17 15:00:00 2024-01-17 07:00:00\n", "2024-01-17 17:00:00 2024-01-17 16:00:00 2024-01-17 08:00:00\n", "2024-01-17 18:00:00 2024-01-17 17:00:00 2024-01-17 09:00:00\n", "2024-01-17 19:00:00 2024-01-17 18:00:00 2024-01-17 10:00:00\n", "2024-01-17 20:00:00 2024-01-17 19:00:00 2024-01-17 11:00:00\n", "... ... ...\n", "2024-02-02 08:00:00 2024-02-02 07:00:00 2024-02-01 23:00:00\n", "2024-02-02 09:00:00 2024-02-02 08:00:00 2024-02-02 00:00:00\n", "2024-02-02 10:00:00 2024-02-02 09:00:00 2024-02-02 01:00:00\n", "2024-02-02 11:00:00 2024-02-02 10:00:00 2024-02-02 02:00:00\n", "2024-02-02 12:00:00 2024-02-02 11:00:00 2024-02-02 03:00:00\n", "\n", "[381 rows x 2 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xs[['time', 'time_utc', 'time_pt']].to_dataframe()" ] }, { "cell_type": "markdown", "id": "0ec55e79", "metadata": {}, "source": [ "We can easily test that the new `time_utc` is a simple shift by one hour:" ] }, { "cell_type": "code", "execution_count": 6, "id": "9eb7f650", "metadata": {}, "outputs": [], "source": [ "xarray.testing.assert_equal(xs.time_utc, xs.time + pd.to_timedelta(-1, unit='hours'))" ] }, { "cell_type": "markdown", "id": "9d197fd4", "metadata": {}, "source": [ "```{admonition} Reassigning the time coordinate\n", ":class: note \n", "\n", "To *reassign* your dataset's `time` coordinate beyond simply adding another data variable in another timezone, do the following:\n", "\n", " time_utc = ds[\"time\"].tz.to_naive_in(\"UTC\")\n", " xs_converted = xs.assign_coords(time=time_utc)\n", "\n", "```" ] }, { "cell_type": "markdown", "id": "9ae9e6bd", "metadata": { "tags": [ "remove-input" ] }, "source": [ "## Select a time window in a different timezone\n", "It may happen that you decided to store your data in a specific timezone, but want to extract data from a time window expressed in another timezone without changing the 
actual time coordinate. For example, to extract three timestamps during the Japanese morning of January 18th from our sample dataset in the European timezone, you can do:" ] }, { "cell_type": "code", "execution_count": 7, "id": "88fd5025", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<xarray.DataArray 'time' (time: 3)> Size: 24B\n", "array(['2024-01-18T00:00:00.000000000', '2024-01-18T01:00:00.000000000',\n", " '2024-01-18T02:00:00.000000000'], dtype='datetime64[ns]')\n", "Coordinates:\n", " * time (time) datetime64[ns] 24B 2024-01-18 ... 2024-01-18T02:00:00\n", "Attributes:\n", " timezone: Europe/Vienna\n" ] } ], "source": [ "mask = xs.time.tz.between(\"2024-01-18 08:00\", \"2024-01-18 10:00\", tz=\"Asia/Tokyo\")\n", "xs_morning_Japan = xs.sel(time=mask)\n", "print(xs_morning_Japan.time)" ] }, { "cell_type": "markdown", "id": "d3223607", "metadata": {}, "source": [ "`tz.between` extracted the three Europe/Vienna timestamps that correspond to the requested morning in Japan: 08:00 to 10:00 in Tokyo (UTC+09:00) is 00:00 to 02:00 in Vienna (UTC+01:00), a shift of 8 hours." ] }, { "cell_type": "markdown", "id": "a3d8ddf8", "metadata": {}, "source": [ "## Obtain a timezone-aware Pandas object\n", "While `xarray.DataArray`s are timezone-naive, Pandas implements data types that are timezone-aware. To express a timezone-naive DataArray as a `pandas.DatetimeIndex` or `pandas.Series`, use the methods `tz.localize` or `tz.to_datetimeindex_in`.\n", "\n", "This can be useful when you store your data naively in one timezone, but want to create timeseries plots in another timezone and explicitly display timezone-aware labels.\n", "\n", "Note that timezones like `\"Europe/Vienna\"` go through daylight saving time changes. Therefore, there will be datetimes that are non-existent or ambiguous (non-unique). Consult the documentation of these methods for more detail on how to handle these edge cases." 
] }, { "cell_type": "code", "execution_count": 8, "id": "7ac4a475", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DatetimeIndex(['2024-01-17 16:00:00+01:00', '2024-01-17 17:00:00+01:00',\n", " '2024-01-17 18:00:00+01:00', '2024-01-17 19:00:00+01:00',\n", " '2024-01-17 20:00:00+01:00', '2024-01-17 21:00:00+01:00',\n", " '2024-01-17 22:00:00+01:00', '2024-01-17 23:00:00+01:00',\n", " '2024-01-18 00:00:00+01:00', '2024-01-18 01:00:00+01:00',\n", " ...\n", " '2024-02-02 03:00:00+01:00', '2024-02-02 04:00:00+01:00',\n", " '2024-02-02 05:00:00+01:00', '2024-02-02 06:00:00+01:00',\n", " '2024-02-02 07:00:00+01:00', '2024-02-02 08:00:00+01:00',\n", " '2024-02-02 09:00:00+01:00', '2024-02-02 10:00:00+01:00',\n", " '2024-02-02 11:00:00+01:00', '2024-02-02 12:00:00+01:00'],\n", " dtype='datetime64[ns, Europe/Vienna]', length=381, freq=None)\n" ] } ], "source": [ "idx_vienna = xs[\"time\"].tz.localize() # DatetimeIndex aware of Europe/Vienna\n", "print(idx_vienna)" ] }, { "cell_type": "code", "execution_count": 9, "id": "7b6bac2f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "time\n", "2024-01-17 16:00:00 2024-01-17 16:00:00+01:00\n", "2024-01-17 17:00:00 2024-01-17 17:00:00+01:00\n", "2024-01-17 18:00:00 2024-01-17 18:00:00+01:00\n", "2024-01-17 19:00:00 2024-01-17 19:00:00+01:00\n", "2024-01-17 20:00:00 2024-01-17 20:00:00+01:00\n", " ... 
\n", "2024-02-02 08:00:00 2024-02-02 08:00:00+01:00\n", "2024-02-02 09:00:00 2024-02-02 09:00:00+01:00\n", "2024-02-02 10:00:00 2024-02-02 10:00:00+01:00\n", "2024-02-02 11:00:00 2024-02-02 11:00:00+01:00\n", "2024-02-02 12:00:00 2024-02-02 12:00:00+01:00\n", "Freq: h, Name: time, Length: 381, dtype: datetime64[ns, Europe/Vienna]\n" ] } ], "source": [ "series_vienna = xs[\"time\"].tz.localize(as_series=True)\n", "print(series_vienna)" ] }, { "cell_type": "markdown", "id": "ec6c53e8", "metadata": {}, "source": [ "The `DatetimeIndex` is just the array of tz-aware datetimes. The `Series` also holds those tz-aware datetimes, but its index consists of the original tz-naive times (the values stored in `xs[\"time\"]`).\n", "\n", "Analogously to `tz.localize`, you can use the method `tz.to_datetimeindex_in` to create a timezone-aware index or series in a different target timezone (i.e., *\"making tz-aware and converting timezone in one call\"*). Note that the Series' index still reflects the original datetimes in `xs[\"time\"]`, while its values are the converted tz-aware datetimes:" ] }, { "cell_type": "code", "execution_count": 10, "id": "996ab043", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "time\n", "2024-01-17 16:00:00 2024-01-17 15:00:00+00:00\n", "2024-01-17 17:00:00 2024-01-17 16:00:00+00:00\n", "2024-01-17 18:00:00 2024-01-17 17:00:00+00:00\n", "2024-01-17 19:00:00 2024-01-17 18:00:00+00:00\n", "2024-01-17 20:00:00 2024-01-17 19:00:00+00:00\n", " ... 
\n", "2024-02-02 08:00:00 2024-02-02 07:00:00+00:00\n", "2024-02-02 09:00:00 2024-02-02 08:00:00+00:00\n", "2024-02-02 10:00:00 2024-02-02 09:00:00+00:00\n", "2024-02-02 11:00:00 2024-02-02 10:00:00+00:00\n", "2024-02-02 12:00:00 2024-02-02 11:00:00+00:00\n", "Freq: h, Name: time, Length: 381, dtype: datetime64[ns, UTC]\n" ] } ], "source": [ "series_utc = xs[\"time\"].tz.to_datetimeindex_in(\"UTC\", as_series=True) # Series aware of UTC\n", "print(series_utc)" ] }, { "cell_type": "markdown", "id": "c3338ae8", "metadata": {}, "source": [ "## Combining datasets with different timezones\n", "The `read` function, as well as `combine` and `concat`, piece individual datasets together. If those datasets have different timezones, there is ample potential for silent bugs that compromise the integrity of your data. Therefore, `xsnow` has several default safety mechanisms in place that raise errors when users attempt to combine datasets with different timezones. Consult the documentation of these functions if you want to explicitly disable those safety mechanisms for a specific operation.\n", "\n", "One example of such an error, and how to resolve it, is given below. Also consult the tutorial on [Combining datasets](./combining_data.ipynb)." 
] }, { "cell_type": "code", "execution_count": 11, "id": "93aad8fd", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[i] xsnow.xsnow_io: Loading 4 datasets eagerly with 13 workers...\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "ValueError: Conflicting timezones: 'UTC+01:00' vs 'UTC'.\n" ] } ], "source": [ "try:\n", " datapath = xsnow.sample_data.snp_snowobs_dir()\n", " xs = xsnow.read(datapath, recursive=True)\n", "except ValueError as exc:\n", " print(f\"ValueError: {exc}\")" ] }, { "cell_type": "code", "execution_count": 12, "id": "d82d7939", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[i] xsnow.xsnow_io: Loading 2 datasets eagerly with 13 workers...\n", "[i] xsnow.utils: Slope coordinate 'inclination' varies by location. Preserving (location, slope) dimensions as allow_per_location=True.\n", "[i] xsnow.utils: Slope coordinate 'azimuth' varies by location. Preserving (location, slope) dimensions as allow_per_location=True.\n", "[i] xsnow.xsnow_io: Appended 58 new timestamps: 2024-01-31T03:00:00--2024-02-02T12:00:00\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", " Locations: 4\n", " Timestamps: 417 (2024-01-16--2024-02-02)\n", " Profiles: 1668 total | 741 valid | 741 with HS>0\n", "\n", " employing the Size: 12MB\n", " Dimensions: (location: 4, slope: 1, realization: 1,\n", " layer: 59, time: 417)\n", " Coordinates:\n", " * location (location) object 32B 'Kasererwinkl-C6' ... 'Lu...\n", " * slope (slope) int64 8B 0\n", " * realization (realization) int64 8B 0\n", " * layer (layer) int64 472B 0 1 2 3 4 5 ... 54 55 56 57 58\n", " * time (time) datetime64[ns] 3kB 2024-01-16T03:00:00 ....\n", " altitude (location) float64 32B 1.681e+03 ... 
1.955e+03\n", " azimuth (location, slope) float64 32B nan 270.0 nan 90.0\n", " inclination (location, slope) float64 32B nan 16.0 nan 33.0\n", " latitude (location) float64 32B 47.1 47.1 47.34 47.34\n", " longitude (location) float64 32B 11.62 11.62 11.24 11.24\n", " z (location, time, slope, realization, layer) float32 394kB ...\n", " Data variables: (12/94)\n", " ColdContentSnow (location, time, slope, realization) float64 13kB ...\n", " DW (location, time, slope, realization) float64 13kB ...\n", " HN12 (location, time, slope, realization) float64 13kB ...\n", " HN24 (location, time, slope, realization) float64 13kB ...\n", " HN3 (location, time, slope, realization) float64 13kB ...\n", " HN6 (location, time, slope, realization) float64 13kB ...\n", " ... ...\n", " zSn (location, time, slope, realization) float64 13kB ...\n", " zSs (location, time, slope, realization) float64 13kB ...\n", " HS (location, time, slope, realization) float32 7kB ...\n", " PSUM (location, time, slope, realization) float64 13kB ...\n", " TAU_CLD (location, time, slope, realization) float64 13kB ...\n", " VW_MAX (location, time, slope, realization) float64 13kB ...\n", " Attributes:\n", " Conventions: CF-1.8\n", " crs: EPSG:4326\n" ] } ], "source": [ "# Read individually\n", "xs_output = xsnow.read(f\"{datapath}/pros\", recursive=True)\n", "xs_input = xsnow.read(f\"{datapath}/smets/snowobs/nowcast\")\n", "xs_input.append_latest(f\"{datapath}/smets/snowobs/forecast\")\n", "\n", "# Convert and reassign timezones\n", "xs_output = xs_output.assign_coords(\n", " time=xs_output.time.tz.to_naive_in(\"UTC\")\n", ")\n", "\n", "# Combine manually:\n", "ds = xarray.merge([xs_output.data, xs_input.data],\n", " compat=\"no_conflicts\", combine_attrs=\"no_conflicts\",\n", " join=\"outer\"\n", ")\n", "xs = xsnow.xsnowDataset(ds)\n", "print(xs)" ] } ], "metadata": { "kernelspec": { "display_name": "xsnow-dev", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": 
"ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.5" } }, "nbformat": 4, "nbformat_minor": 5 }