Handling timezones
Dealing with timezones is not rocket science, but it can still be nerve-racking and time-consuming. xsnow therefore ships a variety of built-in functions that take most of this work off your hands. This tutorial demonstrates a number of common use cases.
The time coordinate and TimezoneAccessor
The time coordinate of an xsnowDataset is of data type datetime64, which is timezone-naive. To keep track of the timezone, xsnow maintains an attribute under the time coordinate (.attrs['timezone']).
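The storage pattern described above can be sketched in plain xarray. This is only an illustration of the idea, not xsnow's own implementation; the variable names and sample timestamps are made up.

```python
import numpy as np
import pandas as pd
import xarray

# Three hourly timestamps without any timezone information (illustrative values)
naive_times = pd.date_range("2024-01-17 16:00", periods=3, freq="h")

# The timezone lives next to the data as plain metadata, not inside the dtype
time = xarray.DataArray(
    naive_times.to_numpy(),
    dims="time",
    name="time",
    attrs={"timezone": "UTC+01:00"},
)

print(naive_times.tz)          # None: the values carry no timezone
print(time.dtype)              # a datetime64 dtype
print(time.attrs["timezone"])  # the timezone is only known through the attribute
```

Because the timezone is ordinary metadata, nothing stops it from drifting out of sync with the values, which is the gap the accessor methods in this tutorial are meant to close.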
Every coordinate or data variable of data type datetime64 can use xsnow’s TimezoneAccessor (tz). Call it like xsnowDataset.time.tz.*. See the cheat sheet and detailed examples below.
| Task | Method |
|---|---|
| Inspect and change timezone metadata | tz.get_attr, tz.set_attr |
| Convert timezone (naive data type) | tz.to_naive_in |
| Select a time window in a different timezone | tz.between |
| Get a timezone-aware Pandas object | tz.localize, tz.to_datetimeindex_in |
import xsnow
import xarray # deliberately no alias to emphasize later on
import pandas as pd
# Demo dataset: 1 location, 1 slope, 1 realization, many timesteps
xs = xsnow.single_profile_timeseries()
print(xs.time)
[i] xsnow.xsnow_io: Loading 2 datasets eagerly with 1 workers...
[i] xsnow.utils: Slope coordinate 'inclination' varies by location. Preserving (location, slope) dimensions as allow_per_location=True.
[i] xsnow.utils: Slope coordinate 'azimuth' varies by location. Preserving (location, slope) dimensions as allow_per_location=True.
<xarray.DataArray 'time' (time: 381)> Size: 3kB
array(['2024-01-17T16:00:00.000000000', '2024-01-17T17:00:00.000000000',
'2024-01-17T18:00:00.000000000', ..., '2024-02-02T10:00:00.000000000',
'2024-02-02T11:00:00.000000000', '2024-02-02T12:00:00.000000000'],
shape=(381,), dtype='datetime64[ns]')
Coordinates:
* time (time) datetime64[ns] 3kB 2024-01-17T16:00:00 ... 2024-02-02T12:...
Attributes:
timezone: UTC+01:00
Inspect and set timezone metadata
The timezone is shown in the printed summary of the time coordinate (see above). To retrieve the timezone string explicitly, do the following:
xs.time.tz.get_attr() # preferred to xs.time.attrs['timezone']
'UTC+01:00'
Usually, the timezone is set automatically while reading your data. If you want to change the timezone metadata without changing the values of the time coordinate, use tz.set_attr.
xs.time.tz.set_attr("Europe/Vienna") # this does not modify the values in xs.time!
print(xs.time)
<xarray.DataArray 'time' (time: 381)> Size: 3kB
array(['2024-01-17T16:00:00.000000000', '2024-01-17T17:00:00.000000000',
'2024-01-17T18:00:00.000000000', ..., '2024-02-02T10:00:00.000000000',
'2024-02-02T11:00:00.000000000', '2024-02-02T12:00:00.000000000'],
shape=(381,), dtype='datetime64[ns]')
Coordinates:
* time (time) datetime64[ns] 3kB 2024-01-17T16:00:00 ... 2024-02-02T12:...
Attributes:
timezone: Europe/Vienna
Please do not alter xs.time.attrs['timezone'] directly: tz.set_attr runs several checks to ensure that your dataset gets a valid timezone.
Convert to another timezone (still naive)
The time coordinate of our sample dataset from Europe is shifted by one hour from UTC. Let’s create other data variables in our dataset that refer to UTC and Pacific Time. Both will also be timezone-naive, but keep their timezone as an attribute. Use the convenience method tz.to_naive_in:
# Create a UTC-naive mirror of the time coord (still an xarray.DataArray):
xs['time_utc'] = xs.time.tz.to_naive_in("UTC")
xs['time_pt'] = xs.time.tz.to_naive_in("America/Vancouver") # try: -8, PT
print(xs.time_utc.dtype) # datetime64[ns]
print(xs.time_utc.tz.get_attr()) # "UTC"
print(xs.time_pt.tz.get_attr()) # "America/Vancouver"
datetime64[ns]
UTC
America/Vancouver
Both new time variables can be stored conveniently in the dataset. By default, they are xarray.DataArrays of the data type datetime64. Their timezone attributes are stored automatically and can be retrieved in the same way as for the time coordinate.
If you also tried the timezones tz.to_naive_in(-8) or tz.to_naive_in('PT'), you will have noticed that the former will be converted to UTC-08:00, while the latter raises an error because 'PT' is not a known timezone.
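The behaviour described here (numeric offsets become fixed-offset labels, unknown names are rejected) can be sketched with the standard library. The helper normalize_timezone below is hypothetical and only mimics what the text describes; it is not xsnow's validation code.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def normalize_timezone(tz):
    """Hypothetical helper: return a timezone label or raise ValueError."""
    if isinstance(tz, (int, float)):
        # Interpret numbers as a UTC offset in hours, e.g. -8 -> 'UTC-08:00'
        total_minutes = round(tz * 60)
        sign = "-" if total_minutes < 0 else "+"
        hours, minutes = divmod(abs(total_minutes), 60)
        return f"UTC{sign}{hours:02d}:{minutes:02d}"
    try:
        ZoneInfo(tz)  # succeeds only for names in the IANA timezone database
    except (ZoneInfoNotFoundError, ValueError) as exc:
        raise ValueError(f"Unknown timezone: {tz!r}") from exc
    return tz


print(normalize_timezone(-8))                   # UTC-08:00
print(normalize_timezone("America/Vancouver"))  # America/Vancouver
try:
    normalize_timezone("PT")
except ValueError as exc:
    print(exc)                                  # Unknown timezone: 'PT'
```

Abbreviations such as 'PT' are not entries in the IANA database, which is why a region name like "America/Vancouver" is the safer choice.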
Note that the values of the new time variables are expressed in their respective timezones, while they are still indexed by the common dataset coordinate time. Hence, they can be displayed side by side:
xs[['time', 'time_utc', 'time_pt']].to_dataframe()
| time | time_utc | time_pt |
|---|---|---|
| 2024-01-17 16:00:00 | 2024-01-17 15:00:00 | 2024-01-17 07:00:00 |
| 2024-01-17 17:00:00 | 2024-01-17 16:00:00 | 2024-01-17 08:00:00 |
| 2024-01-17 18:00:00 | 2024-01-17 17:00:00 | 2024-01-17 09:00:00 |
| 2024-01-17 19:00:00 | 2024-01-17 18:00:00 | 2024-01-17 10:00:00 |
| 2024-01-17 20:00:00 | 2024-01-17 19:00:00 | 2024-01-17 11:00:00 |
| ... | ... | ... |
| 2024-02-02 08:00:00 | 2024-02-02 07:00:00 | 2024-02-01 23:00:00 |
| 2024-02-02 09:00:00 | 2024-02-02 08:00:00 | 2024-02-02 00:00:00 |
| 2024-02-02 10:00:00 | 2024-02-02 09:00:00 | 2024-02-02 01:00:00 |
| 2024-02-02 11:00:00 | 2024-02-02 10:00:00 | 2024-02-02 02:00:00 |
| 2024-02-02 12:00:00 | 2024-02-02 11:00:00 | 2024-02-02 03:00:00 |
381 rows × 2 columns
We can easily test that the new time_utc is a simple shift by one hour:
xarray.testing.assert_equal(xs.time_utc, xs.time + pd.to_timedelta(-1, unit='hours'))
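For intuition, the same naive-to-naive conversion can be written with plain pandas in three steps: attach the source timezone, convert, and drop the timezone again. This is a sketch of the concept under the assumption that the source timezone is Europe/Vienna; it does not claim to be how tz.to_naive_in is implemented.

```python
import pandas as pd

# Naive timestamps that are known (from metadata) to be in Europe/Vienna
naive_vienna = pd.date_range("2024-01-17 16:00", periods=3, freq="h")

naive_utc = (
    naive_vienna
    .tz_localize("Europe/Vienna")  # 1) attach the source timezone
    .tz_convert("UTC")             # 2) convert to the target timezone
    .tz_localize(None)             # 3) drop the timezone, keep UTC wall-clock values
)

print(naive_utc[0])  # 2024-01-17 15:00:00
```

In January, Vienna is on CET (UTC+01:00), so the result is a plain one-hour shift; during summer time the shift would be two hours.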
Reassigning the time coordinate
To reassign your dataset’s time coordinate beyond simply adding another data variable in another timezone, do the following:
time_utc = xs["time"].tz.to_naive_in("UTC")
xs_converted = xs.assign_coords(time=time_utc)
Select a time window in a different timezone
You may store your data in one timezone but want to extract a time window that is expressed in another timezone, without changing the actual time coordinate. For example, to extract three timestamps during the morning of January 18 in Japan from our sample dataset in the European timezone, you can do:
mask = xs.time.tz.between("2024-01-18 08:00", "2024-01-18 10:00", tz="Asia/Tokyo")
xs_morning_Japan = xs.sel(time=mask)
print(xs_morning_Japan.time)
<xarray.DataArray 'time' (time: 3)> Size: 24B
array(['2024-01-18T00:00:00.000000000', '2024-01-18T01:00:00.000000000',
'2024-01-18T02:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
* time (time) datetime64[ns] 24B 2024-01-18 ... 2024-01-18T02:00:00
Attributes:
timezone: Europe/Vienna
tz.between extracted the three timestamps from Europe/Vienna that correspond to the 8-hour-shifted morning in Japan.
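The idea behind such a selection can be sketched in plain pandas: translate the window bounds into the timezone of the data, then build a boolean mask on the naive values. The snippet assumes that both bounds are inclusive, which matches the three timestamps shown above; the variable names are illustrative.

```python
import pandas as pd

# Naive hourly timestamps, known to be in Europe/Vienna
time = pd.date_range("2024-01-17 16:00", periods=24, freq="h")
data_tz, window_tz = "Europe/Vienna", "Asia/Tokyo"

# Express the Tokyo window in naive Vienna time
start = (pd.Timestamp("2024-01-18 08:00", tz=window_tz)
         .tz_convert(data_tz).tz_localize(None))
end = (pd.Timestamp("2024-01-18 10:00", tz=window_tz)
       .tz_convert(data_tz).tz_localize(None))

mask = (time >= start) & (time <= end)  # inclusive on both ends (assumption)
selected = time[mask]
print(list(selected.strftime("%Y-%m-%d %H:%M")))
# ['2024-01-18 00:00', '2024-01-18 01:00', '2024-01-18 02:00']
```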
Obtain a timezone-aware Pandas object
While xarray.DataArrays are timezone-naive, Pandas implements data types that are timezone-aware. To express a timezone-naive DataArray as a pandas.DatetimeIndex or pandas.Series, use the methods tz.localize or tz.to_datetimeindex_in.
This can be useful when you store your data naively in one timezone, but want to create timeseries plots in another timezone and explicitly display timezone-aware labels.
Note that timezones like "Europe/Vienna" go through daylight saving time changes. Therefore, some datetimes are non-existent and others are ambiguous (non-unique). Consult the documentation of these methods for more detail on how to handle these edge cases.
idx_vienna = xs["time"].tz.localize() # DatetimeIndex aware of Europe/Vienna
print(idx_vienna)
DatetimeIndex(['2024-01-17 16:00:00+01:00', '2024-01-17 17:00:00+01:00',
'2024-01-17 18:00:00+01:00', '2024-01-17 19:00:00+01:00',
'2024-01-17 20:00:00+01:00', '2024-01-17 21:00:00+01:00',
'2024-01-17 22:00:00+01:00', '2024-01-17 23:00:00+01:00',
'2024-01-18 00:00:00+01:00', '2024-01-18 01:00:00+01:00',
...
'2024-02-02 03:00:00+01:00', '2024-02-02 04:00:00+01:00',
'2024-02-02 05:00:00+01:00', '2024-02-02 06:00:00+01:00',
'2024-02-02 07:00:00+01:00', '2024-02-02 08:00:00+01:00',
'2024-02-02 09:00:00+01:00', '2024-02-02 10:00:00+01:00',
'2024-02-02 11:00:00+01:00', '2024-02-02 12:00:00+01:00'],
dtype='datetime64[ns, Europe/Vienna]', length=381, freq=None)
series_vienna = xs["time"].tz.localize(as_series=True)
print(series_vienna)
time
2024-01-17 16:00:00 2024-01-17 16:00:00+01:00
2024-01-17 17:00:00 2024-01-17 17:00:00+01:00
2024-01-17 18:00:00 2024-01-17 18:00:00+01:00
2024-01-17 19:00:00 2024-01-17 19:00:00+01:00
2024-01-17 20:00:00 2024-01-17 20:00:00+01:00
...
2024-02-02 08:00:00 2024-02-02 08:00:00+01:00
2024-02-02 09:00:00 2024-02-02 09:00:00+01:00
2024-02-02 10:00:00 2024-02-02 10:00:00+01:00
2024-02-02 11:00:00 2024-02-02 11:00:00+01:00
2024-02-02 12:00:00 2024-02-02 12:00:00+01:00
Freq: h, Name: time, Length: 381, dtype: datetime64[ns, Europe/Vienna]
DatetimeIndex is just the array of tz-aware datetimes. The Series also holds those tz-aware datetimes, but its index is the original tz-naive times (the values stored in xs["time"]).
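The difference between the two return types can be reproduced with plain pandas. The sketch below builds both objects by hand from made-up timestamps; it illustrates the structure only and is not taken from xsnow.

```python
import pandas as pd

naive = pd.date_range("2024-01-17 16:00", periods=3, freq="h", name="time")

# Timezone-aware index: the values themselves carry the timezone
aware_index = naive.tz_localize("Europe/Vienna")

# Series: tz-aware values, indexed by the original tz-naive timestamps
series = pd.Series(aware_index, index=naive, name="time")

print(series.index.tz)  # None
print(series.dt.tz)     # Europe/Vienna

# Converting the values leaves the naive index untouched
in_utc = series.dt.tz_convert("UTC")
print(in_utc.iloc[0])   # 2024-01-17 15:00:00+00:00
```

Keeping the naive index makes it easy to join the aware values back onto the dataset's time coordinate.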
Analogously to tz.localize, you can use the method tz.to_datetimeindex_in to create a timezone-aware index or series in a different target timezone (i.e., “making tz-aware and converting timezone in one call”). Keep in mind that the Series’ index will reflect the original datetimes in xs["time"], while its values are the converted tz-aware datetimes:
series_utc = xs["time"].tz.to_datetimeindex_in("UTC", as_series=True) # Series aware of UTC
print(series_utc)
time
2024-01-17 16:00:00 2024-01-17 15:00:00+00:00
2024-01-17 17:00:00 2024-01-17 16:00:00+00:00
2024-01-17 18:00:00 2024-01-17 17:00:00+00:00
2024-01-17 19:00:00 2024-01-17 18:00:00+00:00
2024-01-17 20:00:00 2024-01-17 19:00:00+00:00
...
2024-02-02 08:00:00 2024-02-02 07:00:00+00:00
2024-02-02 09:00:00 2024-02-02 08:00:00+00:00
2024-02-02 10:00:00 2024-02-02 09:00:00+00:00
2024-02-02 11:00:00 2024-02-02 10:00:00+00:00
2024-02-02 12:00:00 2024-02-02 11:00:00+00:00
Freq: h, Name: time, Length: 381, dtype: datetime64[ns, UTC]
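The daylight-saving edge cases mentioned in this section can be reproduced with plain pandas, whose tz_localize method exposes the nonexistent and ambiguous parameters. Whether and how xsnow's methods forward such options is not shown here, so treat this purely as a pandas illustration; the dates are the 2024 clock changes in Europe/Vienna.

```python
import numpy as np
import pandas as pd

# 2024-03-31 02:30 does not exist in Vienna (clocks jump from 02:00 to 03:00)
spring = pd.DatetimeIndex(["2024-03-31 02:30"])
shifted = spring.tz_localize("Europe/Vienna", nonexistent="shift_forward")
print(shifted[0])  # 2024-03-31 03:00:00+02:00

# 2024-10-27 02:30 occurs twice (clocks go back from 03:00 to 02:00)
fall = pd.DatetimeIndex(["2024-10-27 02:30", "2024-10-27 02:30"])
resolved = fall.tz_localize(
    "Europe/Vienna",
    ambiguous=np.array([True, False]),  # True: summer time, False: standard time
)
print(resolved[1] - resolved[0])  # 0 days 01:00:00
```

With the default settings ('raise' for both parameters), pandas refuses to guess and raises an error for such timestamps.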
Combining datasets with different timezones
The read function, as well as combine and concat, piece individual datasets together. If those datasets have different timezones, there is ample potential for silent bugs that compromise your data integrity. Therefore, xsnow has several default safety mechanisms in place that raise errors when users attempt to combine datasets with different timezones. Consult the documentation of these functions if you want to explicitly disable those safety mechanisms for a specific operation.
One example of such an error and what to do about it is given below. Also consult the tutorial on Combining datasets.
try:
    datapath = xsnow.sample_data.snp_snowobs_dir()
    xs = xsnow.read(datapath, recursive=True)
except ValueError as exc:
    print(f"ValueError: {exc}")
[i] xsnow.xsnow_io: Loading 4 datasets eagerly with 1 workers...
ValueError: Conflicting timezones: 'UTC+01:00' vs 'UTC'.
# Read individually
xs_output = xsnow.read(f"{datapath}/pros", recursive=True)
xs_input = xsnow.read(f"{datapath}/smets/snowobs/nowcast")
xs_input.append_latest(f"{datapath}/smets/snowobs/forecast")
# Convert and reassign timezones
xs_output = xs_output.assign_coords(
    time=xs_output.time.tz.to_naive_in("UTC")
)
# Combine manually:
ds = xarray.merge(
    [xs_output.data, xs_input.data],
    compat="no_conflicts",
    combine_attrs="no_conflicts",
    join="outer",
)
xs = xsnow.xsnowDataset(ds)
print(xs)
[i] xsnow.xsnow_io: Loading 2 datasets eagerly with 1 workers...
[i] xsnow.utils: Slope coordinate 'inclination' varies by location. Preserving (location, slope) dimensions as allow_per_location=True.
[i] xsnow.utils: Slope coordinate 'azimuth' varies by location. Preserving (location, slope) dimensions as allow_per_location=True.
[i] xsnow.xsnow_io: Appended 58 new timestamps: 2024-01-31T03:00:00--2024-02-02T12:00:00
<xsnowDataset>
Locations: 4
Timestamps: 417 (2024-01-16--2024-02-02)
Profiles: 1668 total | 741 valid | 741 with HS>0
employing the <xarray.Dataset> Size: 12MB
Dimensions: (location: 4, slope: 1, realization: 1,
layer: 59, time: 417)
Coordinates:
* location (location) object 32B 'Kasererwinkl-C6' ... 'Lu...
* slope (slope) int64 8B 0
* realization (realization) int64 8B 0
* layer (layer) int64 472B 0 1 2 3 4 5 ... 54 55 56 57 58
* time (time) datetime64[ns] 3kB 2024-01-16T03:00:00 ....
altitude (location) float64 32B 1.681e+03 ... 1.955e+03
azimuth (location, slope) float64 32B nan 270.0 nan 90.0
inclination (location, slope) float64 32B nan 16.0 nan 33.0
latitude (location) float64 32B 47.1 47.1 47.34 47.34
longitude (location) float64 32B 11.62 11.62 11.24 11.24
z (location, time, slope, realization, layer) float32 394kB ...
Data variables: (12/94)
ColdContentSnow (location, time, slope, realization) float64 13kB ...
DW (location, time, slope, realization) float64 13kB ...
HN12 (location, time, slope, realization) float64 13kB ...
HN24 (location, time, slope, realization) float64 13kB ...
HN3 (location, time, slope, realization) float64 13kB ...
HN6 (location, time, slope, realization) float64 13kB ...
... ...
zSn (location, time, slope, realization) float64 13kB ...
zSs (location, time, slope, realization) float64 13kB ...
HS (location, time, slope, realization) float32 7kB ...
PSUM (location, time, slope, realization) float64 13kB ...
TAU_CLD (location, time, slope, realization) float64 13kB ...
VW_MAX (location, time, slope, realization) float64 13kB ...
Attributes:
Conventions: CF-1.8
crs: EPSG:4326
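To see why converting before combining matters, consider a small sketch in plain pandas with made-up values: two series describe the same three instants, one stored naively in Europe/Vienna and one stored naively in UTC. Aligning them without conversion silently pairs values that are one hour apart.

```python
import pandas as pd

# Same three instants, stored naively in two different timezones
local = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2024-01-17 16:00", periods=3, freq="h"),  # Europe/Vienna
)
utc = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.date_range("2024-01-17 15:00", periods=3, freq="h"),  # UTC
)

# Naive alignment: rows are offset by one hour and gaps appear
misaligned = pd.concat({"local": local, "utc": utc}, axis=1)
print(len(misaligned), bool(misaligned.isna().any().any()))  # 4 True

# Convert the Vienna index to naive UTC first, then align
local_utc = local.copy()
local_utc.index = (
    local.index.tz_localize("Europe/Vienna").tz_convert("UTC").tz_localize(None)
)
aligned = pd.concat({"local": local_utc, "utc": utc}, axis=1)
print(len(aligned), bool(aligned.isna().any().any()))  # 3 False
```

No error is raised in the misaligned case, which is exactly the kind of silent bug a timezone check on combine is meant to prevent.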