Xarray-FMRC uses Xarray datatrees to provide a standard in-memory and storage representation of Forecast Model Run Collections that can then be accessed via the various forecast views (best estimate/constant offset/constant forecast/model run).
def from_model_runs(datasets: dict[str | datetime.datetime | pd.Timestamp, xr.Dataset] | Iterable[xr.Dataset]) -> datatree.DataTree:
"""
From a collection of xarray datasets, assemble a datatree of forecasts.
If each dataset contains a length-one `forecast_reference_time` dimension, or an attribute of that name, the datasets can be passed in as an `Iterable`. Otherwise, pass in a dictionary mapping `forecast_reference_time` to datasets. Dimensions other than time are expected to match.
Returns a datatree with datasets structured as `model_run/{forecast_reference_time}` and a summary dataset at `model_run/` of all forecast_reference_times.
"""
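To make the two accepted input shapes concrete, here is a minimal sketch of how toy model runs might be prepared. The `make_run` helper, the variable names, and the random data are illustrative assumptions, not part of the library; only the `forecast_reference_time` name comes from the docstring above. The call to `from_model_runs` is left as a comment so the snippet needs nothing beyond xarray, numpy and pandas.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_run() -> xr.Dataset:
    """Build a toy model run with random data (hypothetical helper)."""
    return xr.Dataset(
        {"foo": (("x", "y"), np.random.randn(2, 3))},
        coords={"x": [10, 20]},
    )


# Shape 1: every dataset carries a length-one forecast_reference_time dimension,
# so the runs can be handed over as a plain iterable.
run0 = make_run().expand_dims(forecast_reference_time=[pd.Timestamp("2023-01-01")])
run1 = make_run().expand_dims(forecast_reference_time=[pd.Timestamp("2023-01-02")])
as_iterable = [run0, run1]

# Shape 2: the datasets do not know their reference time, so it is supplied
# as the dictionary key instead.
as_mapping = {
    pd.Timestamp("2023-01-01"): make_run(),
    pd.Timestamp("2023-01-02"): make_run(),
}

# Either collection would then be passed to the function described above:
# dt = xarray_fmrc.from_model_runs(as_iterable)
```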
In [10]: import xarray_fmrc
In [11]: dt = xarray_fmrc.from_model_runs([run0, run1])
In [12]: dt
Out[12]:
DataTree('None', parent=None)
│ Dimensions: (forecast_reference_time: 2)
│ Coordinates:
│ * forecast_reference_time (forecast_reference_time) datetime '20230101' '20230102'
└── DataTree('model_runs')
├── DataTree('20230101')
│ Dimensions: (forecast_reference_time: 1, x: 2, y: 3)
│ Coordinates:
│ * forecast_reference_time (forecast_reference_time) datetime '20230101'
│ * x (x) int64 10 20
│ Dimensions without coordinates: y
│ Data variables:
│ foo (x, y) float64 0.4691 -0.2829 -1.509 -1.136 1.212 -0.1732
│ bar (x) int64 1 2
│ baz float64 3.142
├── DataTree('20230102')
│ Dimensions: (forecast_reference_time: 1, x: 2, y: 3)
│ Coordinates:
│ * forecast_reference_time (forecast_reference_time) datetime '20230102'
│ * x (x) int64 10 20
│ Dimensions without coordinates: y
│ Data variables:
│ foo (x, y) float64 0.4691 -0.2829 -1.509 -1.136 1.212 -0.1732
│ bar (x) int64 1 2
│ baz float64 3.142
The various views are explained in more detail below, but each has a method on the `.fmrc` accessor that returns a dataset.
dt.fmrc.model_run(dt: str | datetime.datetime | pd.Timestamp) -> xr.Dataset
dt.fmrc.constant_forecast(dt: str | datetime.datetime | pd.Timestamp) -> xr.Dataset
dt.fmrc.constant_offset(offset: str | datetime.timedelta | pd.Timedelta) -> xr.Dataset
dt.fmrc.best() -> xr.Dataset
In [13]: dt.fmrc.best()
Out[13]:
Dataset...
Kerchunk has the ability to break down chunks into smaller chunks. Xarray-FMRC could provide utilities to take a collection of kerchunk files, break them apart, and rebuild them in the various FMRC views.
Xpublish-FMRC provides new endpoints for xpublish servers to serve forecast model run collections.
This uses the plugin interface to create a new top-level path, and then uses other dataset plugins to serve the various forecast views. For each dataset plugin registered below it, it overrides the get_dataset function.
forecasts/gfs/best/edr/position
forecasts/gfs/model_run/20230101/edr/position
forecasts/gfs/constant_forecast/20230101/edr/position
forecasts/gfs/constant_offset/6h/edr/position
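The layout of these paths can be read as `forecasts/{collection}/{view}[/{argument}]/{rest}`, where `{rest}` is handed on to the downstream dataset plugin. The sketch below only illustrates that reading; `parse_forecast_path` and `ForecastRoute` are hypothetical names, and this is not how xpublish itself performs routing.

```python
from typing import NamedTuple, Optional


class ForecastRoute(NamedTuple):
    collection: str          # e.g. "gfs"
    view: str                # "best", "model_run", "constant_forecast", "constant_offset"
    argument: Optional[str]  # e.g. "20230101" or "6h"; None for "best"
    remainder: str           # path left over for the downstream plugin


# Views that take one path segment as their argument.
VIEWS_WITH_ARGUMENT = {"model_run", "constant_forecast", "constant_offset"}


def parse_forecast_path(path: str) -> ForecastRoute:
    """Split a forecast path into collection, view, argument and remainder."""
    parts = path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "forecasts":
        raise ValueError(f"not a forecast path: {path!r}")
    collection, view, rest = parts[1], parts[2], parts[3:]
    if view == "best":
        argument = None
    elif view in VIEWS_WITH_ARGUMENT:
        if not rest:
            raise ValueError(f"view {view!r} needs an argument: {path!r}")
        argument, rest = rest[0], rest[1:]
    else:
        raise ValueError(f"unknown view: {view!r}")
    return ForecastRoute(collection, view, argument, "/".join(rest))


route = parse_forecast_path("forecasts/gfs/constant_offset/6h/edr/position")
```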
There may be a better name for these, but my brain is currently comparing them to database views.
Definitions pulled from http://www.unidata.ucar.edu/staff/caron/presentations/FmrcPoster.pdf
The RUC model is run hourly, and 12 runs are shown in this collection; note that different runs contain different forecast hours. The complete results of a single run form a model run dataset. The selected example here is the run made on 2006-12-11 06:00 Z, having forecasts at 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and 12 hours.
A constant forecast dataset is created from all data that have the same forecast/valid time. Using the 0 hour analysis as the best state estimate, one can use this dataset to evaluate how accurate forecasts are.
The selected example here is for the forecast time 2006-12-11 12:00 Z, using forecasts from the runs made at 0, 3, 6, 9, 10, 11, and 12 Z. There are a total of 24 such datasets in this collection.
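As a rough illustration of the selection rule, the sketch below picks out every (run, offset) pair whose valid time equals a chosen forecast time. It uses plain Python on a made-up three-run collection rather than the RUC data or the library's accessor; `constant_forecast` here is a hypothetical stand-alone function.

```python
from datetime import datetime, timedelta

# Toy collection (made up): run time -> forecast offsets available in that run.
collection = {
    datetime(2023, 1, 1, 0): [timedelta(hours=h) for h in (0, 6, 12)],
    datetime(2023, 1, 1, 6): [timedelta(hours=h) for h in (0, 6, 12)],
    datetime(2023, 1, 1, 12): [timedelta(hours=h) for h in (0, 6, 12)],
}


def constant_forecast(collection, valid_time):
    """Return (run_time, offset) pairs whose run_time + offset equals valid_time."""
    return [
        (run, offset)
        for run, offsets in sorted(collection.items())
        for offset in offsets
        if run + offset == valid_time
    ]


# All forecasts valid at 12:00 on 2023-01-01, ordered from oldest to newest run.
pairs = constant_forecast(collection, datetime(2023, 1, 1, 12))
```

Comparing the older entries in `pairs` against the final 0 hour entry is the kind of forecast-accuracy check described above.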
A constant offset dataset is created from all the data that have the same offset time. This collection has 11 such datasets: the 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and 12 hour offsets.
The selected example here is for the 6 hour offset using forecasts from the runs made at 0, 3, 6, 9, and 12 Z.
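A constant offset view can be sketched the same way: keep one forecast per run, namely the one at the requested offset, and skip runs that do not provide it. The collection below is made up (its 03 run deliberately lacks a 6 hour forecast), and `constant_offset` is a hypothetical stand-alone function rather than the accessor method.

```python
from datetime import datetime, timedelta

# Toy collection (made up): run time -> forecast offsets available in that run.
# Runs do not all carry the same offsets.
collection = {
    datetime(2023, 1, 1, 0): [timedelta(hours=h) for h in (0, 6, 12)],
    datetime(2023, 1, 1, 3): [timedelta(hours=h) for h in (0, 3)],
    datetime(2023, 1, 1, 6): [timedelta(hours=h) for h in (0, 6, 12)],
}


def constant_offset(collection, offset):
    """Return (run_time, valid_time) pairs for runs that contain the given offset."""
    return [
        (run, run + offset)
        for run, offsets in sorted(collection.items())
        if offset in offsets
    ]


# The 6 hour forecast from every run that has one.
six_hour = constant_offset(collection, timedelta(hours=6))
```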
For each forecast time in the collection, the best estimate for that hour is used to create the best estimate dataset, which covers the entire time range of the collection.
For this example, the best estimate is the 0 hour analysis from each run, plus all the forecasts from the latest run.
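One way to express this rule, assuming that "best" means "from the most recent run that covers the time", is to walk the runs from oldest to newest and let later runs overwrite earlier ones. The sketch below does that on a made-up collection; `best_estimate` is a hypothetical stand-alone function, not the accessor method.

```python
from datetime import datetime, timedelta

# Toy collection (made up): run time -> forecast offsets available in that run.
collection = {
    datetime(2023, 1, 1, 0): [timedelta(hours=h) for h in (0, 6, 12)],
    datetime(2023, 1, 1, 6): [timedelta(hours=h) for h in (0, 6, 12)],
    datetime(2023, 1, 1, 12): [timedelta(hours=h) for h in (0, 6, 12)],
}


def best_estimate(collection):
    """Map each valid time to the latest run that provides a forecast for it."""
    best = {}
    # Visit runs oldest first so that newer runs overwrite older ones.
    for run, offsets in sorted(collection.items()):
        for offset in offsets:
            best[run + offset] = run
    return dict(sorted(best.items()))


best = best_estimate(collection)
for valid_time, run in best.items():
    print(valid_time, "<-", run, f"(+{valid_time - run})")
```

On this toy collection the result is the 0 hour analysis of each run followed by the remaining forecasts of the latest run, which matches the description above.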