abkfenris/xarray-fmrc.md

## xarray-fmrc.md

      
    Xarray-FMRC

Xarray-FMRC uses Xarray datatrees to provide a standard in-memory and storage representation of Forecast Model Run Collections (FMRC), which can then be accessed through the various forecast views (best estimate, constant offset, constant forecast, and model run).

def from_model_runs(datasets: dict[str | datetime.datetime | pd.Timestamp, xr.Dataset] | Iterable[xr.Dataset]) -> datatree.DataTree:
    """
    From a collection of xarray datasets, assemble a datatree of forecasts

    If each dataset carries a length-one `forecast_reference_time` dimension, or an attribute of that name, the datasets can be passed in as an `Iterable`. Otherwise, pass a dictionary mapping each `forecast_reference_time` to its dataset. Dimensions other than time are expected to match across datasets.

    Returns a datatree with datasets structured as `model_run/{forecast_reference_time}` and a summary dataset at `model_run/` of all forecast_reference_times
    """
In [10]: import xarray_fmrc

In [11]: dt = xarray_fmrc.from_model_runs([run0, run1])

In [12]: dt
Out[12]: 
DataTree('None', parent=None)
│   Dimensions:  (forecast_reference_time: 2)
│   Coordinates:
│     * forecast_reference_time   (forecast_reference_time) datetime '20230101' '20230102'
└── DataTree('model_runs')
    ├── DataTree('20230101')
    │       Dimensions:  (forecast_reference_time: 1, x: 2, y: 3)
    │       Coordinates:
    │         * forecast_reference_time (forecast_reference_time) datetime '20230101'
    │         * x        (x) int64 10 20
    │       Dimensions without coordinates: y
    │       Data variables:
    │           foo      (x, y) float64 0.4691 -0.2829 -1.509 -1.136 1.212 -0.1732
    │           bar      (x) int64 1 2
    │           baz      float64 3.142
    ├── DataTree('20230102')
    │       Dimensions:  (forecast_reference_time: 1, x: 2, y: 3)
    │       Coordinates:
    │         * forecast_reference_time (forecast_reference_time) datetime '20230102'
    │         * x        (x) int64 10 20
    │       Dimensions without coordinates: y
    │       Data variables:
    │           foo      (x, y) float64 0.4691 -0.2829 -1.509 -1.136 1.212 -0.1732
    │           bar      (x) int64 1 2
    │           baz      float64 3.142

Forecast views

The various views are explained in more detail below, but each has a method on the .fmrc accessor that returns a dataset.

dt.fmrc.model_run(dt: str | datetime.datetime | pd.Timestamp) -> xr.Dataset
dt.fmrc.constant_forecast(dt: str | datetime.datetime | pd.Timestamp) -> xr.Dataset
dt.fmrc.constant_offset(offset: str | datetime.timedelta | pd.Timedelta) -> xr.Dataset
dt.fmrc.best() -> xr.Dataset

In [13]: dt.fmrc.best()
Out[13]: 
Dataset...

Kerchunk

Kerchunk can break chunks down into smaller chunks. Xarray-FMRC could provide utilities to take a collection of kerchunk files, break them apart, and rebuild them in the various FMRC views.

Xpublish-FMRC

Xpublish-FMRC provides new endpoints for xpublish servers to serve forecast model run collections.
It uses the plugin interface to create a new top-level path, and then relies on the other dataset plugins to serve the various forecast views. For each dataset plugin registered beneath that path, it overrides the get_dataset function.

forecasts/gfs/best/edr/position
forecasts/gfs/model_run/20230101/edr/position
forecasts/gfs/constant_forecast/20230101/edr/position
forecasts/gfs/constant_offset/6h/edr/position
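The path layout above can be illustrated with a small, standard-library-only sketch. It is not part of xpublish or Xpublish-FMRC: `ForecastRoute`, `parse_forecast_path`, and `parse_offset` are hypothetical names, and the hours-only offset format (`6h`) is an assumption based on the single example path.

```python
import re
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

# Views named in this document, and whether each takes a path argument.
VIEW_TAKES_ARG = {
    "best": False,
    "model_run": True,
    "constant_forecast": True,
    "constant_offset": True,
}


@dataclass
class ForecastRoute:
    collection: str      # e.g. "gfs"
    view: str            # one of the keys of VIEW_TAKES_ARG
    arg: Optional[str]   # e.g. "20230101" or "6h"; None for "best"
    rest: str            # remainder handed to the downstream dataset plugin


def parse_forecast_path(path: str) -> ForecastRoute:
    """Split 'forecasts/{collection}/{view}[/{arg}]/...' into its parts."""
    parts = path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "forecasts":
        raise ValueError(f"not a forecast path: {path!r}")
    collection, view, tail = parts[1], parts[2], parts[3:]
    if view not in VIEW_TAKES_ARG:
        raise ValueError(f"unknown forecast view: {view!r}")
    arg = None
    if VIEW_TAKES_ARG[view]:
        if not tail:
            raise ValueError(f"view {view!r} needs an argument")
        arg, tail = tail[0], tail[1:]
    return ForecastRoute(collection, view, arg, "/".join(tail))


def parse_offset(text: str) -> timedelta:
    """Parse an hours-only offset such as '6h' (assumed format)."""
    match = re.fullmatch(r"(\d+)h", text)
    if match is None:
        raise ValueError(f"unsupported offset: {text!r}")
    return timedelta(hours=int(match.group(1)))


route = parse_forecast_path("forecasts/gfs/constant_offset/6h/edr/position")
print(route.view, parse_offset(route.arg), route.rest)
```

Keeping the remainder of the path (`edr/position` here) separate is what would let an existing dataset plugin handle the request unchanged once the view has been selected.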

FMRC Dataset View definitions

There may be a better name for these, but my brain is currently comparing them to database views.
Definitions pulled from http://www.unidata.ucar.edu/staff/caron/presentations/FmrcPoster.pdf

Model Run Datasets

The RUC model is run hourly, and 12 runs are shown in this collection; note that different runs contain different forecast hours. The complete results for a single run are a model run dataset.
The selected example here is the run made on 2006-12-11 06:00 Z, having forecasts at 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and 12 hours.

Constant forecast/valid time dataset

A constant forecast dataset is created from all data that have the same forecast/valid time. Using the 0 hour analysis as the best state estimate, one can use this dataset to evaluate how accurate forecasts are.
The selected example here is for the forecast time 2006-12-11 12:00 Z, using forecasts from the runs made at 0, 3, 6, 9, 10, 11, and 12 Z. There are a total of 24 such datasets in this collection.
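The selection rule for this view can be sketched in plain Python, without xarray. The inventory below is hypothetical: it assumes hourly runs from 00Z to 12Z in which the 00, 03, 06, 09, and 12Z runs reach 0 to 9 and 12 hours while all other runs stop at 3 hours. That assumption was chosen only because it reproduces the run list quoted above; it is not the poster's actual collection.

```python
from datetime import datetime, timedelta

DAY = datetime(2006, 12, 11)


def toy_inventory():
    """Hypothetical inventory: {run_time: [forecast offsets]}."""
    long_run = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12]   # assumed for 00/03/06/09/12Z
    short_run = [0, 1, 2, 3]                         # assumed for all other runs
    return {
        DAY + timedelta(hours=h): [
            timedelta(hours=o) for o in (long_run if h % 3 == 0 else short_run)
        ]
        for h in range(13)
    }


def constant_forecast(inventory, valid_time):
    """Return (run_time, offset) pairs where run_time + offset == valid_time."""
    return [
        (run, valid_time - run)
        for run in sorted(inventory)
        if (valid_time - run) in inventory[run]
    ]


pairs = constant_forecast(toy_inventory(), datetime(2006, 12, 11, 12))
print([run.hour for run, _ in pairs])  # [0, 3, 6, 9, 10, 11, 12]
```

Each pair identifies one time step to pull from one model run; stacking those steps along `forecast_reference_time` would give the constant forecast dataset.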

Constant forecast offset datasets

A constant offset dataset is created from all the data that have the same offset time. This collection has 11 such datasets: the 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and 12 hour offsets.
The selected example here is for the 6 hour offset, using forecasts from the runs made at 0, 3, 6, 9, and 12 Z.
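A minimal sketch of the same selection in plain Python follows. The inventory is a hypothetical stand-in (hourly runs from 00Z to 12Z, with only the 00, 03, 06, 09, and 12Z runs assumed to reach beyond 3 hours), and `constant_offset` here is an illustrative helper, not the accessor method proposed earlier.

```python
from datetime import datetime, timedelta

DAY = datetime(2006, 12, 11)


def toy_inventory():
    """Hypothetical inventory: {run_time: [forecast offsets]}."""
    long_run = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12]   # assumed for 00/03/06/09/12Z
    short_run = [0, 1, 2, 3]                         # assumed for all other runs
    return {
        DAY + timedelta(hours=h): [
            timedelta(hours=o) for o in (long_run if h % 3 == 0 else short_run)
        ]
        for h in range(13)
    }


def constant_offset(inventory, offset):
    """Return (run_time, valid_time) pairs for every run that has this offset."""
    return [
        (run, run + offset)
        for run in sorted(inventory)
        if offset in inventory[run]
    ]


inventory = toy_inventory()
six_hour = constant_offset(inventory, timedelta(hours=6))
print([run.hour for run, _ in six_hour])  # [0, 3, 6, 9, 12]

# Every distinct offset in the inventory yields one constant offset dataset.
all_offsets = sorted({o for offsets in inventory.values() for o in offsets})
print(len(all_offsets))  # 11
```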

Best estimate dataset

For each forecast time in the collection, the best estimate for that hour is used to create the best estimate dataset, which covers the entire time range of the collection.
For this example, the best estimate is the 0 hour analysis from each run, plus all the forecasts from the latest run.
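One way to read "best estimate" is "for each valid time, take the forecast with the smallest offset", which is the same as taking it from the most recent run that covers that time. The sketch below applies that reading to a hypothetical inventory (hourly runs from 00Z to 12Z, with assumed forecast lengths); it is an interpretation of the definition above, not the library's implementation.

```python
from datetime import datetime, timedelta

DAY = datetime(2006, 12, 11)


def toy_inventory():
    """Hypothetical inventory: {run_time: [forecast offsets]}."""
    long_run = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12]   # assumed for 00/03/06/09/12Z
    short_run = [0, 1, 2, 3]                         # assumed for all other runs
    return {
        DAY + timedelta(hours=h): [
            timedelta(hours=o) for o in (long_run if h % 3 == 0 else short_run)
        ]
        for h in range(13)
    }


def best_estimate(inventory):
    """Map each valid time to the (run_time, offset) pair from the latest run."""
    best = {}
    for run in sorted(inventory):        # later runs overwrite earlier ones
        for offset in inventory[run]:
            best[run + offset] = (run, offset)
    return dict(sorted(best.items()))


best = best_estimate(toy_inventory())
latest_run = max(toy_inventory())

# Valid times up to the latest run come from each run's 0 hour analysis;
# everything after that comes from the latest run's forecasts.
for valid_time, (run, offset) in best.items():
    hours = int(offset.total_seconds() // 3600)
    print(valid_time.isoformat(), "from run", run.isoformat(), f"+{hours}h")
```

With this toy inventory the result matches the description above: the 0 hour analysis from every run, followed by the forecasts of the 12Z run.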