Skip to content

Instantly share code, notes, and snippets.

@meggart
Last active October 7, 2019 10:09
Show Gist options
  • Save meggart/e29e6381d9400ff789eefbccc109d6f9 to your computer and use it in GitHub Desktop.
Save meggart/e29e6381d9400ff789eefbccc109d6f9 to your computer and use it in GitHub Desktop.
DimensionalData interface proposal.md
using DimensionalData
using GeoData
# Functions to be extended
"""
    gridcoordinates(::Gridtype,x)

Returns an iterator over the coordinates of a given grid with the same shape as `x`
"""
gridcoordinates(x) = gridcoordinates(hasgrid(x),x)

"""
    gridcoordinate(::Gridtype,x,i...)

Returns the coordinate associated with the value of `x[i...]`
"""
gridcoordinate(x,i...) = gridcoordinate(hasgrid(x),x,i...)

"""
    gridbounds(::Gridtype,x)

Returns the boundaries of every grid cell of `x`. The returned value will have the size `size(x) .+ 1`  
"""
gridbounds(x) = gridbounds(hasgrid(x),x)

"""
    gridbound(::Gridtype,x,i...)

For the value at index `i` returns the upper and lower bound of the grid cell for each dimension. 
"""
gridbound(x,i...) = gridbound(hasgrid(x),x,i...)

"""
    dimnames(::Gridtype, x)

Returns the names of the dimensions of x. 
"""
dimnames(x) = dimnames(hasgrid(x),x)

"""
    dimname(::Gridtype, x, i)

Returns the name of the ith dimension of x. 
"""
dimname(x,i) = dimname(hasgrid(x),x,i)
dimname
abstract type RegularGrid{T} end

"""
    RegularProductGrid

Trait describing a regular grid along all axes. 
To implement the trait, a dims() function is required, returning a list of 
dimension objects for each axis, which provide a `name` and a `vals` method.
"""
struct RegularProductGrid{T} <: RegularGrid{T} end

"""
    Center

To be used as a type parameter to described grids that are defined by their center coordinates.  
"""
struct Center end

# Implement the trait functions for regular grids where center coordinates are provided
gridcoordinates(::RegularGrid{Center},x) = Iterators.product(map(val,dims(x))...)
gridcoordinate(::RegularGrid{Center},x,i...) = map((d,j)->val(d)[j],dims(x),i)

gridbounds(::RegularGrid{Center},x) = Iterators.product(map(i->range(first(i)-step(i)/2,last(i)+step(i)/2,length=length(i)+1),val.(dims(x)))...)
gridbound(::RegularGrid{Center},x,i...) = map((d,j)->begin v = val(d);(v[j]-step(v)/2,v[j]+step(v)/2) end,dims(x),i);

dimnames(::RegularGrid{Center},x) = map(DimensionalData.name,dims(x))
dimname(::RegularGrid{Center},x,i) = DimensionalData.name(dims(x,i))
dimname (generic function with 2 methods)
# Implemetation for DimensionalData is very simple
hasgrid(::DimensionalArray) = RegularProductGrid{Center}()
hasgrid (generic function with 1 method)
#Fallback for regular arrays
struct UnknownGrid <: RegularGrid{Center} end
#One might think about using Base.axes here to support things like OffsetArrays etc...
hasgrid(::AbstractArray) = UnknownGrid()

struct UnknownDimension{I}
    length::Int
end
DimensionalData.val(x::UnknownDimension) = range(0.5,x.length-0.5,length=x.length)
DimensionalData.name(x::UnknownDimension{I}) where I = string("x",I)
DimensionalData.dims(x::AbstractArray) = ntuple(i->UnknownDimension{i}(size(x,i)),ndims(x))
# Test this with a DimensionalData array
arrayreg = DimensionalArray(rand(20,10),(Lon(20:29),Lat(49:-1:40)));
gridcoordinates(arrayreg) |> collect
20×10 Array{Tuple{Float64,Int64},2}:
 (20.0, 49)     (20.0, 48)     (20.0, 47)     …  (20.0, 41)     (20.0, 40)   
 (20.4737, 49)  (20.4737, 48)  (20.4737, 47)     (20.4737, 41)  (20.4737, 40)
 (20.9474, 49)  (20.9474, 48)  (20.9474, 47)     (20.9474, 41)  (20.9474, 40)
 (21.4211, 49)  (21.4211, 48)  (21.4211, 47)     (21.4211, 41)  (21.4211, 40)
 (21.8947, 49)  (21.8947, 48)  (21.8947, 47)     (21.8947, 41)  (21.8947, 40)
 (22.3684, 49)  (22.3684, 48)  (22.3684, 47)  …  (22.3684, 41)  (22.3684, 40)
 (22.8421, 49)  (22.8421, 48)  (22.8421, 47)     (22.8421, 41)  (22.8421, 40)
 (23.3158, 49)  (23.3158, 48)  (23.3158, 47)     (23.3158, 41)  (23.3158, 40)
 (23.7895, 49)  (23.7895, 48)  (23.7895, 47)     (23.7895, 41)  (23.7895, 40)
 (24.2632, 49)  (24.2632, 48)  (24.2632, 47)     (24.2632, 41)  (24.2632, 40)
 (24.7368, 49)  (24.7368, 48)  (24.7368, 47)  …  (24.7368, 41)  (24.7368, 40)
 (25.2105, 49)  (25.2105, 48)  (25.2105, 47)     (25.2105, 41)  (25.2105, 40)
 (25.6842, 49)  (25.6842, 48)  (25.6842, 47)     (25.6842, 41)  (25.6842, 40)
 (26.1579, 49)  (26.1579, 48)  (26.1579, 47)     (26.1579, 41)  (26.1579, 40)
 (26.6316, 49)  (26.6316, 48)  (26.6316, 47)     (26.6316, 41)  (26.6316, 40)
 (27.1053, 49)  (27.1053, 48)  (27.1053, 47)  …  (27.1053, 41)  (27.1053, 40)
 (27.5789, 49)  (27.5789, 48)  (27.5789, 47)     (27.5789, 41)  (27.5789, 40)
 (28.0526, 49)  (28.0526, 48)  (28.0526, 47)     (28.0526, 41)  (28.0526, 40)
 (28.5263, 49)  (28.5263, 48)  (28.5263, 47)     (28.5263, 41)  (28.5263, 40)
 (29.0, 49)     (29.0, 48)     (29.0, 47)        (29.0, 41)     (29.0, 40)   
dimname(arrayreg,1)
"Longitude"
gridcoordinate(arrayreg,1,1)
(20.0, 49)
# We can access the boundaries of each grid cell as well
gridbounds(arrayreg) |> collect
21×11 Array{Tuple{Float64,Float64},2}:
 (19.7632, 49.5)  (19.7632, 48.5)  …  (19.7632, 40.5)  (19.7632, 39.5)
 (20.2368, 49.5)  (20.2368, 48.5)     (20.2368, 40.5)  (20.2368, 39.5)
 (20.7105, 49.5)  (20.7105, 48.5)     (20.7105, 40.5)  (20.7105, 39.5)
 (21.1842, 49.5)  (21.1842, 48.5)     (21.1842, 40.5)  (21.1842, 39.5)
 (21.6579, 49.5)  (21.6579, 48.5)     (21.6579, 40.5)  (21.6579, 39.5)
 (22.1316, 49.5)  (22.1316, 48.5)  …  (22.1316, 40.5)  (22.1316, 39.5)
 (22.6053, 49.5)  (22.6053, 48.5)     (22.6053, 40.5)  (22.6053, 39.5)
 (23.0789, 49.5)  (23.0789, 48.5)     (23.0789, 40.5)  (23.0789, 39.5)
 (23.5526, 49.5)  (23.5526, 48.5)     (23.5526, 40.5)  (23.5526, 39.5)
 (24.0263, 49.5)  (24.0263, 48.5)     (24.0263, 40.5)  (24.0263, 39.5)
 (24.5, 49.5)     (24.5, 48.5)     …  (24.5, 40.5)     (24.5, 39.5)   
 (24.9737, 49.5)  (24.9737, 48.5)     (24.9737, 40.5)  (24.9737, 39.5)
 (25.4474, 49.5)  (25.4474, 48.5)     (25.4474, 40.5)  (25.4474, 39.5)
 (25.9211, 49.5)  (25.9211, 48.5)     (25.9211, 40.5)  (25.9211, 39.5)
 (26.3947, 49.5)  (26.3947, 48.5)     (26.3947, 40.5)  (26.3947, 39.5)
 (26.8684, 49.5)  (26.8684, 48.5)  …  (26.8684, 40.5)  (26.8684, 39.5)
 (27.3421, 49.5)  (27.3421, 48.5)     (27.3421, 40.5)  (27.3421, 39.5)
 (27.8158, 49.5)  (27.8158, 48.5)     (27.8158, 40.5)  (27.8158, 39.5)
 (28.2895, 49.5)  (28.2895, 48.5)     (28.2895, 40.5)  (28.2895, 39.5)
 (28.7632, 49.5)  (28.7632, 48.5)     (28.7632, 40.5)  (28.7632, 39.5)
 (29.2368, 49.5)  (29.2368, 48.5)  …  (29.2368, 40.5)  (29.2368, 39.5)
gridbound(arrayreg,1,1)
((19.763157894736842, 20.236842105263158), (49.5, 48.5))
# Adn test the fallback for a regular array
gridcoordinates(rand(10,10)) |> collect
10×10 Array{Tuple{Float64,Float64},2}:
 (0.5, 0.5)  (0.5, 1.5)  (0.5, 2.5)  …  (0.5, 7.5)  (0.5, 8.5)  (0.5, 9.5)
 (1.5, 0.5)  (1.5, 1.5)  (1.5, 2.5)     (1.5, 7.5)  (1.5, 8.5)  (1.5, 9.5)
 (2.5, 0.5)  (2.5, 1.5)  (2.5, 2.5)     (2.5, 7.5)  (2.5, 8.5)  (2.5, 9.5)
 (3.5, 0.5)  (3.5, 1.5)  (3.5, 2.5)     (3.5, 7.5)  (3.5, 8.5)  (3.5, 9.5)
 (4.5, 0.5)  (4.5, 1.5)  (4.5, 2.5)     (4.5, 7.5)  (4.5, 8.5)  (4.5, 9.5)
 (5.5, 0.5)  (5.5, 1.5)  (5.5, 2.5)  …  (5.5, 7.5)  (5.5, 8.5)  (5.5, 9.5)
 (6.5, 0.5)  (6.5, 1.5)  (6.5, 2.5)     (6.5, 7.5)  (6.5, 8.5)  (6.5, 9.5)
 (7.5, 0.5)  (7.5, 1.5)  (7.5, 2.5)     (7.5, 7.5)  (7.5, 8.5)  (7.5, 9.5)
 (8.5, 0.5)  (8.5, 1.5)  (8.5, 2.5)     (8.5, 7.5)  (8.5, 8.5)  (8.5, 9.5)
 (9.5, 0.5)  (9.5, 1.5)  (9.5, 2.5)     (9.5, 7.5)  (9.5, 8.5)  (9.5, 9.5)
# Now an example for moving grids or grids that are not a product of some axes

"""
    IrregularGrid


Traits describing a grid where the coordinates of one or more axes change along another axis. 
Subtypes of this should directly implement the functions `gridcoordinates` as well as `gridbounds`
"""
abstract type IrregularGrid{T} end
IrregularGrid
#Example implementation of the trait for NCDatasets
using NCDatasets

# This should actually look into the dataset and find out what is defined there and in particular it might
# respect lon_bnds or time_bnds to rather define a grid through its bounds if necessary
struct GenericNCDatasetGrid <: IrregularGrid{Center} end
hasgrid(::NCDatasets.CFVariable) = GenericNCDatasetGrid()
gridcoordinates(::GenericNCDatasetGrid,x) = zip(NCDatasets.coord(v,"longitude"),NCDatasets.coord(v,"latitude"))
WARNING: using NCDatasets.dimnames in module Main conflicts with an existing identifier.





gridcoordinates (generic function with 3 methods)
ds = Dataset("/home/fgans/Downloads/ocean_his.nc")
v = ds["sustr"]
gridcoordinates(v) |> collect
5×6 Array{Tuple{Union{Missing, Dates.DateTime, AbstractCFDateTime, Number},Union{Missing, Dates.DateTime, AbstractCFDateTime, Number}},2}:
 (-73.1577, 38.5028)  (-73.2269, 38.5619)  …  (-73.5038, 38.7977)
 (-73.0789, 38.5592)  (-73.1482, 38.6183)     (-73.4251, 38.8539)
 (-73.0002, 38.6157)  (-73.0694, 38.6747)     (-73.3463, 38.9101)
 (-72.9214, 38.6721)  (-72.9907, 38.731)      (-73.2676, 38.9663)
 (-72.8427, 38.7284)  (-72.9119, 38.7873)     (-73.1888, 39.0224)
close(ds)
@yeesian
Copy link

yeesian commented Oct 6, 2019

This is great!

I'm not sure if it'll help, but I'll just provide some probing questions:

  1. What does the interface provide for people who implement it? (Commonly/possibly interoperability with other packages that does IO/plotting/etc)
  2. Is the interface necessary/sufficient for making the functionality in (1) possible?
  3. What are the packages that will be able to implement the interface?

@rafaqz
Copy link

rafaqz commented Oct 6, 2019

Some answers (@meggart probably has better specifics for this gist)

GeoData.jl provides plotting and some basic interoperable abstractions already (for ArchGDAL too, although a little broken still). It could provide IO and conversion between data types eventually if we can work all of this out.

This gist adds the ability to describe the grid type and dispatch on it (Regular/Irregular and more specific types, with details about how cells are measured). Which is one of the main things GeoData.jl doesn't do (at all) - its currently limited to very simple grids.

  1. There some contingencies and other requirements, eg. ArchGDAL would need to provide an array interface to completely have 1 working, although some basic indexing already works just using read (see the gdal.jl tests in GeoData.jl).

  2. Hopefully whoever wants to. I've been writing implementations for ArchGDAL, NCDatasets and a HDF5 format in GeoData.jl, but we need to work out where they will live.

@yeesian
Copy link

yeesian commented Oct 6, 2019

That's cool, thanks for writing it up!

I've just gone through the announcement of GeoData.jl and some of its commits: I don't have the bandwidth (and relevant knowledge) to participate in the design of an interface for rasters, but I'll be happy to implement it for ArchGDAL (to remove the dependency of the interface on GDAL) once you all have figured it out!

@meggart
Copy link
Author

meggart commented Oct 7, 2019

Thanks @rafaqz for replying. Yes, I think plottingwould be one target area, I would be nice to have plotting functions for different raster types. For me personally the motivation would be to make the processing tools in https://github.com/esa-esdl/ESDL.jl usable to a wider set of data, which is mainly a quite efficient mapslices and broadcast_slices on chunked out-of-core data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment