Note: Since writing this, I've been pointed to some exciting new research/tooling called Project Cambria https://www.inkandswitch.com/cambria.html I'll likely have to rewrite this article taking that into account. Leaving this up for posterity's sake.
(This series isn't meant to be a primer/tutorial, though we might do something regarding it in the future. For official documentation and starters, see https://developers.cloudflare.com/workers/learning/using-durable-objects.
Further - these are my personal views; I expect to be wrong about a lot of them. Indeed, I'm not paying much attention to presenting these well at the moment, simply writing down thoughts. As such, expect these writeups to change often, particularly as the platform takes shape. I'm also mostly a front end guy, so don't get mad if I get it very wrong. Give me feedback! Always happy to learn and make changes.)
Durable Objects are a fascinating new storage primitive from Cloudflare for their Workers platform. There are a lot of 'cool' shiny things that could be done with them; but before we get there, I think it's useful to figure out the basics of managing data in such a system, and what the dev workflow around them would be. Of particular interest are analogies to traditional databases, and doing stuff like schemas/migrations, indexes, and so on.
Conceptually, every class instance's storage holds a table of records, let's say of type {id: string, field1: type1, field2: type2, ...}. Fields could hold primitives like booleans, strings, numbers, but also arrays/objects/maps and so on. For simplicity, we'll assume they use only primitives for now, and we'll explore how to extrapolate these ideas to nested data later.
The storage API is fairly minimal; basic calls like get(key)/put(key, value)/delete(key)/list(options) are available. These are powerful in that they give you complete raw access to the underlying data, but anything complex must be implemented on top of them.
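To make the shape of that API concrete, here's a rough sketch of an in-memory stand-in with the same four method names. The `MemoryStorage` class, the `Row` type and the key names are all made up for illustration, and it's synchronous to keep it short; I'm assuming the real calls are asynchronous and return promises, so check the official docs before leaning on any detail here.

```typescript
// Hypothetical record shape; the field names are placeholders.
type Row = { id: string; name: string; age: number };

// Synchronous in-memory stand-in for the storage API described above.
// Assumption: the real calls are asynchronous and return promises.
class MemoryStorage<T> {
  private data = new Map<string, T>();

  get(key: string): T | undefined {
    return this.data.get(key);
  }

  put(key: string, value: T): void {
    this.data.set(key, value);
  }

  delete(key: string): boolean {
    return this.data.delete(key);
  }

  // Only a `prefix` option is modelled in this sketch.
  list(options: { prefix?: string } = {}): Map<string, T> {
    const prefix = options.prefix ?? "";
    const out = new Map<string, T>();
    for (const [key, value] of this.data) {
      if (key.startsWith(prefix)) out.set(key, value);
    }
    return out;
  }
}

const storage = new MemoryStorage<Row>();
storage.put("user:1", { id: "user:1", name: "Ada", age: 36 });
storage.put("user:2", { id: "user:2", name: "Linus", age: 28 });
storage.put("admin:1", { id: "admin:1", name: "Grace", age: 40 });

// Everything whose key starts with "user:".
const users = storage.list({ prefix: "user:" });
```

Anything fancier (filtering by a field, sorting, joins) has to be built out of these four calls.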
Of note, there aren't any indexing operations, so to perform queries on large data sets (similar to WHERE clauses in SQL), you have a few options:
- loop over every item in the table for every incoming query
- create an in-memory index, stored as an instance variable
- use another durable object instance as an indexed table based on the field(s) of interest.
Each of these options has its own pros/cons, left as an exercise for the reader. That said, don't forget to consider worker memory/CPU limits in your choices https://developers.cloudflare.com/workers/platform/limits#worker-limits
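Here's a small sketch of the first two options side by side: a full scan per query versus an index built once and kept in memory. A plain `Map` stands in for the object's storage (the real thing is asynchronous), and `Row`, `scanByCity`, `buildCityIndex` and `lookupByCity` are names I've made up for the example.

```typescript
type Row = { id: string; city: string; age: number };

// A plain Map stands in for the object's storage (assumption: the real
// storage calls are asynchronous).
const table = new Map<string, Row>([
  ["1", { id: "1", city: "Pune", age: 31 }],
  ["2", { id: "2", city: "Oslo", age: 45 }],
  ["3", { id: "3", city: "Pune", age: 22 }],
]);

// Option 1: loop over every row for every query.
function scanByCity(rows: Map<string, Row>, city: string): Row[] {
  return [...rows.values()].filter((row) => row.city === city);
}

// Option 2: build an index (city -> ids) once and keep it as an instance
// variable. Every later write would have to update the index as well.
function buildCityIndex(rows: Map<string, Row>): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const row of rows.values()) {
    const ids = index.get(row.city) ?? new Set<string>();
    ids.add(row.id);
    index.set(row.city, ids);
  }
  return index;
}

function lookupByCity(
  rows: Map<string, Row>,
  index: Map<string, Set<string>>,
  city: string
): Row[] {
  const ids = index.get(city) ?? new Set<string>();
  return [...ids]
    .map((id) => rows.get(id))
    .filter((row): row is Row => row !== undefined);
}

const cityIndex = buildCityIndex(table);
const viaScan = scanByCity(table, "Pune");
const viaIndex = lookupByCity(table, cityIndex, "Pune");
```

The scan costs nothing to maintain but touches every row on each query; the index makes lookups cheap but eats memory and has to be rebuilt whenever the instance restarts.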
Now, it's all well and good to simply push a DO to the platform and start accessing/using it, but requirements and code keep changing. As such, the object's "schema" will keep changing too; we should know how to make changes to these objects and the workers so that we don't break the application at any time. This is what we'd call "migrations" in database land.
(We also want to be able to roll back changes, but we'll discuss that in a future article. Maybe.)
Before diving into what these operations would look like, some assumptions -
- We're using a static type system (like typescript) to define the 'shape' of data across DO/worker boundaries. You could do these operations without static types, but I'd be worried about getting them wrong
- We're considering that DO and worker code gets deployed separately, but we still want to write the code in a single codebase so we can preserve type signatures across the worker/DO boundary. There are probably some benefits to deploying them together that would simplify these steps, but I haven't dived into that just yet.
- Assume that the root storage operations are available over an api (protected so only admins can access it); you'll need them to execute some of these operations. You'll likely build an admin panel for these things later too.
- We'll probably also need some kind of primitive to 'pause' incoming requests while changes are made to the objects' storage. I'd imagine this can be accomplished with a 'lock' instance variable, but I'm likely trivialising; it might need something before the request even gets sent to the DO. Dunno just yet.
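For that last 'pause' assumption, here's a very rough sketch of what a 'lock' instance variable could look like: work submitted while the gate is paused gets queued and replayed in arrival order on resume. `RequestGate` is a made-up name, and this only covers the in-object part; it says nothing about holding requests before they reach the DO.

```typescript
// Hypothetical gate: tasks submitted while paused are queued, then
// replayed in arrival order when the gate is resumed.
class RequestGate {
  private paused = false;
  private pending: Array<() => void> = [];

  pause(): void {
    this.paused = true;
  }

  resume(): void {
    this.paused = false;
    // Take a snapshot so tasks that submit more work don't loop forever.
    const queued = this.pending;
    this.pending = [];
    for (const task of queued) task();
  }

  submit(task: () => void): void {
    if (this.paused) {
      this.pending.push(task);
    } else {
      task();
    }
  }
}

const log: string[] = [];
const gate = new RequestGate();

gate.submit(() => { log.push("before"); });
gate.pause();
gate.submit(() => { log.push("during"); }); // held back until resume()
log.push("migration");                      // storage changes would go here
gate.resume();
gate.submit(() => { log.push("after"); });
```

The request that arrived mid-migration only runs once the migration step has finished, which is the property we're after.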
Ok, on to the operations -
Adding a field should be pretty simple.
If it's a 'computed' field, we can update the DO's type signature and add code that adds this field to responses, without any changes to its storage. Then change the worker code to access this field. Nice.
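A minimal sketch of the computed-field case, with made-up names (`StoredUser`, `UserResponse`, `fullName`): the stored shape stays the same, and the extra field only exists on the way out.

```typescript
// What sits in storage; unchanged by this operation.
type StoredUser = { id: string; firstName: string; lastName: string };

// What the DO returns to the worker: the stored row plus a computed field.
type UserResponse = StoredUser & { fullName: string };

// Derive the field at response time instead of persisting it.
function toResponse(row: StoredUser): UserResponse {
  return { ...row, fullName: `${row.firstName} ${row.lastName}` };
}

const stored: StoredUser = { id: "1", firstName: "Ada", lastName: "Lovelace" };
const response = toResponse(stored);
```

Since both types live in the shared codebase, the worker gets `fullName` in its type signature as soon as the DO change is in.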
If it's a field that changes the storage schema, it's still simple, but slightly more involved.
- first, change the type signature of the record to include the new field.
- then, we want to update the DO code so any newly added rows include the new field, if relevant.
- then, we want to run a procedure that backfills all the previous records with this new field if required.
- only then, can we update the worker code to start reading from this new field.
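Those steps might look something like this. A plain `Map` stands in for the object's storage (the real calls are asynchronous), and `UserV1`, `UserV2`, `plan`, `createUser` and `backfillPlan` are all invented for the example.

```typescript
type UserV1 = { id: string; name: string };

// Step 1: widen the type. The field is optional while old rows lack it.
type UserV2 = UserV1 & { plan?: "free" | "paid" };

// Step 2: new rows always carry the field.
function createUser(storage: Map<string, UserV2>, id: string, name: string): UserV2 {
  const row: UserV2 = { id, name, plan: "free" };
  storage.set(id, row);
  return row;
}

// Step 3: backfill rows written before the change. Re-running it is
// harmless, because rows that already have the field are skipped.
function backfillPlan(storage: Map<string, UserV2>): number {
  let updated = 0;
  for (const [key, row] of storage) {
    if (row.plan === undefined) {
      storage.set(key, { ...row, plan: "free" });
      updated++;
    }
  }
  return updated;
}

// Rows that predate the new field.
const userStorage = new Map<string, UserV2>([
  ["1", { id: "1", name: "Ada" }],
  ["2", { id: "2", name: "Linus" }],
]);

createUser(userStorage, "3", "Grace");
const firstRun = backfillPlan(userStorage);
const secondRun = backfillPlan(userStorage);
```

Once the backfill reports nothing left to update, the type can be tightened to make `plan` required, and the worker can start reading it.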
Removing a field isn't too hard either.
- first, change the worker code to stop accessing the field. It would've been nice if typescript let one mark a field as 'deprecated', that would make this process a little simpler. Instead, I'd change the type during dev to remove the field, find all type violations and change the code, then restore the type.
- THEN, change the type to remove the field on the DO, as well as any code that populates that field on responses.
- If it was a computed field, you're done. If not, you can lazily run a procedure that removes this field from all rows at your convenience.
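A sketch of the removal steps, again with made-up names and a plain `Map` standing in for storage. On the 'deprecated' wish: as far as I can tell, recent TypeScript tooling does pick up a `/** @deprecated */` JSDoc tag and flags usages in the editor, though it's a hint rather than a compile error, so the remove-the-type trick is still the stricter check.

```typescript
type User = {
  id: string;
  name: string;
  /** @deprecated no longer read by the worker; pending removal */
  nickname?: string;
};

// The shape once the field is gone. Swapping this in during dev surfaces
// every remaining read of `nickname` as a type error.
type UserWithoutNickname = Omit<User, "nickname">;

// Lazy cleanup: strip the field from stored rows. Safe to re-run.
function dropNickname(storage: Map<string, User>): number {
  let cleaned = 0;
  for (const [key, row] of storage) {
    if ("nickname" in row) {
      const { nickname: _dropped, ...rest } = row;
      storage.set(key, rest);
      cleaned++;
    }
  }
  return cleaned;
}

const people = new Map<string, User>([
  ["1", { id: "1", name: "Ada", nickname: "countess" }],
  ["2", { id: "2", name: "Linus" }],
]);

const cleanedFirst = dropNickname(people);
const cleanedAgain = dropNickname(people);
const finalShape: UserWithoutNickname = { id: "3", name: "Grace" };
```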
Renaming a field sounds like it would be annoying, but we can do it by defining this operation as a combination of add/remove operations. Specifically -
- first, add a new field with the desired new name, persisting the data to this new field too.
- then, remove the previous field, and point all reads to the new field.
That should do it.
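Spelled out as code, the rename could go roughly like this: write to both names for a while, copy old values across, then drop the old name. The field names (`username` becoming `handle`) and helper functions are hypothetical, and a plain `Map` stands in for storage.

```typescript
// Transitional shape: both names exist while the rename is in flight.
type Account = { id: string; username?: string; handle?: string };

// Phase 1a: writes persist the value under both names.
function writeHandle(storage: Map<string, Account>, id: string, value: string): void {
  storage.set(id, { id, username: value, handle: value });
}

// Phase 1b: copy the old field into the new one for existing rows.
function copyUsernameToHandle(storage: Map<string, Account>): void {
  for (const [key, row] of storage) {
    if (row.handle === undefined && row.username !== undefined) {
      storage.set(key, { ...row, handle: row.username });
    }
  }
}

// Phase 2: once all reads point at `handle`, drop `username`.
function dropUsername(storage: Map<string, Account>): void {
  for (const [key, row] of storage) {
    const { username: _old, ...rest } = row;
    storage.set(key, rest);
  }
}

const accounts = new Map<string, Account>([["1", { id: "1", username: "ada" }]]);

writeHandle(accounts, "2", "linus");
copyUsernameToHandle(accounts);
const handleAfterCopy = accounts.get("1")?.handle;
dropUsername(accounts);
```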
Changing a field's type is likely the most annoying operation: we want to use the same name for a field, but the type it refers to is now different. Avoid if possible. If it can be expressed as a combination of add/remove/rename operations on children of its nested fields, that would be good. But if it's something like "This field was a number, but now it's an array of numbers", the migration steps would likely be dependent on what the required product/user experience is. In a bunch of cases this might not be too hard, but I haven't dived into it in much detail.
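For the number-to-array example, one possible approach (and only one; it assumes that wrapping the old value in a single-element array is acceptable for the product) is to accept both shapes during the transition and normalise them. All names here are invented, and a plain `Map` stands in for storage.

```typescript
type ScoreV1 = number;
type ScoreV2 = number[];

// During the transition a row may hold either shape.
type Entry = { id: string; score: ScoreV1 | ScoreV2 };

// Assumption: an old single number becomes a one-element array.
function normalizeScore(score: ScoreV1 | ScoreV2): ScoreV2 {
  return Array.isArray(score) ? score : [score];
}

// Rewrite every row into the new shape. Safe to re-run, since rows that
// are already arrays pass through unchanged.
function migrateScores(storage: Map<string, Entry>): void {
  for (const [key, row] of storage) {
    storage.set(key, { ...row, score: normalizeScore(row.score) });
  }
}

const entries = new Map<string, Entry>([
  ["1", { id: "1", score: 7 }],
  ["2", { id: "2", score: [3, 4] }],
]);

migrateScores(entries);
```

Reads can go through `normalizeScore` until the migration has covered every row, after which the union type can be narrowed to the array form.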
Backing up the data should conceptually be fairly doable - literally loop through all records in the table and save to "somewhere else" (gestures at nothing in particular, kinda hand wavey.) Maybe it would also be useful to back up the type signatures in play at that moment, or even the entire DO code.
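A hand-wavey sketch to match: dump every record into a JSON string along with a schema version, and read it back. Where the string ends up is left open, exactly as above. The `Backup` shape, the `schemaVersion` number and the function names are assumptions for the example, and a plain `Map` stands in for storage.

```typescript
type Item = { id: string; name: string };

// Hypothetical backup envelope: the rows plus enough context to know
// which shape they were in when the backup was taken.
type Backup = {
  schemaVersion: number;
  takenAt: string;
  rows: Array<[string, Item]>;
};

function exportBackup(storage: Map<string, Item>, schemaVersion: number): string {
  const backup: Backup = {
    schemaVersion,
    takenAt: new Date().toISOString(),
    rows: [...storage.entries()],
  };
  return JSON.stringify(backup);
}

function restoreBackup(json: string): Map<string, Item> {
  const backup = JSON.parse(json) as Backup;
  return new Map(backup.rows);
}

const items = new Map<string, Item>([
  ["1", { id: "1", name: "Ada" }],
  ["2", { id: "2", name: "Linus" }],
]);

const dump = exportBackup(items, 3);
const restored = restoreBackup(dump);
```

Keeping the version number next to the rows is a cheap substitute for backing up the type signatures themselves.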
Once I've explored the platform a little more, we'll talk about rollbacks. It shouldn't be too complicated, but there will likely be explicit tradeoffs to be made.
In the next post, let's have a look at what querying these objects would look like, and how we'll use Cloudflare's caches so we don't blow shit up. If you're lucky, we'll also play with some code. Seeya then.
(I know, I know, I should make a blog. Later.)
I wrote this in 2020 for my own understanding!