I'm hunting for the best solution on how to handle keeping large sets of DB records "sorted" in a performant manner.
Most of us have work on projects at some point where we have needed to have ordered lists of objects. Whether it be a to-do list sorted by priority, or a list of documents that a user can sort in whatever order they want.
A traditional approach for this on a Rails project is to use something like the acts_as_list
gem, or something similar. These systems typically add some sort of "postion" or "sort order" column to each record, which is then used when querying out the records in a traditional order by position
SQL query.
This approach seems to work fine for smaller datasets, but can be hard to manage on large data sets with hundreds (or thousands) of records needing to be sorted. Changing the sort position of even a single object will require updating every single record in the database that is in the same sort group. This requires potentially thousands of writes to the database for every sort order update.
One typical alternative approach is to store a single, serialized column that contains the full sort order (for instance a serialized array column that contains the ordered list of record ids). Using this approach, we are now required to query the records out from the database using this specified order.
Ideally, it would be nice if the database just used the order of the in clause so a query of select * from foo where id in (67,23,1362,24)
returned the records in that order, but I'm not aware of a database that does this.
MySQL has a built-in ORDER BY Field()
clause where you could do something like ORDER BY FIELD(67,23,1362,24)
PostgreSQL doesn't seem to have any similar functions. There are two somewhat widely discussed solutions that I have come across.
- Using the PostgreSQL
CASE
clause like the following:
SELECT * FROM foo ORDER BY CASE WHEN id=67 THEN 1 WHEN id=23 THEN 2 WHEN id=1362 THEN 3 WHEN id=24 THEN 4 END
- Using a custom
JOIN
table like the following:
SELECT * FROM foo JOIN (VALUES (67,1), (23,2), (1362,3), (24,4)) as x(id, ordering) ON foo.id = x.id ORDER BY x.ordering
I'm currently using solution #2 as it was easier to implement as a module that I can include in any AR::Base
class and have it work well with other scopes.
I'm wondering if there is an obvious solution I'm just missing, or any alternatives or alternate implementations of the db schema to begin with that might make this a non-issue.
What are the best ways to keep a large set of data sorted according to user specified order with PostgreSQL?
For this particular use case that I'm dealing with, we allow sorting more than one object at a time. The objects (images) are laid out in a grid on the UI and the user can select a single image to reorder via drag and drop, or they can select multiple images to be reordered via drag and drop at the same time. This makes using something like the acts_as_list#insert_at
method less useful. I suppose we could call #insert_at
or manually #update_attribute(:position, N)
one at a time for each item that was moved in position, but I'd like to do it all in one fell swoop if possible.
The solution mentioned by @david is implemented in https://github.com/mixonic/ranked-model ...and a blog post about it: http://www.getharvest.com/blog/2011/02/a-new-rails-positioning-library-rankedmodel/