I'm going to describe a weird problem I faced when using ActiveRecord today. To protect the innocent, I'm not going to talk about the app I'm actually working on but will instead discuss a hypothetical but isomorphic database design for a clone of the popular blogging platform Tumblr.
Tumblr lets you publish various different sorts of content. We might be tempted to shove all these types in a big STI table, but the types are all quite different from one another and so we give them their own tables.
class CreateContentTables < ActiveRecord::Migration
def change
create_table :texts do |t|
t.string :title
t.text :body
end
create_table :quotes do |t|
t.text :body
t.string :source
t.string :url
end
create_table :links do |t|
t.string :title
t.string :url
end
end
end
These pieces of content each belong to a blog, which belongs to a user. However
you'll notice the above tables have no foreign key for the blog. Instead, we'll
use a join table called posts
to link content items to blogs. There are
various reasons for this in my actual app; in Tumblr it gives us cheap
reblogging since we don't need to copy a content record, just create a new
posts
record. posts
has a compound foreign key referring to content items.
class CreateBlogsAndUsers < ActiveRecord::Migration
def change
create_table :users do |t|
t.string :username
end
create_table :blogs do |t|
t.belongs_to :user
t.string :title
end
create_table :posts do |t|
t.timestamps
t.belongs_to :blog
t.belongs_to :content, polymorphic: true
end
end
end
Finally, each content item can have comments, which belong to a user and just
like posts
they have a compound foreign key for the content they relate to.
class CreateComments < ActiveRecord::Migration
def change
create_table :comments do |t|
t.timestamps
t.belongs_to :user
t.belongs_to :content, polymorphic: true
t.text :body
t.boolean :accepted
end
end
end
We migrate the database, and Rails spits out the following SQL schema:
mysql> describe users;
+----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| username | varchar(255) | YES | | NULL | |
+----------+--------------+------+-----+---------+----------------+
mysql> describe blogs;
+---------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | YES | | NULL | |
| title | varchar(255) | YES | | NULL | |
+---------+--------------+------+-----+---------+----------------+
mysql> describe posts;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| created_at | datetime | YES | | NULL | |
| updated_at | datetime | YES | | NULL | |
| blog_id | int(11) | YES | | NULL | |
| content_id | int(11) | YES | | NULL | |
| content_type | varchar(255) | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
mysql> describe texts;
+-------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(255) | YES | | NULL | |
| body | text | YES | | NULL | |
+-------+--------------+------+-----+---------+----------------+
mysql> describe quotes;
+--------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| body | text | YES | | NULL | |
| source | varchar(255) | YES | | NULL | |
| url | varchar(255) | YES | | NULL | |
+--------+--------------+------+-----+---------+----------------+
mysql> describe links;
+-------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(255) | YES | | NULL | |
| url | varchar(255) | YES | | NULL | |
+-------+--------------+------+-----+---------+----------------+
mysql> describe comments;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| created_at | datetime | YES | | NULL | |
| updated_at | datetime | YES | | NULL | |
| user_id | int(11) | YES | | NULL | |
| content_id | int(11) | YES | | NULL | |
| content_type | varchar(255) | YES | | NULL | |
| body | text | YES | | NULL | |
| accepted | tinyint(1) | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
Like a good little Rails developer we go and create model classes for all these tables:
class User < ActiveRecord::Base
has_many :blogs
has_many :comments
end
class Blog < ActiveRecord::Base
belongs_to :user
has_many :posts
end
class Post < ActiveRecord::Base
belongs_to :blog
belongs_to :content, polymorphic: true
end
class Text < ActiveRecord::Base
has_many :posts, as: :content
has_many :comments, as: :content
end
class Quote < ActiveRecord::Base
has_many :posts, as: :content
has_many :comments, as: :content
end
class Link < ActiveRecord::Base
has_many :posts, as: :content
has_many :comments, as: :content
end
class Comment < ActiveRecord::Base
belongs_to :user
belongs_to :content, polymorphic: true
end
Now, let's create a couple of users:
alice = User.create(username: 'alice')
bob = User.create(username: 'bob')
Alice has a blog where she writes about MySQL optimisation. Here's a few posts she's added recently:
alice = User.where(username: 'alice').first
blog = alice.blogs.create(title: 'Not even once')
text_entry = Text.create(
title: 'How to bulk update a MySQL table',
body: 'First, delete all your indexes.'
)
quote_entry = Quote.create(
body: 'There are two hard things in computer science: MySQL consistency.',
source: 'CSS Perverts',
url: 'https://medium.com/could-you-not'
)
link_entry = Link.create(
title: 'Web development that does not always hurt',
url: 'http://rubyonrails.org/'
)
[text_entry, quote_entry, link_entry].each do |entry|
Post.create(blog: blog, content: entry)
end
One day, along comes Bob and leaves some helpful comments on Alice's blog:
bob = User.where(username: 'bob').first
text_entry = Text.first
quote_entry = Quote.first
Comment.create(
user: bob,
content: text_entry,
body: 'This sounds like a lot of work.'
)
Comment.create(
user: bob,
content: text_entry,
body: '4 hours later and I am still at my computer. Help!'
)
Comment.create(
user: bob,
content: quote_entry,
body: 'Never were truer words spoken. Such thought leadership.'
)
So, here's the problem. When Alice logs into Tumblr, she wants to see how many
new comments there are, because she needs to manually moderate and accept each
one. So, we want the count of all comments that relate to content that's posted
on one of Alice's blogs. We begin by adding a scope to Comment
so we can count
unmoderated comments:
class Comment < ActiveRecord::Base
belongs_to :user
belongs_to :content, polymorphic: true
scope :for_moderation, -> { where(accepted: nil) }
end
A naive way to count unmoderated comments would be to iterate over Alice's blogs, and for each post in each blog, count the number of unmoderated comments each post's content has. This loads a lot of unnecessary objects and runs a lot of queries.
>> alice = User.where(username: 'alice').first
=> #<User id: 1, username: "alice">
>> alice.blogs.inject(0) { |s,b| s + b.posts.inject(0) { |s,p| s + p.content.comments.for_moderation.count } }
Blog Load (0.4ms) SELECT `blogs`.* FROM `blogs` WHERE `blogs`.`user_id` = 1
Post Load (0.3ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`blog_id` = 1
Text Load (0.4ms) SELECT `texts`.* FROM `texts` WHERE `texts`.`id` = 1 ORDER BY `texts`.`id` ASC LIMIT 1
(0.3ms) SELECT COUNT(*) FROM `comments` WHERE `comments`.`content_id` = 1 AND `comments`.`content_type` = 'Text' AND `comments`.`accepted` IS NULL
Quote Load (0.1ms) SELECT `quotes`.* FROM `quotes` WHERE `quotes`.`id` = 1 ORDER BY `quotes`.`id` ASC LIMIT 1
(0.6ms) SELECT COUNT(*) FROM `comments` WHERE `comments`.`content_id` = 1 AND `comments`.`content_type` = 'Quote' AND `comments`.`accepted` IS NULL
Link Load (0.2ms) SELECT `links`.* FROM `links` WHERE `links`.`id` = 1 ORDER BY `links`.`id` ASC LIMIT 1
(0.6ms) SELECT COUNT(*) FROM `comments` WHERE `comments`.`content_id` = 1 AND `comments`.`content_type` = 'Link' AND `comments`.`accepted` IS NULL
=> 3
We'd like to do this in one query. Initially this seems hard because we can't
join across the polymorphic set of content tables to walk from posts to
comments. But, we don't actually need to do that. We can find all the comments
for a Post
by finding the comments with the same content_{id,type}
as the
Post
. In SQL, we can write:
mysql> SELECT COUNT(*)
-> FROM comments AS c
-> INNER JOIN posts AS p ON p.content_id = c.content_id AND p.content_type = c.content_type
-> INNER JOIN blogs AS b on b.id = p.blog_id
-> WHERE b.user_id = 1 AND
-> c.accepted IS NULL;
+----------+
| COUNT(*) |
+----------+
| 3 |
+----------+
Same answer, one query, and we didn't touch the content tables at all. So, if we can tell Rails how to relate comments directly to posts, then we can use relations and scopes to efficiently compose the above query. (We will actually end up with a slightly different, but equivalent query.)
We ultimately want to be able to chain some scopes on Comment
and count the
result:
Comment.for_blog_author(alice).for_moderation.count
Let's start by creating the for_blog_author
scope. We ultimately want to join
to the blogs
table and filter by blogs.author_id
. In Rails we can express
that like this:
class Comment < ActiveRecord::Base
belongs_to :user
belongs_to :content, polymorphic: true
scope :for_moderation, -> { where(accepted: nil) }
scope :for_blog_author, -> (author) {
joins(:blog).where(blogs: {user_id: author.id})
}
end
This new scope demands that comments have an associated blog
. We know
logically that a Comment
belongs to a Post
, which in turn belongs to a
Blog
. We can use has_one :blog, through: :post
for the latter, but the
former is more complicated. Recall the structure of the comments
and posts
tables:
mysql> describe posts;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| created_at | datetime | YES | | NULL | |
| updated_at | datetime | YES | | NULL | |
| blog_id | int(11) | YES | | NULL | |
| content_id | int(11) | YES | | NULL | |
| content_type | varchar(255) | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
mysql> describe comments;
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| created_at | datetime | YES | | NULL | |
| updated_at | datetime | YES | | NULL | |
| user_id | int(11) | YES | | NULL | |
| content_id | int(11) | YES | | NULL | |
| content_type | varchar(255) | YES | | NULL | |
| body | text | YES | | NULL | |
| accepted | tinyint(1) | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
So rather than a condition like posts.id = comments.post_id
to link posts and
comments, we have posts.content_id = comments.content_id AND posts.content_type = comments.content_type
. My first (wrong) guess at how to express that relation
looked like this:
class Comment < ActiveRecord::Base
belongs_to :user
belongs_to :content, polymorphic: true
belongs_to :post, -> (comment) {
where(content_id: comment.content_id, content_type: comment.content_type)
}
has_one :blog, through: :post
scope :for_moderation, -> { where(accepted: nil) }
scope :for_blog_author, -> (author) {
joins(:blog).where(blogs: {user_id: author.id})
}
end
This implementation leaves Rails still looking at posts.id
, since we've not
told it what the 'primary key' for this relation is.
>> comment = Comment.first
Comment Load (0.4ms) SELECT `comments`.* FROM `comments` ORDER BY `comments`.`id` ASC LIMIT 1
=> #<Comment id: 1, created_at: "2014-05-15 21:43:08", updated_at: "2014-05-15 21:43:08", user_id: 2, content_id: 1, content_type: "Text", body: "This sounds like a lot of work.", accepted: nil>
>> comment.blog
Blog Load (0.4ms) SELECT `blogs`.* FROM `blogs`
INNER JOIN `posts` ON `blogs`.`id` = `posts`.`blog_id`
WHERE `posts`.`id` = NULL AND
`posts`.`content_id` = 1 AND
`posts`.`content_type` = 'Text'
ORDER BY `blogs`.`id` ASC
LIMIT 1
=> nil
We need to make Rails think the 'primary key' to use on the posts
table is
content_id
and content_type
. Note: it's not the primary key of the posts
table in general, it's only the 'primary key' in the terminology Rails has to
express this particular join operation. This implementation gets us the query we
want:
class Comment < ActiveRecord::Base
belongs_to :user
belongs_to :content, polymorphic: true
belongs_to :post, -> (comment) { where(content_type: comment.content_type) },
foreign_key: :content_id,
primary_key: :content_id
has_one :blog, through: :post
scope :for_moderation, -> { where(accepted: nil) }
scope :for_blog_author, -> (author) {
joins(:blog).where(blogs: {user_id: author.id})
}
end
Here it is in action:
>> comment = Comment.first
Comment Load (0.4ms) SELECT `comments`.* FROM `comments` ORDER BY `comments`.`id` ASC LIMIT 1
=> #<Comment id: 1, created_at: "2014-05-15 21:43:08", updated_at: "2014-05-15 21:43:08", user_id: 2, content_id: 1, content_type: "Text", body: "This sounds like a lot of work.", accepted: nil>
>> comment.blog
Blog Load (0.9ms) SELECT `blogs`.* FROM `blogs`
INNER JOIN `posts` ON `blogs`.`id` = `posts`.`blog_id`
WHERE `posts`.`content_id` = 1 AND
`posts`.`content_type` = 'Text'
ORDER BY `blogs`.`id` ASC
LIMIT 1
=> #<Blog id: 1, user_id: 1, title: "Not even once">
However, this snippet still doesn't work:
>> Comment.for_blog_author(alice).for_moderation.count
app/models/comment.rb:5:in `block in <class:Comment>': undefined method `content_type' for #<ActiveRecord::Associations::JoinDependency::JoinAssociation:0x007f0ab5085f40> (NoMethodError)
This is because, when you access the post
association within scope/join logic,
the argument to the belongs_to :post
lambda is no longer a Comment
. To make
the lambda work in both cases, I did this:
class Comment < ActiveRecord::Base
belongs_to :user
belongs_to :content, polymorphic: true
belongs_to :post, -> (object) {
if object.is_a? Comment
where(content_type: object.content_type)
else
where('posts.content_type = comments.content_type')
end
}, foreign_key: :content_id, primary_key: :content_id
has_one :blog, through: :post
scope :for_moderation, -> { where(accepted: nil) }
scope :for_blog_author, -> (author) {
joins(:blog).where(blogs: {user_id: author.id})
}
end
This ends up running the query I want to count the unmoderated comments on Alice's blogs:
>> Comment.for_blog_author(alice).for_moderation.count
(0.2ms) SELECT COUNT(*)
FROM `comments`
INNER JOIN `posts` ON `posts`.`content_id` = `comments`.`content_id` AND
(posts.content_type = comments.content_type)
INNER JOIN `blogs` ON `blogs`.`id` = `posts`.`blog_id`
WHERE `blogs`.`user_id` = 1 AND
`comments`.`accepted` IS NULL
=> 3
Now, although it gets the right answer, and does so efficiently, that
belongs_to
code is incredibly ugly. Surely there is a better way to express
this?
I think there's a bug here, too (I might be over-interpreting the toy example, though; ignore me if so). Edit: Murray pipped me to it. :-)
Because
Content
has_many :posts
, but onComment
you're saying itbelongs_to :post
, the#post
and#blog
associations are likely to return weird (and inconsistent) results as soon as reblogging is allowed. The#post
association will most likely return the firstPost
record relating to the content, i.e. the original author's post. Whereas#blog
looks like this:Instead of the blog of the original poster, this would return the blog with the lowest id of anyone who has blogged or reblogged the content. And then in turn this:
Will mean that comments on a given piece of content will turn up in the moderation queue for everyone who has blogged or reblogged that piece of content.