Rails is smart - ActiveRecord::Relation in where clause gets converted to a subquery
When passing an ActiveRecord::Relation
to a where
clause, Rails converts it to a subquery.
When select
is omitted, Rails will use its primary key for the subquery.
Post.where(user_id: User.active.select(:id))
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN
# (SELECT "users"."id" FROM "users" WHERE "users"."deleted_at" IS NULL)
# It generates the same query even when select(:id) is omitted
Post.where(user_id: User.active)
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN
# (SELECT "users"."id" FROM "users" WHERE "users"."deleted_at" IS NULL)
Wonder how this is handled in Rails? Read on!
Rails/PluckInWhere cop
It all started from a RuboCop violation our team got the other day. Rails/PluckInWhere cop suggested us use select
over pluck
:
# bad
Post.where(user_id: User.active.pluck(:id))
# good
Post.where(user_id: User.active.select(:id))
It feels strange at first. I thought the original pluck
is more efficient as it skips instantiating any User records and just returns all the user ids. Turns out, RuboCop was correct.
Post.where(user_id: User.active.select(:id))
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN
# (SELECT "users"."id" FROM "users" WHERE "users"."deleted_at" IS NULL)
Post.where(user_id: User.active.pluck(:id))
# SELECT "users"."id" FROM "users" WHERE "users"."deleted_at" IS NULL
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (...)
Note that by using select(:id)
, Rails is smart enough to convert it to a subquery and effectively combine the two queries into one.
Great performance improvement! 👏
Dive into ActiveRecord
I was wondering when this feature was added to Rails and how would I miss it. Searching "rails activerecord select subquery" didn't give me anything I wanted, so I dived into the Rails code after work.
When we pass an ActiveRecord Relation to a where
clause, it lands at ActiveRecord::PredicateBuilder::RelationHandler.
Post.where(user_id: User.active.select(:id))
module ActiveRecord
class PredicateBuilder
class RelationHandler
def call(attribute, value)
if value.eager_loading?
value = value.send(:apply_join_dependency)
end
if value.select_values.empty?
value = value.select(value.table[value.klass.primary_key])
end
attribute.in(value.arel)
end
end
end
end
-
attribute
is an object ofArel::Attributes::Attribute
, representing the left side of the where clause.attribute.name
is"user_id"
. -
value
is an object ofUser::ActiveRecord_Relation
representing the right side of the where clause.value.arel
returns an object ofArel::SelectManager
-
attribute.in
method is defined atArel::Predications#in
. -
attribute.in(value.arel)
taps into the Arel world and generates the final SQL query with subquery.
If we look at the 3 lines above, it's setting the select values if it's empty. So I did further experiment - removing the select(:id)
from the User where clause:
Post.where(user_id: User.active)
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN
# (SELECT "users"."id" FROM "users" WHERE "users"."deleted_at" IS NULL)
It generates the same query! 🎉 We can omit select
if we were to pass the PK of the model.
Hmm, does it mean Rails/PluckInWhere cop can be improved too? 🤔
Anyway, here is the full backtrace (for Rails 7.0.2.3) in case you want to check out other parts of the code:
active_record/relation/predicate_builder/relation_handler.rb:8:in `call'",
active_record/relation/predicate_builder.rb:63:in `build'",
active_record/relation/predicate_builder.rb:54:in `[]'",
active_record/relation/predicate_builder.rb:126:in `block in expand_from_hash'",
active_record/relation/predicate_builder.rb:79:in `each'",
active_record/relation/predicate_builder.rb:79:in `flat_map'",
active_record/relation/predicate_builder.rb:79:in `expand_from_hash'",
active_record/relation/predicate_builder.rb:25:in `build_from_hash'",
active_record/relation/query_methods.rb:1292:in `build_where_clause'",
active_record/relation/query_methods.rb:735:in `where!'",
active_record/relation/query_methods.rb:730:in `where'",
So, when was it added to Rails?
It wasn't easy but I dug out the original commit: https://github.com/rails/rails/commit/9b188c5bfe04349aa8ee0eeb2b53456601b8c3fc
removed an unnecessary second query when passing an ActiveRecord::Relation to a where clause. And added ability to use subselects in where clauses.
It's been there since Rails 3.1.12 😅 I was a junior Rails developer at the time...
Afterthought: It was the RuboCop cop that brought me into this research. After publishing the post, I realized in reality I would have written the Rails code differently using joins. That's probably why I haven't needed to rely on the subquery in most cases.
Post.joins(:user).merge(User.active)
# or
Post.joins(:user).where(users: { deleted_at: nil }).to_sql
# SELECT "posts".* FROM "posts"
# INNER JOIN "users" ON "users"."id" = "posts"."user_id"
# WHERE "users"."deleted_at" IS NULL
Hope you find this post helpful 🙂
Clap to support the author, help others find it, and make your opinion count.