Rails is smart - ActiveRecord::Relation in where clause gets converted to a subquery

kinopyo avatar

kinopyo

When passing an ActiveRecord::Relation to a where clause, Rails converts it to a subquery.
When select is omitted, Rails will use its primary key for the subquery.

Post.where(user_id: User.active.select(:id))
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN 
#   (SELECT "users"."id" FROM "users" WHERE "users"."deleted_at" IS NULL)

# It generates the same query even when select(:id) is omitted
Post.where(user_id: User.active)
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN 
#   (SELECT "users"."id" FROM "users" WHERE "users"."deleted_at" IS NULL)

Wonder how this is handled in Rails? Read on!

Rails/PluckInWhere cop

It all started from a RuboCop violation our team got the other day. Rails/PluckInWhere cop suggested us use select over pluck:

# bad
Post.where(user_id: User.active.pluck(:id))

# good
Post.where(user_id: User.active.select(:id))

It feels strange at first. I thought the original pluck is more efficient as it skips instantiating any User records and just returns all the user ids. Turns out, RuboCop was correct.

Post.where(user_id: User.active.select(:id))
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN 
#   (SELECT "users"."id" FROM "users" WHERE "users"."deleted_at" IS NULL)

Post.where(user_id: User.active.pluck(:id))
# SELECT "users"."id" FROM "users" WHERE "users"."deleted_at" IS NULL
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (...)

Note that by using select(:id), Rails is smart enough to convert it to a subquery and effectively combine the two queries into one.

Great performance improvement! 👏

Dive into ActiveRecord

I was wondering when this feature was added to Rails and how would I miss it. Searching "rails activerecord select subquery" didn't give me anything I wanted, so I dived into the Rails code after work.

When we pass an ActiveRecord Relation to a where clause, it lands at ActiveRecord::PredicateBuilder::RelationHandler.

Post.where(user_id: User.active.select(:id))
module ActiveRecord
  class PredicateBuilder
    class RelationHandler
      def call(attribute, value)
        if value.eager_loading?
          value = value.send(:apply_join_dependency)
        end

        if value.select_values.empty?
          value = value.select(value.table[value.klass.primary_key])
        end

        attribute.in(value.arel)
      end
    end
  end
end
  • attribute is an object of Arel::Attributes::Attribute, representing the left side of the where clause. attribute.name is "user_id".
  • value is an object of User::ActiveRecord_Relation representing the right side of the where clause. value.arel returns an object of Arel::SelectManager
  • attribute.in method is defined at Arel::Predications#in.
  • attribute.in(value.arel) taps into the Arel world and generates the final SQL query with subquery.

If we look at the 3 lines above, it's setting the select values if it's empty. So I did further experiment - removing the select(:id) from the User where clause:

Post.where(user_id: User.active)
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN 
#   (SELECT "users"."id" FROM "users" WHERE "users"."deleted_at" IS NULL)

It generates the same query! 🎉 We can omit select if we were to pass the PK of the model.

Hmm, does it mean Rails/PluckInWhere cop can be improved too? 🤔

Anyway, here is the full backtrace (for Rails 7.0.2.3) in case you want to check out other parts of the code:

active_record/relation/predicate_builder/relation_handler.rb:8:in `call'",
active_record/relation/predicate_builder.rb:63:in `build'",
active_record/relation/predicate_builder.rb:54:in `[]'",
active_record/relation/predicate_builder.rb:126:in `block in expand_from_hash'",
active_record/relation/predicate_builder.rb:79:in `each'",
active_record/relation/predicate_builder.rb:79:in `flat_map'",
active_record/relation/predicate_builder.rb:79:in `expand_from_hash'",
active_record/relation/predicate_builder.rb:25:in `build_from_hash'",
active_record/relation/query_methods.rb:1292:in `build_where_clause'",
active_record/relation/query_methods.rb:735:in `where!'",
active_record/relation/query_methods.rb:730:in `where'",

So, when was it added to Rails?

It wasn't easy but I dug out the original commit: https://github.com/rails/rails/commit/9b188c5bfe04349aa8ee0eeb2b53456601b8c3fc

removed an unnecessary second query when passing an ActiveRecord::Relation to a where clause. And added ability to use subselects in where clauses.

It's been there since Rails 3.1.12 😅 I was a junior Rails developer at the time...


Afterthought: It was the RuboCop cop that brought me into this research. After publishing the post, I realized in reality I would have written the Rails code differently using joins. That's probably why I haven't needed to rely on the subquery in most cases.

Post.joins(:user).merge(User.active)
# or
Post.joins(:user).where(users: { deleted_at: nil }).to_sql

# SELECT "posts".* FROM "posts" 
# INNER JOIN "users" ON "users"."id" = "posts"."user_id" 
# WHERE "users"."deleted_at" IS NULL

Hope you find this post helpful 🙂

kinopyo avatar
Written By

kinopyo

Indoor enthusiast, web developer, and former hardcore RTS gamer. #parenting
Enjoyed the post?

Clap to support the author, help others find it, and make your opinion count.