Performance and Ruby on Rails Models

The Active Record design pattern simplifies data access in Rails applications. But you can also shoot yourself in the foot performance-wise if you misapply it to queries that span multiple objects and instances. In this article, we look at ways you can improve query performance.

A Good Measure

So where does one start? As a rule of thumb, always optimize the slowest queries first. It sounds obvious but you’d be surprised. Use the application log to get a general sense of which queries are taking the longest. You can also apply the “benchmark” method—available in several key places:

Inside models:

def self.get_active
  self.benchmark 'Get Active Contacts' do 
    find_all_by_status('active')
  end
end

Inside controllers (acting upon a model):

Contact.benchmark 'Get Active Contacts' do
  @active_contacts = Contact.get_active
end

Inside views:

<% benchmark 'Show Active Contacts' do %>
  <%= @active_contacts %>
<% end %>

A cool feature is the ability to string multiple benchmarks together as follows:

Contact.benchmark 'First Way' do
  @active_contacts = Contact.get_active_test1
end
 
Contact.benchmark 'Second Way' do
  @active_contacts = Contact.get_active_test2
end
 
Contact.benchmark 'Third Way' do
  @active_contacts = Contact.get_active_test3
end

These then appear in your application log like this:

First Way (0.65363)
Second Way (0.56675)
Third Way (0.54476)

Beyond these basics, you can explore the many profiling tools available to Ruby programmers.
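
Outside of Rails, the same side-by-side comparison can be sketched with the Benchmark module from Ruby's standard library. In the sketch below, the three lambdas are hypothetical stand-ins for the query variants (they are not Active Record calls), and the helper name time_variants is made up for illustration.

```ruby
require 'benchmark'

# Hypothetical stand-ins for three ways of building the same result.
# In a real application each lambda would wrap a different query.
VARIANTS = {
  'First Way'  => lambda { (1..50_000).map { |n| n * 2 } },
  'Second Way' => lambda { (1..50_000).collect { |n| n + n } },
  'Third Way'  => lambda { Array.new(50_000) { |i| (i + 1) * 2 } }
}

# Runs each variant once and returns a hash of label => elapsed seconds.
def time_variants(variants)
  timings = {}
  variants.each do |label, work|
    timings[label] = Benchmark.realtime { work.call }
  end
  timings
end

timings = time_variants(VARIANTS)

# Print the fastest variant first, in the same "Label (seconds)" shape as the log.
timings.sort_by { |_label, seconds| seconds }.each do |label, seconds|
  puts format('%s (%.5f)', label, seconds)
end
```

A single run is noisy, so in practice you would likely repeat each variant several times and compare averages before drawing conclusions.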

Including Data Upfront

Often, you need to display a model instance along with its associations. For example, a company view may also display a list of related contacts. Active Record lets you retrieve the associated records in the same find call using the :include option; this is often referred to as ‘eager loading of associations.’

companies = Company.find(:all, 
                         :include => :contacts, 
                         :conditions => "contacts.status = 'Active'")

Behind the scenes, the SQL statement will include a join to the related table(s).

However, the previous example does not always result in better performance. While you can significantly cut down on the number of roundtrips to the database, the amount of data returned, to be processed by Ruby code and wrapped into an Active Record object, can in fact slow things down. This is especially true if your model contains a lot of attributes.

One way to speed things up is to only load the columns needed to support the view. Going back to the company example, we only care about showing contacts’ names and email addresses.

companies = Company.find(:all,
                         :include => :contacts,
                         :select => "companies.*, contacts.first_name,
                                     contacts.last_name, contacts.email",
                         :conditions => "contacts.status = 'Active'")

A Little Bit of Ruby Love

Depending on the size of your database, sometimes it’s faster to retrieve ALL of the rows for a given model and then use Ruby’s sophisticated array handling capabilities to parse through the collection.

For example, let’s say we’re generating a report that counts the number of active contacts for all companies in the system. As the number of companies grows, the following snippet will take longer and longer to execute, because the ‘find_all_by_status’ method results in another database call for each company instance:

Company.find(:all).each do |company|
  puts company.contacts.find_all_by_status('active').size
end

The following alternative gets us down to two database calls:

active_contacts = Contact.find_all_by_status('active')
Company.find(:all).each do |company|
  puts active_contacts.select { |contact| contact.company_id == company.id }.size
end

What about the :include option? In some of our data access tests, the above variation proved to be faster than eagerly loading associations. Our guess is that constructing the larger SQL statement is a bit slower than two simple “select *” calls. Again, use the benchmark method to see for yourself; only testing against your own data will tell you which approach wins.
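
When the collections get large, a select inside the loop rescans every active contact for every company. A minimal sketch of one alternative, grouping the contacts once by company_id, is shown here. Company and Contact are plain Struct stand-ins rather than Active Record models, and the sample data is invented for illustration.

```ruby
# Plain-Ruby stand-ins for the Company and Contact models (no database involved).
Company = Struct.new(:id, :name)
Contact = Struct.new(:id, :company_id, :status)

companies = [
  Company.new(1, 'Acme'),
  Company.new(2, 'Globex'),
  Company.new(3, 'Initech')
]

contacts = [
  Contact.new(1, 1, 'active'),
  Contact.new(2, 1, 'inactive'),
  Contact.new(3, 2, 'active'),
  Contact.new(4, 2, 'active')
]

# One pass over the contacts: keep the active ones and index them by company_id.
active_by_company = contacts
  .select { |contact| contact.status == 'active' }
  .group_by { |contact| contact.company_id }

# Each company lookup is now a hash access instead of a scan of the whole array.
# Companies with no active contacts fall back to an empty list.
active_counts = companies.map do |company|
  [company.name, active_by_company.fetch(company.id, []).size]
end

active_counts.each { |name, count| puts "#{name}: #{count}" }
```

Whether this beats the simpler select depends on how many rows are involved, so it is worth benchmarking both.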

Are You (De)Normal?

If you have a summary value that is expensive to calculate (that is, it requires multiple database roundtrips and/or a complex block of Ruby code) and that changes so frequently that caching won’t help much, then denormalizing the attribute may be useful.

Let’s say you are calculating a list of top scores of participants in a series of games. One way to do it is:

Player.find(:all).each do |player|
  puts player.games.sum{ |game| game.score }
end

If you only have a handful of players and games in the system, you’re fine. But increase it 10-fold and your nice “top scores” view begins to drag, since a separate database call is made for each player to load that player’s games. Wouldn’t it be great if we could consolidate things into one roundtrip, like this?

Player.find(:all, :order => 'total_score DESC').each do |player|
  puts player.total_score
end

Well, you can, with a little bit of refactoring and applying a Ruby on Rails observer object. Observers allow you to attach behaviors to specific model events, or callbacks. For example, you could write code that automatically logs an entry to a database table or sends out an email notification when a specific type of model is saved.

First, add the summary field – in this example, the ‘total_score’ – to the desired table using a Rails migration script.

Then, create an observer object that will “watch” Game instances. Any time a Game’s score is updated, we will summarize the player’s total score in their record.

Creating the observer is easy; just use the generate script:

script/generate observer game

Inside the new GameObserver class, add this:

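class GameObserver < ActiveRecord::Observer
  # Runs after every Game save (create or update).
  def after_save(game)
    player = game.player
    # Recalculate from all of the player's games so that re-saving
    # a game does not add its score to the total a second time.
    total = Game.sum(:score, :conditions => ['player_id = ?', player.id])
    player.update_attribute(:total_score, total)
  end
end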

But wait – you’re not done yet – you’ll also need to update your application’s environment.rb to activate the observer:

Rails::Initializer.run do |config|
  #other config settings go here
  config.active_record.observers = :game_observer
end

Then be sure to find any code that still calculates a player’s score on the fly and replace it with player.total_score. If you’re trying to get the top 10 players, it’s as simple as

Player.find(:all, :order => 'total_score DESC', :limit => 10)
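
One thing to check when a callback maintains a denormalized total is what happens when the same record is saved more than once. The plain-Ruby sketch below contrasts adding the score on every save with recomputing the total from the player's games. It uses Struct stand-ins rather than Active Record models, and both helper methods are hypothetical names chosen for illustration.

```ruby
# Plain-Ruby stand-ins (not Active Record): a player with a cached total
# and a list of games that each carry a score.
Player = Struct.new(:name, :games, :total_score)
Game   = Struct.new(:player, :score)

# Hypothetical callback: add the saved game's score to the running total.
def increment_total(game)
  game.player.total_score += game.score
end

# Hypothetical callback: recompute the total from all of the player's games.
def recompute_total(game)
  player = game.player
  player.total_score = player.games.inject(0) { |sum, g| sum + g.score }
end

player = Player.new('Ann', [], 0)
game = Game.new(player, 10)
player.games << game

# Simulate saving the same game twice, for example once on create and once
# on a later edit that leaves the score unchanged.
2.times { increment_total(game) }
incremented = player.total_score # the single score of 10 was counted twice

2.times { recompute_total(game) }
recomputed = player.total_score  # matches the sum of the player's games

puts "incremented: #{incremented}, recomputed: #{recomputed}"
```

Recomputing costs an extra aggregate per save, which is usually a fair price when writes are much rarer than reads of the total.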

Rolling Up the Sleeves

In some cases, writing your own SQL can be the way to squeeze out every millisecond of query performance. The trade-off, of course, is code maintenance: every time you add or rename a field in your model, you’ll have to remember to update any hand-written queries that touch it. Active Record abstracts most of what you would normally do with SQL, so you shouldn’t need to resort to it often. However, it can come in handy in reporting scenarios, where a query becomes complex and needs to pull attributes from many objects, some of them only indirectly associated.

For example: display a list of emails for all active contacts associated with clients in the ‘West’ sales region, who have open orders, containing items of category X with a status of ‘Backordered.’

Contact.find_by_sql("select co.email from contacts co, clients cl, regions r, 
orders o, line_items li, products p, product_categories pc where 
co.client_id = cl.id and cl.region_id = r.id and o.client_id = cl.id 
and li.order_id = o.id and li.product_id = p.id and p.category_id = pc.id 
and co.status = 'Active' and r.region = 'West' and o.status = 'Open' 
and pc.name = 'X' and li.status = 'Backordered'")

Tuning the Backend

Adding a database index to a table, especially on columns that are frequently involved in joins, can boost performance as well. But be careful: an index speeds up read operations, but on the flip side, write operations slow down because the index must be updated along with the table. This article won’t go into the intricacies of database indexing, since this topic is already covered in detail elsewhere. However, Rails does make things easy through migration scripts.

class AddGamesIndexes < ActiveRecord::Migration
  def self.up
    add_index :games, :player_id
  end
 
  def self.down
    remove_index :games, :player_id
  end
end

What Next

We discussed some techniques that help you measure and refine performance in Ruby on Rails models before needing to consider other options, such as caching.

We also recommend keeping your Rails framework current and peeking into Edge Rails every now and then. Over the past 6 months, for instance, there has been a lot of activity focused on Active Record performance optimizations. Rails 2.0 introduced query caching. This is an area which will continue to improve.

