Watch out for Mongo DB slow queries

I’m adding a feature to a production application that I’m using MongoDB to implement. I’ve rolled this feature out to our production servers in a mirror mode so I can see it perform, but no production users are effected by this new code.

The first thing I notice is that my code to update a specific record is running in 400 to 4000 miliseconds! I need this to be running in 10-100ms for this to be a useful feature. What’s going on here?

I check the production MongoDB, and the mongo server is running at 95-100% CPU and and is outputting a bunch of slow lookups to its log.

I add some more processing units to add more capacity, but this doesn’t work at all! In fact, everything is now taking 1000-8000 miliseconds to process. It looks like MongoDB is single threaded on a simple query. Since everyone is raving about how great MongoDB is, how is it my code is making it look really, really bad?

My app is a Rails application using MongoMapper, and a common pattern I use in my Rails apps is to use this code to get a single record:

  def self.load(id)
    Model.find(:first, :conditions => { :_id => id })
  end

This code returns a nil if the record isn’t found. This version will throw an exception if the record isn’t found.

  def self.load(id)
    Model.find(id)
  end

In MySQL, the difference in query speed is essentially zero!

In MongoDB – not so much.

The difference in performance on my data set between the two queries is staggering:

The find(:first) method takes about 200ms, locks the collection during this time, and sucks CPU. The find(id) method takes about 1ms and, even if it locks the collection, returns fast enough to not matter in my case. CPU usage is now reasonable again.

A quick update to my code, and now my operations are taking 6-400ms (there’s some other work in there too that causes the variation in timing)

Posted by Tom Sun, 27 Sep 2009 05:18:00 GMT


My new iPhone's here, but its still a brick

Received our new iPhone 3G[S] via UPS today. They arrived before 10am on the day of availability!

We hook them up to iTunes to activate, run through the entire process, and get to the point where it says ‘Activation may take 48 hours’

WTF?

We’re now 12 hours in, and our phones are still just expensive iPod Touch devices!

Maybe at&t will have them activated tomorrow. I’m guessing not though.

Posted by Tom Sat, 20 Jun 2009 05:41:00 GMT


nginx 400 Bad Request errors due to cookies and what to do about them

Recently, we’ve had reports on our apps that users are getting 400 Bad Request errors from our servers. On checking out our log files, there was nothing to see! All is working well. Yet the reports continue. What’s going on?

We’re using nginx 0,6,34, and the reports the users are seeing mention this. These errors are definitely coming from our servers, but still nothing in the error logs for the server.

After a lot of head scratching and diagnosing, it seems that we have a number of large cookies being set, and the users who are getting these large cookies are the ones having problems. Lets check the interwebs for some help here. Here’s a relevant post on the Ruby Forum

It seems that we need to increase the size of the header space that nginx needs to process headers. The setting for this is large_client_header_buffers, and the size part of this needs to be big enough to handle the entire cookie header being sent.

We set this to 8K and tried again. Still getting errors!

Putting a little test app together to play with cookie sizes, I quickly find the documentation for large_client_header_buffers and see that it works at the http or the server level. Well, it doesn’t work at the server level at all! That’s where we had it. The same setting at the http level works great though!

Now we have a fix!

Put the setting for large_client_header_buffers at the http level and set them big enough to handle large cookies! How big can that be? HTTP defined a max number and size of cookies, right? Lets go see what that is!

Oh dear. Modern browsers have given up on these size restrictions! You can now expect to be able to handle 40 cookies of 4K max each, or Safari could decide to send you 100 of those. On every request. Nice job browser makers! Just what I need is a 50K request to get my 1K icons or 10K images from my site.

So, watch out for large cookies and configure your nginx correctly before your users get those Bad Requests!

Posted by Tom Thu, 18 Jun 2009 05:22:00 GMT


My new Dell M6400 Covet laptop rocks

I just got a new laptop to replace my 3 year old E1505. Its orange.

I’ve been looking at getting a Dell M6400 Mobile Workstation, and had got the configuration setup just the way I wanted: 8GB Ram, fast dual core CPU, SSD, 17” 1920×1200. I liked the color of the Covet version, but Dell is offering 10% off the regular one, and the Covet configured the way I wanted was $1,100 more at $4,600!

When poking around looking for discounts, I saw a link to the Dell Outlet store, and here’s the page I found for the M6400 Outlet – when I first visited, there were 3 pages, now I’m typing this, there are 8 computers total! The availability changes fast, and what you see is what’s available.

I saw a configuration that was close enough to what I wanted to go for it, the only major missing piece was the SSD drive. Total price: $2,700 for 8GB, 160GB disk, 17” RGB 1920×1200, 3.06GHz dual core. With the addition of an 80GB Intel x25 SSD from NewEgg for $315, this is my $3,500 for just over $3,000. Just add credit card!

When the box arrived, and I opened it up, inside was a Covet version!

So now I have a computer that Dell is selling for $4,600 for a total cost of $3,000.

This is a refurbished computer, but it has a full 3 year warranty.

What is wrong with it for someone to have sent it back? When I received it, the OS was Vista 32 bit. This OS was reporting 3.5GB RAM. In the box was a recovery disk for Vista 64bit. I wonder if someone received it, saw that there was less than 4GB RAM when 8GB was ordered and then returned it.

As a benchmark, this laptop runs my apps test suite in 1/3 the time of my Dell E1505, and the 17” screen is much easier to work with many windows.

Posted by Tom Thu, 04 Jun 2009 06:11:00 GMT


Offbeat Creations company web site now live!

We’ve been building Offbeat Creations for over a year now, and we just managed to find enough time to actually put together a web site for the company!

We’re using the Radiant CMS to run this, to allow us to change the content quickly and easily. I’m still not very comfortable with the way Radiant works, but it does get the job done.

Check us out at Offbeat Creations

(The Offbeat Creations logo was designed by my friend John Betts at Glass Apple)

Posted by Tom Sat, 16 May 2009 18:38:00 GMT


Cool Birthday Gifts from my wife

Lisa got me a few cool t-shirts for my birthday. I think my favorite is the Rock-Paper-Scissors-Lizard-Spock shirt pictured left. This shirt can be got from ThinkGeek , so you too can have the best geek shirt in town!

Posted by Tom Thu, 14 May 2009 19:29:00 GMT


Rails Time Zones and Custom database queries

I’ve been using Rails for so long without caring about how Time Zone’s are processed, that when I recently built a new application with Rails 2.3, I decided that I needed to care about the support for Time Zones since Rails 2.1

Technically, my new app doesn’t really care about time zones, so I could have just left the whole thing alone, but I wanted to see how these worked and get some experience.

The first big surprise is that dates are stored in the database as UTC time. Well, duh! This is usually seamless, but I noticed this detail when my custom code to group by created date on a set of objects suddenly started showing items for tomorrow at about 5pm Pacific time.

Here’s the code I was using:

      Items.find(:all,
                 :select => "DATE(created_at) as created_on, count(*) as c",
                 :group => "DATE(created_at)",
                 :conditions => [ "created_at > ?", 14.days.ago.to_date ],
                 :order => 'created_on desc')

I realized that I needed to adjust for the current Time Zone when retrieving this data.

With some investigation – Google search wasn’t entirely helpful here – I ended up with the following code that retrieves the Time Zone offset in seconds to add to the stored time (for Pacific Time, this is a negative number!)

      zone = 'PDT'
      Items.find(:all,
                 :select => "DATE(date_add(created_at, interval #{Time.zone_offset(zone)} second)) as created_on, count(*) as c",
                 :group => "DATE(date_add(created_at, interval #{Time.zone_offset(zone)} second))",
                 :conditions => [ "created_at > ?", 14.days.ago.to_date ],
                 :order => 'created_on desc')

Now my reported counts by day rollover to the next day at midnight Pacific Time. Adjust your zone setting to suit what results you need.

Posted by Tom Mon, 11 May 2009 06:20:00 GMT


Accessing your database from Rails Metal

I was interested in using the new Rails Metal system for a project I’ve been building. What I want this code to do is take parameters out of the URL and write a record to my database to record these parameters.

This is my first application built with Rails 2.3, and the first application in my company to use Rails Metal in a production environment.

Here’s what I wrote:

  def self.call(env)
      if env["PATH_INFO"] =~ /^\/i\/(\d+)\/(\d+)\/(\d+)\/(\d+)/
        Record.track($1.to_i, $2.to_i, $3.to_i, $4.to_i)
        [ 200, { "Content-Type" => "text/html" }, [ "OK" ] ]
      else
        [ 404, { "Content-Type" => "text/html" }, [ "Not Found" ] ]
      end
  end

This code works great. Runs fast, does what I need.

Until I run it in production under some load!

Suddenly, I see requests backing up and monit is killing my Mongrel’s as the heartbeat call is timing out!

Diagnosing this problem was a bit painful. The RAILS_DEFAULT_LOGGER isn’t available, so I have to create my own logger, and then I used Benchmark.measure to writing out some timings.

The one thing I see is that every 5 calls, a call stalls for exactly 5 seconds. My superfast recording system is taking on average 1 second per request!

The number 5 rings a bell – where did I see that recent;y? I remember now, its in the database.yml file. There’s a new setting for connection pooling. I wonder how that works?

Well, it turns out that Rails Metal really doesn’t set up anything for you, nor does it tear anything down at the end of the request. For Connection Pooling the connection is setup automatically when you access your database, but outside of a Rails Controller, you’d better release this connection before you’re done.

Here’s a simplified version of how I did this:

  def self.call(env)
    if env["PATH_INFO"] =~ /^\/i\/(\d+)\/(\d+)\/(\d+)\/(\d+)/
      begin
        Record.track($1.to_i, $2.to_i, $3.to_i, $4.to_i)
      ensure
        # Release the connections back to the pool
        ActiveRecord::Base.clear_active_connections!
      end
      [ 200, { "Content-Type" => "text/html" }, [ "OK" ] ]
    else
      [ 404, { "Content-Type" => "text/html" }, [ "Not Found" ] ]
    end
  end

Posted by Tom Fri, 08 May 2009 03:34:00 GMT


Performance with MySQL and paginating large tables

If you’re using the will_paginate Rails plugin to manage the amount of data that you show to the user on a single page, there is a gotcha that comes into play as your data gets bigger.

The default display of the pagination buttons for shows links that can take you to the last page of the set. If you only have a few thousand records, you’ll never notice the problem, but try it with a million records and check out the MySQL slow log file.

The SQL query generated by your paginate call will look something like:

  SELECT * from foos ORDER BY created_at LIMIT 1000000,50

In the mysql-slow.log file you’ll eventually start seeing an entry that looks like this:

  # Query_time: 6  Lock_time: 0  Rows_sent: 50  Rows_examined: 100050
  SELECT * from foos ORDER BY created_at LIMIT 1000000,50

The key here is that MySQL had to look at 1M rows to return the 50 you requested!

This is a Very Bad Thing.

You must do all you can to avoid this case. What you do will depend on your specific application. The easiest is to not let page numbers go above a certain maximum, and configure the will_paginate view helper to not show all the possible page numbers.

Posted by Tom Fri, 27 Mar 2009 06:17:00 GMT


Rails partials and named_scope gotcha

I ran into a problem today where I had SQL queries taking a really long time to execute and I had pages timing out. After a lot of investigation, I found a strange behavior that I didn’t expect that was killing my application.

I’m using Rails 2.1.1 and I have a named scope that looks like this:

@data = Foo.something.another_thing.limited(10)

The limited scope adds a “limit X” to the query to reduce the number of rows returned

In my view, I have some code that looks like this:

<%= render :partial => 'foo', :collection => @data %>

This code works great.

Except when the Foo table starts to get large.

In the MySQL slow log, I was seeing almost this generated query, except it looked like the code was running it as a count(). Since my table is quite large (hence only getting the first 10 items!), this count was taking a really long time.

Checking through the code, I find that the partial rendering code calls empty? on the collection. For Named Scopes, this partial optimization actually causes the Named Scope implementation to call count if it doesn’t yet have data, which causes a huge database hit in my case.

So how do I fix this?

I think the best way to fix this is to use the all method on the named scope object to cause the data to be loaded by the time the partial rendering checks for for empty?

<%= render :partial => 'foo', :collection => @data.all %>

Its interesting that for a count() query with a limit X that MySQL doesn’t stop when it gets to X, but ultimately, its the Rails code performing an optimization that causes a database access that is to blame here.

Posted by Tom Thu, 22 Jan 2009 07:19:00 GMT


Older posts: 1 2 3 ... 20