I’ve been seeing 404 errors on my blog for pages that should be there. After digging around, I found that the page cache is getting in the way of delivering those pages.
Here’s what I think is going on.
I have a page with a URL http://blog.craz8.com/articles/2005/10/26/typo-has-caching-issues.
I also have a page with a URL of http://blog.craz8.com/articles/2005/10/26 and one with a URL of http://blog.craz8.com/articles/2005/10.
If the page with the shorter URL is requested before the one with the longer URL, then the longer URL will return a 404. If the longer URL is requested first, then nothing bad happens.
Why is this?
First, here’s how the Rails page cache works.
- A URL is requested from Rails, and the output is generated.
- An after filter takes that URL and the generated content and writes out a file in the cache directory called /a/b/c.html for the URL /a/b/c, creating the directories as needed.
- A RewriteRule in the .htaccess file tells Apache to serve up the /a/b/c.html file for URLs of the form /a/b/c.
- Apache serves up the static HTML file if it exists; otherwise it calls into Rails.
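To make those steps concrete, here’s a rough sketch of the same flow in Python. This is not the actual Rails code (which is Ruby), and the names cache_file_for, write_cache and serve are made up for illustration; it only mirrors the steps listed above.

```python
from pathlib import Path


def cache_file_for(cache_root: Path, url_path: str) -> Path:
    """Map a URL path such as /a/b/c to <cache_root>/a/b/c.html.

    The root URL "/" is not handled in this sketch.
    """
    return cache_root / (url_path.strip("/") + ".html")


def write_cache(cache_root: Path, url_path: str, content: str) -> Path:
    """Play the role of the after filter: write the page, creating directories as needed."""
    target = cache_file_for(cache_root, url_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return target


def serve(cache_root: Path, url_path: str, render) -> str:
    """Play the role of Apache plus the rewrite rule.

    Serve the static file if it exists; otherwise call the application
    (the hypothetical `render` callable) and cache what it returns.
    """
    cached = cache_file_for(cache_root, url_path)
    if cached.is_file():
        return cached.read_text()
    content = render(url_path)
    write_cache(cache_root, url_path, content)
    return content
```

Calling serve twice for the same URL should only invoke render the first time; the second request is answered from the file on disk.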
What seems to be happening is this: if /a/b.html exists, a request for /a/b/c is converted into a request for /a/b.html/c unless a directory called ‘b’ exists in ‘a’. So the path Apache actually uses depends on whether that subdirectory exists. If the shorter URL is requested from Rails before the longer one, only /a/b.html gets written. The request for the longer URL then fails in Apache and never reaches Rails, so the subdirectory is never created and requests for the longer URL will always fail.
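Here’s a small in-memory model of that theory in Python. It is only a sketch of what I think is going on, not a description of how Apache really resolves paths; the CacheModel class and its request method are hypothetical names for illustration.

```python
class CacheModel:
    """Toy model of the cache directory under the hypothesis described above."""

    def __init__(self):
        self.files = set()  # cached files, e.g. "articles/2005/10.html"
        self.dirs = set()   # cache directories, e.g. "articles/2005"

    def request(self, url_path: str) -> str:
        segments = url_path.strip("/").split("/")
        target = "/".join(segments) + ".html"

        # The page is already cached: Apache serves the static file.
        if target in self.files:
            return "static"

        # Assumed behaviour: if some prefix has a cached .html file but no
        # directory of the same name, the request is turned into
        # <prefix>.html/<rest> and fails without ever reaching Rails.
        for i in range(1, len(segments)):
            prefix = "/".join(segments[:i])
            if prefix not in self.dirs and prefix + ".html" in self.files:
                return "404"

        # Otherwise Rails renders the page and the after filter writes the
        # cache file, creating the parent directories as needed.
        for i in range(1, len(segments)):
            self.dirs.add("/".join(segments[:i]))
        self.files.add(target)
        return "rails"


# Shorter URL first: the longer URL is stuck on 404.
short_first = CacheModel()
print(short_first.request("/articles/2005/10"))     # rails
print(short_first.request("/articles/2005/10/26"))  # 404

# Longer URL first: both pages end up cached and served normally.
long_first = CacheModel()
print(long_first.request("/articles/2005/10/26"))   # rails
print(long_first.request("/articles/2005/10"))      # rails
print(long_first.request("/articles/2005/10/26"))   # static
```

In the first case the 404 is permanent in this model, because the only thing that would create the missing directory is a request reaching Rails, and that never happens.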
In other words, the order of hits to your site determines whether your users will see 404 errors for some pages.
There may be something that can be done in the Apache configuration to avoid this problem, but for now I’ve disabled the RewriteRule entry in my .htaccess file that tries to perform this magic, and turned caching off in my Typo installation to avoid writing out cache files that won’t be used.