Monday, November 8, 2010

Cache busting in Django

One of the really annoying things in web development is dealing with caching. Browsers incorporate an elaborate caching scheme to speed up common browsing operations, which is a really great idea. But a side effect is that when you release a change on a site, you risk that some people get the new HTML while their browser is still using an old CSS or JavaScript file because their cache hasn't timed out yet.

It has bitten me several times during development too, since some browsers cache really aggressively. Debugging problems caused by a stale CSS/JS file is just not fun.

One way to fix it is to use a framework that runs the CSS and JavaScript through a combiner/minifier and at the same time outputs a versioned filename, e.g. by hashing the contents. There are even a couple of Django projects going down this route to make it happen automatically.

I haven't found one I liked yet, though. You have to modify views or templates and fit your paths into the system. It doesn't help with changing images. If you use third-party code, you have to modify that too. And the minification step makes it pretty hard to debug JavaScript problems; even if you only minify on the live site, problems sometimes happen on live too.

Really, if we take a step back from the minify dream, the simple solution here is to append some kind of version number to the file. New HTML then references filenames with new version numbers, so clients can't use their old cached files because the paths no longer match.

I've found a really simple way to automate this transparently for a project by modifying one line in settings.py. The idea is to hook into Django with a middleware class, search the generated HTML for local URLs and replace them with URLs with version numbers. This way, we can catch all CSS and JavaScript references as well as images, no matter where they occur.
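To make the idea concrete, here is a minimal sketch of what such a middleware could look like. It is not the code from modtimeurls.py: the names LOCAL_URL, rewrite_local_urls and VersionedUrlsMiddleware are made up for illustration, the class uses the newer callable middleware shape rather than whatever the original module uses, it assumes UTF-8 HTML and non-streaming responses, and version_url is only a placeholder for the real versioning step.

```python
import re

# Hypothetical pattern: src="..." or href="..." whose value starts with a
# single "/" (a local URL). Protocol-relative "//host/..." URLs are skipped.
LOCAL_URL = re.compile(r'''(\b(?:src|href)=)(["'])(/(?!/)[^"']*)\2''')


def rewrite_local_urls(html, version_url):
    """Pass every local src/href value in html through version_url."""
    def replace(match):
        attr, quote, url = match.groups()
        return attr + quote + version_url(url) + quote
    return LOCAL_URL.sub(replace, html)


class VersionedUrlsMiddleware:
    """Sketch of a middleware that rewrites local URLs in HTML responses."""

    def __init__(self, get_response):
        self.get_response = get_response

    def version_url(self, url):
        # Placeholder only: a real version would look up the file's version.
        return url + "?_=0"

    def __call__(self, request):
        response = self.get_response(request)
        # Only touch HTML; assumes UTF-8 and a response with a .content body.
        if response.get("Content-Type", "").startswith("text/html"):
            html = response.content.decode("utf-8")
            html = rewrite_local_urls(html, self.version_url)
            response.content = html.encode("utf-8")
            # Keep an already-set Content-Length in sync with the new body.
            if response.has_header("Content-Length"):
                response["Content-Length"] = str(len(response.content))
        return response
```

Because the rewriting happens on the finished HTML, neither templates nor views nor third-party code need to know about it; the only project change is listing the class in the middleware setting.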

Since common file systems don't provide version numbers on the files they store, we'll use the modification timestamp instead. And instead of renaming files to contain the version number, we'll just append a ?_=timestamp to the URL:
  <img src="/media/images/logo.png">
becomes
  <img src="/media/images/logo.png?_=4321234">
The static file web servers I've tested so far ignore everything after the ? when looking up the file in the file system, so it just works.
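The timestamp step itself can be sketched in a few lines of Python. This is an illustration rather than the code from modtimeurls.py: add_mtime, url_prefix and root_dir are hypothetical names, and the sketch assumes that URLs under url_prefix map directly onto files under root_dir.

```python
import os


def add_mtime(url, url_prefix, root_dir):
    """Return url with ?_=<mtime> appended if it maps to an existing file.

    URLs outside url_prefix, URLs that already carry a query string or a
    fragment, and URLs without a matching file are returned unchanged.
    """
    if not url.startswith(url_prefix) or "?" in url or "#" in url:
        return url
    # Assumption: the part after the prefix is the path below root_dir.
    path = os.path.join(root_dir, url[len(url_prefix):])
    try:
        mtime = int(os.path.getmtime(path))
    except OSError:
        # No such file (or not readable): leave the URL alone.
        return url
    return "%s?_=%d" % (url, mtime)


# Example call, with /media/ served from a hypothetical /var/www/media:
# add_mtime("/media/images/logo.png", "/media/", "/var/www/media")
```

Leaving unknown URLs untouched is what keeps the approach transparent: a link to a page or to another site simply passes through.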

As a bonus, when this is running, you can safely modify the web server serving the static files to set an expiration date far out in the future for files requested with ?_=xxxxx. This allows an aggressive, yet perfectly safe caching scheme to be employed by any upstream cache, including the browser.
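In practice this rule lives in the web server's configuration, but the decision is simple enough to sketch in Python. The function name cache_headers and the one-year lifetime are assumptions made for illustration only.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed "far out in the future" lifetime: one year, in seconds.
FAR_FUTURE_SECONDS = 365 * 24 * 60 * 60


def cache_headers(url):
    """Return far-future caching headers only for URLs carrying ?_=<value>."""
    query = parse_qs(urlsplit(url).query)
    if "_" in query:
        return {"Cache-Control": "public, max-age=%d" % FAR_FUTURE_SECONDS}
    # Unversioned requests keep the server's normal caching behaviour.
    return {}
```

The long lifetime is safe because a changed file gets a new timestamp and therefore a new URL, so a cached copy of the old URL is never asked for again.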

The above trick can be done in about 50 lines of Python. You can grab the little self-contained middleware module modtimeurls.py from my brand-new Garden of Python. It has been in production on several sites for over a year. The only problem I've found so far was with a PNG transparency fixer for IE6 that assumed .png files would end with .png rather than .png?_=xxxxxx.
