When you build a web application intended for a large scale, there are two separate performance issues when it comes to generating the HTML the server spits out.
One is server load. How many visitors can the server cope with?
The other is response time as seen by each visitor. How long must I wait for the page to load?
When using a relatively slow dynamic web framework (such as Python + Django), the server load issue is mostly a matter of CPU time. If you read the Django docs, the recommended solution is caching, i.e. instead of generating the HTML on each hit, we generate it once, store it for a while, and hand out the stored copy until it times out. It's a simple idea that works incredibly well.
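To make the idea concrete, here's a minimal sketch of timeout-based caching in plain Python. The names (`cache_for`, `render_front_page`) are mine for illustration, not Django's API; Django's real per-view cache works along these lines but stores entries in a shared cache backend.

```python
import functools
import time

def cache_for(seconds):
    """Cache a function's result, regenerating only after `seconds` elapse."""
    def decorator(func):
        store = {}  # args -> (expiry timestamp, result)

        @functools.wraps(func)
        def wrapper(*args):
            now = time.time()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]           # serve the stored copy
            result = func(*args)        # regenerate the HTML
            store[args] = (now + seconds, result)
            return result
        return wrapper
    return decorator

@cache_for(60)
def render_front_page():
    # stand-in for an expensive template render / database query
    return "<html>...front page...</html>"
```

Every hit within the 60-second window is a dictionary lookup instead of a render.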
The first question is where to do the caching. To reduce the workload as much as possible, we want it to happen as close to the browser as possible. For a low-cost solution that works for everyone, though, we have to stay in the same physical location as the web server.
With Django on Apache, you've got some relatively fast C code serializing the requests to separate Python processes running the Django code. The process model limits the possible concurrency level because each process gobbles up a fair amount of memory, and Python is just generally an order of magnitude slower than C. So while putting the cache in the web framework is arguably very convenient, it's also pretty slow: in my experiments, one to two orders of magnitude slower.
Ideally, we'd just run a simple process written with speed in mind in front of Apache to do the caching. After all, the idea is relatively straightforward. Enter Varnish, a web cache written in C.
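For illustration, a minimal cache policy in VCL, Varnish's configuration language, might look roughly like this (Varnish 4+ syntax; the 60-second TTL is my example value, not something from this setup):

```vcl
# Cache every backend response for 60 seconds. This is a deliberately
# blunt sketch: a real policy would be more careful about cookies,
# logged-in users and the Cache-Control headers the application sends.
sub vcl_backend_response {
    set beresp.ttl = 60s;
}
```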
With Varnish in place, you know that the server only has to generate each page once per timeout period. Some of the pages on YayArt take about 1 second to generate because they need to do multiple complex database queries. Let's say we have a beefy server that can run 10 of these in parallel; then we can support 10 requests/s. With caching in Varnish, we can scale up to however fast Varnish can retrieve a string from its cache and serve it. You can easily reach 6000 requests/s. For comparison, 100 requests/s is 8.6 million hits/day.
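Spelling out the arithmetic (the 10 and 6000 requests/s figures are the rough numbers from above, not measurements you should rely on):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

uncached_rps = 10    # ~1 s per page, 10 rendered in parallel
cached_rps = 6000    # rough cache-hit throughput for Varnish

uncached_per_day = uncached_rps * SECONDS_PER_DAY  # 864,000 hits/day
cached_per_day = cached_rps * SECONDS_PER_DAY      # 518,400,000 hits/day
reference = 100 * SECONDS_PER_DAY                  # 8,640,000: the 8.6M figure
```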
However, as it turns out, while simple whole-page caching can solve a large part of the server load problem in one blow without giving up the convenience of a dynamic web framework, it's not necessarily the solution to the response time problem. If the page is in the cache, the response time is close to perfect. But what if it's a slow day with few visitors and it's not?
The general solution is to precompute the answer. Unfortunately, how to do this is more application-specific.
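One simple approach, sketched below under my own assumptions (the function name and the list of URLs are hypothetical): periodically request the slow pages yourself, say from cron just after each timeout, so Varnish holds a fresh copy before any real visitor asks.

```python
import urllib.request

def warm_cache(urls, fetch=urllib.request.urlopen):
    """Hit each URL so the cache in front stores a fresh copy.

    Meant to run from cron shortly after each cache timeout, so real
    visitors never see a cold cache on a slow day.
    """
    warmed = []
    for url in urls:
        try:
            fetch(url)        # Varnish caches the response on the way through
            warmed.append(url)
        except OSError:
            pass              # a failed warm-up just means the next visitor pays
    return warmed

# Hypothetical list of slow, popular pages worth precomputing.
hot_pages = ["http://example.com/", "http://example.com/popular/"]
```

The `fetch` parameter is only there so the sketch can be exercised without a network; in practice the default is fine.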