Friday, March 7, 2008

Favourite Linux bug of all time

Here's the story about my favourite Linux bug: the thrashing hell syndrome.

To reproduce it you need swap activated and a run-away process. Traditionally, Netscape Navigator was great for the latter; these days it might happen with Firefox running Flash, or with Monodevelop. It can also happen if you're developing something and accidentally write an infinite loop that allocates memory.

What happens is that the system quickly slows down. You have a few seconds to react and kill the process. If you're too slow, the graphical interface locks up and you're toast while the machine enters thrashing hell. You can usually get to a terminal and even enter your username and password, but it locks up again before the shell is running, probably because you can't spawn a new process when in thrashing hell.

The condition will last for anywhere between a couple of minutes and more than half an hour. Unless you power off the machine, there's nothing you can do. At all. Other than to stare at a frozen screen while the machine entertains itself.

At some point an out-of-memory killer was introduced in an attempt, I believe, to mitigate this problem. Its job was to kill a process to free up memory in memory-tight situations. Here's a hilarious analogy by Andries Brouwer. I suspect Netscape was the main cause, although I might be wrong; I've never been close to kernel development anyway.

Anyway, that was a long time ago.

Fast forward to the introduction of Ubuntu, the really easy-to-use GNU/Linux. I should warn you that I'm going to do some bashing of Ubuntu now, which is not really fair since Ubuntu is pretty cool.

When I first got into thrashing hell on an Ubuntu system, I didn't believe it had happened. I thought the OOM killer should have saved me, but apparently not. Probably it only kicks in when all of the swap is used. But hey, Ubuntu is for the masses, right? The system should have protected me. That's what I naively thought.

So I filed an Ubuntu bug report on the kernel. I hoped that a couple of the knowledgeable people who are working on Ubuntu would fix this old problem in a snap.

Which didn't happen. The bug wasn't accepted. As far as I can tell, that was because the guy at the other end thought the process-limiting support in the kernel is advanced enough to take care of the problem. I wasn't sure whether he was right. But I filed a new bug report in the hope that someone else would see the vision.
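For the curious, here is a minimal sketch of what per-process limiting can look like from user space on Linux. It is my own illustration, not something taken from the bug report: I'm assuming the reply had resource limits (rlimits) in mind. The cap value `LIMIT_BYTES`, the `RUNAWAY` script and the two function names are all made up for the example. Note that `RLIMIT_AS` caps virtual address space rather than the memory actually in use, and it only helps for processes that somebody remembered to limit.

```python
import resource
import subprocess
import sys

# Hypothetical cap for the illustration: 512 MiB of virtual address space.
LIMIT_BYTES = 512 * 1024 * 1024

# Stand-in for a run-away process: allocates 16 MiB blocks until refused.
RUNAWAY = """
chunks = []
try:
    while True:
        chunks.append(bytearray(16 * 1024 * 1024))
except MemoryError:
    count = len(chunks)
    del chunks  # release the blocks so there is room left to print
    print("allocation refused after", count * 16, "MiB")
"""


def cap_address_space():
    """Runs in the child just before exec; changes only the soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard == resource.RLIM_INFINITY:
        new_soft = LIMIT_BYTES
    else:
        new_soft = min(LIMIT_BYTES, hard)
    resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))


def run_capped():
    """Start the run-away script with the cap applied and collect its output."""
    return subprocess.run(
        [sys.executable, "-c", RUNAWAY],
        preexec_fn=cap_address_space,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    print(run_capped().stdout.strip())
```

With the cap in place the run-away script gets a MemoryError instead of dragging the machine into swap. Whether that counts as a fix for the average desktop user, who never sets any limits, is another matter.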

That was over two years ago. So far, no progress. Which is not what I'd hoped, because it really puts the system in a bad light. If this happened to an average user, that person would instantly conclude that Ubuntu/Linux had just crashed.

But it's sort of fine, because it's open source etc. etc. and the people at the other end are free to spend their time as they like.

Then today there was an update. Cool! I thought. Until I found out that it's just a drone reply. A particularly annoying one, even: apparently it's not good enough to report a bug on the latest release, since the problem might theoretically be fixed in the next alpha. The bug has already been auto-closed once, because if no one's interested then the report must be invalid.

So no one wants to listen. That makes me think this will continue to be my favourite Linux bug for many more years to come.


PS: My favourite recent Windows bug is probably the one where Windows interprets all mouse clicks as right-button clicks.
