Friday, September 21, 2007

Object-oriented database systems

I had a discussion with Anders today about databases.

I'm not very fond of relational databases, or perhaps more precisely SQL. Having to convert back and forth between a database model and SQL and a programming language with all it entails of proper quoting to prevent SQL injection attacks and joining tables all over the place is really a great source of frustration. And a good waste of time.

On the other hand, I've seen what a record-based file system with no built-in join can do to people. And the built-in indexing stuff that lets you search on different attributes without a linear scan is just neat.

I think the great success of the relational database systems has a few simple explanations. Many business-related applications are really about the data. They spend their time collecting and displaying lots of simple, boring data - unlike an entertainment program or a standard desktop application that spends a lot of time dealing with small amounts of data, maybe less than say 1000 objects.

This data may be larger than what can be kept in memory (the possibility of this happening is enough to make it a problem). Furthermore the data must be kept when the machine is turned off. So you have to deal with secondary storage, a topic largely ignored by most programming languages that otherwise go to great lengths to make dynamic use of primary storage as transparent as possible.

Finally, you need to be able to access all that boring data in a many different ways. Say you have a situation that you naturally think of as a hierarchy, e.g. a company with multiple departments with each department having a number of employees and each employee having far to many tasks to do.

In object-oriented thinking, you would model this as a list of departments, each with some attributes and a list of employees, each again with a list of tasks. Want to show the org chart in the company? Easy, just loop through the departments and print the list of employees one by one. Super efficient. Assign a task to an employee? Easy, just add it to his list. Super efficient. But what if you want to know who's task it is to empty the bin? Then you have to wade through two outer loops plus lots of unrelated stuff, just to get to the actual data you're interested in.

Relational database systems, while solving the secondary storage problem, have an answer to this problem, and it's simple and elegant.

Then never mind that getting data in and out of them is hugely painful, especially hierarchical data, even if you're not trying to force everything leaving the database into a class.

I've recently been working a lot with Django's object-relational mapper, the ORM (which BTW means "worm" in Danish), and it really takes the pain out of the situation since it obviates the need for writing SQL strings in favour of native, albeit somewhat weird, Python code.

However, I still wonder why true object-oriented databases haven't overtaken the world ages ago. One that I hand over my object to and later can get it back from, in one piece with no fuss.

It turns out that there's a pretty large open-source one available, db4o. I think it solves the problem of retrieving objects based on arbitrary attribute values by incorporating a special query system where it's possible to define indexes like for a relational database, but I'll have to do some more reading.

There's a smaller one, DyBASE, for Python (and Ruby and PHP). It could be interesting to compare the performance against Django's ORM + MySQL.

The interesting question is of course whether you need a relational database if you can have the simpler object-oriented one. Interoperability issues aside, some people apparently seem to believe that the relational model is crucial for performance with huge amounts of data.

It would seem to me, though, that the natural partitions which the object-oriented paradigm splits the data into would tend to make things faster since there's less data to search through. For instance, if you're only concerning yourself with data from one department in the company, then it's rather wasteful to work with the maybe ten times larger table containing data for all departments. Perhaps it has something to do with joins and restrictions being difficult to express simultaneously in one query with the object-oriented approach. There are some benchmarks on www.polepos.org although they are not exactly easy to interpret at a glance.

Other people, like this guy on Slashdot, seem to think that the abstract relational model makes for better databases as opposed to application-specific persistent storages.

This makes me wonder whether many, perhaps most, uses of a relational database today wouldn't be better served by a simple object-oriented application-specific data store. I think most run-of-the-mill web sites need easy object persistence far more than terabyte scalability or an enterprise-wide database. Leave the heavy artillery for when it's really required.

Then, on the other hand, if you can get easy object persistence with Django's ORM, maybe this is a moot point.

Thursday, September 6, 2007

Django-rendertext 0.1 is out

We're releasing the first version of rendertext, a Django app for making use of custom fonts on a web page!

It's designed to work with the built-in Django template system by providing a simple template filter you can use to render images dynamically.

The first showcase is Talent Nord. I hope to get a longer list as more people start using it. The design of Talent Nord is by Fata Morgana, house of two intelligent designers.

I think you need a pretty recent Django for the app to work. Here's the code that does the hard work:

import md5, os
import Image, ImageFont, ImageDraw, ImageColor
from django.conf import settings

def render(text, fontalias, size = 12, color = "#000", rotation = 0):
"""Construct image from text.

Returns a tuple with the image file path, width and height. Takes
the parameters 'fontalias', 'size' (default 12), 'color' (default
black) as CSS hex string, and 'rotation' (default 0) which is the
degrees to rotate the text counter-clockwise. The constructed
image is stored in a cache under the media root in the
RENDERTEXT_DIR set in settings.py (default is 'rendertext/').

The font file to use is determined by mapping the fontalias
through the RENDERTEXT_FONTMAP setting, a dictionary from
fontalias strings to font file paths, e.g. {'verdana':
'/home/django/fonts/verdana.ttf'}."""

# get settings
render_dir = "rendertext/"
if hasattr(settings, "RENDERTEXT_DIR"):
render_dir = settings.RENDERTEXT_DIR

fontmap = settings.RENDERTEXT_FONTMAP

fontfile = fontmap[fontalias]
info = "|".join([text, fontalias, str(size), str(color), str(rotation)])
name = md5.new(info.encode('utf-8')).hexdigest()
filepath = render_dir + name + ".png"

dim = (-1, -1)
if not os.access(settings.MEDIA_ROOT + filepath, os.F_OK):
# construct the image
imf = ImageFont.truetype(fontfile, size)
dim = imf.getsize(text)
im = Image.new("RGBA", dim, ImageColor.getrgb(color) + (0,))

draw = ImageDraw.Draw(im)
weird_font_renderer_fix = " "
draw.text((0, 0), text + weird_font_renderer_fix, font=imf, fill=color)

if rotation != 0:
im = im.rotate(rotation, Image.BICUBIC, True)
dim = im.size

if not os.access(settings.MEDIA_ROOT + render_dir, os.F_OK):
os.makedirs(settings.MEDIA_ROOT + render_dir)

im.save(settings.MEDIA_ROOT + filepath, "PNG")
else:
# read width and height
im = Image.open(settings.MEDIA_ROOT + filepath)
dim = im.size

return (settings.MEDIA_URL + filepath, dim[0], dim[1])


def render_tag(text, *args, **kwargs):
"""Construct image tag from text."""
(path, width, height) = render(text, *args, **kwargs)
if not path:
return ""
else:
dim = ""
if width > 0 and height > 0:
dim = 'width="%s" height="%s" ' % (width, height)

return '<img src="%s" alt="%s" %s/>' % (path, text, dim)