Tuesday, November 9, 2010

Bulk inserting Django objects

I've added another Django utility member to my garden of Python, bulkops.py.

Currently, it consists of two functions, insert_many and update_many. They are intended to solve the problem of inserting a big bunch of objects in Django. If you use the standard save() or create() methods in Django, you get one SQL query per object. With thousands of objects, you'll simply drown in overhead in executing all these queries.

There are some alternatives out there, but unfortunately they are pretty complex and specific to a previous version of Django. Instead this code simple makes a list of tuples with the fields on the model and feeds that to the executemany() function in the DB API. Usage is really simple
for x in seq:
o = SomeObject()
o.foo = x
o.save()
becomes
l = []
for x in seq:
o = SomeObject()
o.foo = x
l.append(o)
insert_many(l)
It's tested to work with Django 1.2 and doesn't really touch the ORM internals so should be safe for at least some future versions.

Sadly, there's no support for associated many-to-many relationships. I actually needed that myself, but I don't think there's an efficient way of getting the primary ids when you do a bulk insert with the Python DB driver API. So one has to use a select query anyway afterwards, and then it was actually more convenient to do it by hand by rearranging the logic in the caller.

11 comments:

  1. Very nice utility functions, tested it out and it does make a big difference in performance. Thanks!
    And what I like about it is that it doesn't really mess with ORM internals.

    Only small issue I ran into was with column names, they should be quoted with con.ops.quote_name() to avoid issues in some cases (for ex. column named 'order').

    ReplyDelete
  2. Ah, yes, the "order" bug. Thanks for reporting it, I've fixed it now.

    I did profile the code, and if I recall correctly, most of the remaining Python time was spent somewhere inside the ORM prepare-for-save code; unfortunately, it doesn't seem the Django core developers have cared a lot about performance.

    ReplyDelete
  3. This is nice. Unfortunately it doesn't work if your model uses inheritance. Any easy solutions for that?

    ReplyDelete
  4. If you've got a one-object-in-two-tables inheritance, I think it requires a bit of refactoring of the code. If it's an abstract class that doesn't work, it's probably just a bug in the code.

    ReplyDelete
  5. Hi! I've been using both functions and the difference in speed is amazing! Thank you!

    I've ran into a small problem though, while using django 1.3.1 and sqlite 3.7.8: inserts and updates were running without any errors, but no data were changed on the DB. After googleing on this issue, I found out that transactions were guilty. For reference, the problem is described here:

    http://groups.google.com/group/django-users/browse_thread/thread/7992f7a36d728ed4

    So I've created a patch that corrects this issue and gives the user exception handling for free. The patch is available here:

    http://pastebin.com/sQ9yizkb

    ReplyDelete
  6. Lucas: I had a look at your patch (they are by the way a bit easier to read if you make them with diff -u), and it seems you have simply added transaction block manually?

    I think that's orthogonal to this function, conceivably you may want bundle a call to this function with some other stuff. So unless I've misunderstood something, I think I'll just let it stay the way it is.

    ReplyDelete
  7. Right. I've changed transaction control to manual, before explicitly sending commit/rollback.

    I agree with you in that it is not a problem of the function itself, as the issue is related to the use of cursors. On the other hand, it may be useful for someone that is not using cursors in her/his app, since a call to 'insert_many' silently fails.

    So I'll leave this just for future reference and in 'diff -u' format:

    http://pastebin.com/sEeihY3h

    Thanks again for this! It's been really useful!

    ReplyDelete
  8. @Lucas, @Ole: I'm using django 1.3 here, and I was also bit by the lack-of-commit and not getting anything efectively inserted/updated. Some change should be made, either to the example code on this post (because .save() takes care of calling conn.commit_unless_managed(using), so this call would be missing), or in the code, adding explicit transaction management.

    @Ole, thanks a lot for your code!

    ReplyDelete
  9. Is this superseded by bulk_create() ?

    ReplyDelete
  10. Yes, more or less, although bulkops.py also has an update_many() which I don't think has an equivalent in Django yet. You can do an SomeThing.objects.update(foo=bar) efficiently in Django but that sets the same value on all objects, as far as I know you cannot easily update many objects to different values efficiently.

    ReplyDelete
  11. This situation continues today. Coach advised me to be a good break from badminton, etc. back to a certain extent, come back.After a week on Sunday, I used to collate those shirts. Usually did not pay attention, short-sleeved light, nearly 10 sets. However, has been wearing on the golf course is that pieces of navy blue sweatshirt Mizuno, replacement was another white Mizuno. Wearing a navy blue has been 5 years, white has more than 3 years. Other rare soccer jerseys was brought to the playground, the new head seems full. Compared with them, dark blue wool sweater edges, and thin material, and color fading. Touch my hand on it, but I feel the heartbeat of intense exercise! Yes ah, do I can not do?Since childhood, I love sports. Secondary School, was elected to the school basketball team and track and field team. 1 meter 78 tall, weighed only 108 pounds. Games at a school, students compete for the high jump with another title. He is the school cheap soccer jerseys record holder, and I is the challenger. Bar centimeter by centimeter rise, fewer and fewer players around the venue. Very long time, and finally, the high jump as the last item before the closing ceremony.
    All the students divided into two groups, the playground is full of cheer sound. I saw his eyes from word: "conservative"! He wanted to ensure results are in front of his free jumping competition until only two people, he played the first jump, a one-off shot! I was twice at that height over bar. Further increase in 1 cm, he is still free jump, I tried to jump over the bar until the third attempt. Added a one centimeter. He could not jump free! A failure. I also failed. The second time, we all failed! Before the third jump, I could see him tense, he moves a bit stiff, not too! This means that I get the title. However, I launched the new wholesale football shirts high impact, though failed, I and my classmates was very happy.Memories of my drip campaign, there is always accompanied by the figure sweater. From basketball to athletics, from swimming to badminton, which jersey sweat soaked my witness my growth. Take care of all equipment by the two groups of coaches wholesale soccer jerseys increased to three groups. The most grueling riding electric bikes in front of my project done, it is unable to complete a full course, down at the half on the back. Whether in running, in the training forces, in the step cycle, the dark blue sweater with me he moved to venues, it is so comfortable and natural to close to me, so I will firmly maintain the feeling of movement.

    ReplyDelete