I ended the last post with a discussion of the fundamental problem with Django's Object-Relational Mapper, namely, that it is developed for and by Django, which means that it will always lack the flexibility to create boundary-pushing models effectively. When working at a high level on simple problems - such as this blog, or most web applications - the Django model language and API will suffice, but when trying to stretch the definitions of ORM, you simply need the flexibility that comes with a well-designed library like SQLAlchemy.

This post is just meant to go more in-depth on that issue, and then to map out what is needed to solve this problem.

Defining The Problem

I want to explain very clearly what I see as the problem. This has dual purposes. One, of course, is to define the problem for everyone else; that helps tremendously when making an argument. The second propose is to make sure I really understand it myself. I've said it before, and someone slap me if I stop saying it: I don't know what I'm doing. So, I feel the need to spell this out, if for no other reason than to make sure I'm not deluding myself.

Django's ORM has a very nice interface when defining models. Like most things in Django, it's well-documented, intuitive, and idiomatic.

But while the interface allows you to write "green" code, within the django.db package there is a fair amount of "yellow and red" code.

That's actually OK, for the vast majority of applications. You don't need to peek inside the database package when you're writing Generic Blog Engine or Yet Another Wiki. What frustrates me here is that the web has so much more potential - and Django is very close to being able to push us there.

The next Google won't be writing blog engines or wikis - they'll be, like Google did, manipulating data in ways others couldn't imagine at the time. Google makes searching so intuitive that it's easy to forget how complex the problem is. Google didn't succeed by building a piece of generic software; they succeeded by getting smart people to do innovative things with data.

Unfortunately, Django's ORM is only sufficient for current web applications , and thus will be insufficient as the base of anything that pushes the limits. Again, and I can't stress this enough: it doesn't need to be able to push the limits of available technology to be the clear winner of the Python web framework world.1 But by adopting a fundamentally well-written library like SQLAlchemy, Django can make itself that much closer to being the Next Big Web Framework, which will exceed all current frameworks in simplicity - which, by and large, it already does - and power.

So what exactly is wrong with the django.db package? If you've ever thought about trying to replicate the django.db.models.Model interface in SQLAlchemy, you've probably spent a long time poking around in there, and you've found code like this:

  1. # django/db/models/fields/__init__.py
  2. # A method of the "Field" base object
  3. def prepare_field_objs_and_params(self, manipulator, name_prefix):
  4. params = {'validator_list': self.validator_list[:]}
  5. if self.max_length and not self.choices: # Don't give SelectFields a max_length parameter.
  6. params['max_length'] = self.max_length
  7. if self.choices:
  8. if self.radio_admin:
  9. field_objs = [oldforms.RadioSelectField]
  10. params['ul_class'] = get_ul_class(self.radio_admin)
  11. else:
  12. field_objs = [oldforms.SelectField]
  13. params['choices'] = self.get_choices_default()
  14. else:
  15. field_objs = self.get_manipulator_field_objs()
  16. return (field_objs, params)

I dunno about you, but that looks like "Yellow" code to me. Specifically, the coupling to the admin interface layer troubles me. In particular, this line:

  1. params['ul_class'] = get_ul_class(self.radio_admin)

Please, please, please tell me that's not actually defining classes for an HTML unordered list. Your database library isn't quite the right place to have code for your user interface HTML generation. The admin interface is one of Django's big features, but it is tightly coupled to the database library.

And ultimately, that's bad in the long run. It means that, for example, getting Django to work with multiple databases is much more complex than it should be, because it had to be added on in a later branch - rather than being one of the core features like it is in SQLAlchemy.

So is Django's database API "good enough?" It's common to say - and I have said this myself - that the Django ORM "works for me". When writing Generic Web Applications - which is what most of us are doing most of the time - it is. But let's take something you can do with Django that other frameworks have by and large been unsuccessful in replicating - the Admin interface. If you wanted to build an application with the level of introspection and runtime resolution that the Admin interface has - you couldn't do it with the current ORM. You would need to edit the Django source and couple the ORM to your application as tightly as django.db is bound coupled to the Admin interface today.

And that's just no good.

The SQLAlchemy/Django Interface Today

If you're willing to forgo some features, like the admin interface, generic views, and basically anything tied into the Django ORM, you can already use SQLAlchemy with Django. I am doing it now on a few projects (although not this blog). I use plain-jane SQLAlchemy code, and frankly it's pretty crude. SQLAlchemy code isn't designed to be pretty; it's designed to be a fairly low-level mapper between Python and your RDBMS, and it does a very good job at that.

Unfortunately, that's a different goal than what an ORM for use by framework users would have. Django's ORM isn't supposed to handle the RDBMS-OOP impedance mismatch; it's to create a Pythonic interface for programmers to define models, then to massage it into the persistence layer that is an RDBMS, naturally or not, hackish or not.2

So now I'm back to where we were at the end of the last entry: What we need is an interface for SQLAlchemy that is as good as Django's. What is needed is a declarative layer for SQLAlchemy.

Elixir as a Declarative Layer

Elixir is, according to it's web site, "a declarative layer on top of SQLAlchemy." Cool! So someone's already done it. Looks like SQLAlchemy's on it's way to world domination!

  1. class Person(Entity):
  2. has_field('name', Unicode(255))
  3. has_many('addresses', of_kind='Address')
  4. class Address(Entity):
  5. has_field('email', String(128))
  6. belongs_to('person', of_kind='Person')

... or, maybe not. For any readers who don't speak Python, that's just the Wrong Way of doing this.

And I don't get why that way was chosen. TurboEntity, one of the predecessors of Elixir, had a syntax much closer to Django's, and was better for it. Let's compare some functionally-equivalent3 pseudo-Django model code.

  1. class Person(models.Model):
  2. name = models.CharField(maxlength=255)
  3. class Address(models.Model):
  4. email = models.CharField(maxlength=128)
  5. person = models.ForeignKey(Person, related_name="addresses")

Honestly, there's no comparison. The second one is Python; the first one is an emulation of Rails in Python. And when you try to write Ruby in Python, it doesn't end well.

I'm apparently not the first to have this concern. Now, according to the Elixir folks, the first example is actually better because it's a DSL. I might be a little behind on my LtU feed, but I thought that a DSL was used to make things clearer and more intuitive, not awkward.

Django doesn't don't need another DSL modelled after Rails; Elixir isn't the right fit. What Django's ORM needs is an interface similar to what they have already - because it's extremely well done, not for compatibility reasons - and a layer that impedance-matches those Python objects to SQLAlchemy behind the scenes.

Back in May, the deficiencies of the DSL syntax came up on the mailing list, and the final conclusion was to "steer clear of the discussion about chucking the DSL-style out the window for now".

Okay. How about Now, then?

One of the great things about Object-Oriented design is that it's very simple. The last thing you need is a DSL to describe the relations between objects. There are simply so few! One-to-one, foreign keys, many-to-many... each one has a well-defined and easy-to-implement database mapping. My approach to that problem, when writing SQLAlchemy models with Django, was to write helper functions that handled those mappings for me. Here are a couple:

  1. def simple_table(appcode, obj_name, *columns):
  2. """Get rid of most of the repeated cruft in each table def.
  3. Defines an ID PK column, along with creating the table name.
  4. """
  5. table = sa.Table(
  6. tbl_name(appcode, obj_name), meta,
  7. sa.Column("id", sa.Integer, primary_key = True),
  8. *columns
  9. )
  10. return table
  11. def many_to_many(first_t, second_t, appcode = None):
  12. """
  13. Return a many-to-many linking table between tables first_t and
  14. second_t.
  15. Appcode has to be explicitly specified if the appcode for the
  16. two columns are not the same; if not passed in, it assumes
  17. that it should use the first table's appcode.
  18. """
  19. # Determine the appcode
  20. if not appcode:
  21. appcode = first_t.appcode
  22. # Make sure to have obj_from_name here.
  23. table_names = "%s_%s" % (obj_from_name(first_t.name),
  24. obj_from_name(second_t.name))
  25. return sa.Table(
  26. tbl_name(appcode, table_names), meta,
  27. sa.Column("%s_id" % obj_from_name(first_t.name), sa.Integer,
  28. sa.ForeignKey(first_t.c.id)),
  29. sa.Column("%s_id" % obj_from_name(second_t.name), sa.Integer,
  30. sa.ForeignKey(second_t.c.id)),
  31. )

Let's note something just as important: I did not set up a new DSL paradigm. I used existing Object-Oriented abstractions, and ended up with an interface from Python that was natural to the point of being predictable:

  1. user_t = util.simple_table(
  2. __appcode__, "user",
  3. ## Rest of my SQLa columns for the User object
  4. )
  5. permission_t = util.simple_table(
  6. __appcode__, "permission",
  7. # Rest of my SQLa columns for the Permission object
  8. )
  9. user_permission_t = util.many_to_many(
  10. user_t, permission_t)

Anyway, the whole point of this is that a Rails-style DSL is not the way to wrap SQLAlchemy.

Going Back: TurboEntity

Now, where were we? Right, finding a Pythonic syntax for defining database-mapped models.

  • Elixir: Too Railsey, not OOPy.
  • ActiveMapper: Like Elixir, but superseded by Elixir,
  • TurboEntity: More OOPy, but also deprecated in favor of Elixir.

Sigh...

I downloaded the TurboEntity source and poked through it. (Just one file! I was kind of impressed.) The TurboEntity syntax was closer to what I was looking for:

  1. from turboentity import *
  2. class Movie (Entity):
  3. title = Column(Unicode(50))
  4. year = Column(Integer)
  5. actors = ManyToMany("Actor", order_by="name")
  6. genre = ManyToOne("Genre")
  7. class Actor (Entity):
  8. name = Column(Unicode(60))
  9. movies = ManyToMany("Movie")
  10. class Genre (Entity):
  11. title = Column(Unicode(30))
  12. movies = OneToMany("Movie", order_by="-year")

But it wasn't perfect. First, it depended on SQLAlchemy 0.2. Seeing as \0.4 will hopefully soon be released, I didn't expect it to work out-of-the-box. Also, I greatly preferred relating actual classes, rather than relating by strings and having the module do (getattr(module, "string") for module in sys.modules) stuff, which scares me and leads to tough bugs.

So, I downloaded SQLAlchemy 0.4 and the last release of TurboEntity, and luckily the problems weren't too severe. The internal design of TurboEntity was clear enough that I was able to modify it to use classes instead of strings fairly easily; I had it how I wanted it within a few hours.

A New Solution: ISDL

But, TurboEntity is deprecated, and Elixir has adopted the Rails-style syntax. This means that, as of now, there is no object-oriented-interface declarative mapper for SQLAlchemy.

And what is the clear thing to do? Build one myself.

I haven't actually written a line of code for this yet. I may or may not base it on the last release of TurboEntity, so it may be that there is no code whatsoever for it yet. But I have gone through the most important step: I've named the project. And the name is ISDL.

It's an acronym for the Incompatible SQLAlchemy Declarative Layer. It's a SQLAlchemy declarative layer, and it's incompatible with anything in existence, so I think it's a fitting name. Most importantly, in the event that I actually write it and start to use it with Django, the name is supposed to prevent users from thinking that it is in any way compatible with Django models, despite what will be a similar syntax and focus on Pythonic model definitions.

The entirety of information on the project is this section of this post. I'll obviously try to get more up, but it won't be until next weekend at the soonest, due to numerous tales of woe that you don't want to hear.

I'll leave you with what I want it to look like when it's done:

  1. import isdl
  2. import sqlalchemy as sa
  3. class Person(isdl.Model):
  4. name = isdl.String
  5. class Address(isdl.Model):
  6. person = isdl.ForeignKey(Person)
  7. email = isdl.String(128)

Many of you will note that it's very similar to the Django code. That's cause Django gets the model definition language right - they're just using the wrong underlying library.


  1. Yes, it's trollish of me to declare Django superior to Pylons, Zope and TurboGears (which, as Ian Bicking notes, will be based on Pylons). But this is my blog, and that's my opinion. To lessen the blow a bit, I don't think Django is superior to Zope - I just think they fit different domains. But Django is absolutely superior to Pylons ;) 

  2. A random thought that just occurred to me: Why not improve the Django ORM to have as good of a low-level interface as SQLAlchemy, instead of trying to improve the high-level of SQLAlchemy to be as good as Django? There are a few technical problems - the django.db code is fairly complex already - but the main reason is actually social: A lot of good ideas are going into Django nowadays, but few are coming out. I want to use regexps as URLs with Pylons, but the Django URLs haven't been packaged. I want to use newforms with my template language of choice, but that hasn't been packaged either. I don't want the same to happen with the theoretical great ORM that would come from Django's interface and SQLAlchemy's implementation. I would want to use that in all kinds of projects, Django-based and not, and several that wouldn't even be web-based. I don't really trust the Django devs enough to do that well; like Jinja, it would be a second-class citizen to the version in Django. 

  3. In the Django example, you could use the Models.EmailField instead of a CharField as I did. I generally don't like this proliferation of field types, though, and I think that something like email validation is better done in application code than in your database layer. But's that's a small point; the important thing is the differing syntax.