I came across this SQLAlchemy hack through sheer ignorance. It's been so useful to me that I'm going to have to be ignorant more often.
In the course of writing my first web application for CPCC, I, in my inexperience and ignorance, created data types in my application, and the associated database tables, for things that I thought would live in my database.
As it turns out, I they were things that were actually stored in a mythical external database I don't have access to yet. All well and good, except I wrote well over a thousand lines of code assuming that everything would be in one database. I had code like so (object names changed to prevent the identity of the application, so I can pretend it's really cool and secret, while in reality it's pretty mundane (but still cool to code!)):
- genie = classes.Genie.get_by(name = str(genie_name))
- tall = classes.BottleType.get_by(name = "Tall")
- bottle = classes.Bottle(type = tall)
- genie.bottle = bottle
- genie.save()
- mappers.ctx.current.flush()
This was all well and good when Genie and Bottle where stored in the same database. All you needed was a SQLAlchemy mapping, like so (non-working):
- class Bottle(object): pass
- class Genie(object): pass
-
-
- bottle_t = sa.Table(
- "gen_bottle", meta,
- sa.Column("id", sa.Integer, primary_key = True),
- # Whatever other cols, like type_id, name, mod_time, etc
- )
-
- genie_t = sa.Table(
- "gen_genie", meta,
- sa.Column("id", sa.Integer, primary_key = True),
- sa.Column("bottle_id", sa.Integer, sa.ForeignKey("gen_bottle.id")),
- )
-
-
- assign_mapper(ctx, Bottle, bottle_t)
-
- assign_mapper(ctx, Genie, genie_t, properties = {
- 'bottle' : sa.relation(Bottle, backref = "genies"),
- })
And that was how I did my application. But after using genie.bottle everywhere in my code, I showed the code to others, and I found out that Bottle is actually something pulled from another database.
Whoops.
So I get ready to go everywhere in my code and change all the instances of the first example to this:
- genie = classes.Genie.get_by(name = str(genie_name))
- tall = classes.BottleType.get_by(name = "Tall")
- bottle = classes.Bottle(type = tall)
- genie.bottle_id = bottle.id
- genie.save()
- mappers.ctx.current.flush()
Then all I have to do is change the mapping to not use a property, and not to use a foreign key, and I'm all set.
I happened to be on the SQLAlchemy IRC channel, and asked, idly, if I
could have foreign keys across databases.
Sounds crazy even to
me, although I don't actually mean foreign keys. What I really mean is
an SQLAlchemy mapping of ids to objects across databases. Someone
responds after a few crucial minutes with an unequivocal answer: No.
Good thing I already had a working test case at that point, or I would have given up!
Over the next few hours, and with a lot of help from the people much more knowledgeable than myself on #sqlalchemy (thanks to everyone there!), we figured out what was going on and what it would take to move it from "hack" to "feature" status.
As it turns out, SQLAlchemy trusts you when you say there is a foreign key, even if there isn't. Which means we can do the following:
- bottle_t = sa.Table(
- "gen_bottle", <b>meta_for_external_db</b>,
- sa.Column("id", sa.Integer, primary_key = True),
- # Whatever other cols, like type_id, name, mod_time, etc
- )
-
- genie_t = sa.Table(
- "gen_genie", meta,
- sa.Column("id", sa.Integer, primary_key = True),
- sa.Column("bottle_id", sa.Integer, sa.ForeignKey(bottle_t.c.id)),
- )
Not only will it accept and run this code, but it'll issue SQL statements, correctly select each table from the database indicated by the metadata, and then map the objects - from separate databases - to each other's attributes, just as we had before.
In fact, the only obstacle to this is database-level referential integrity. After all, the gen_genie.bottle_id column relates to a gen_bottle.id column that doesn't even exist. The solution? Don't use a DB with referential integrity! MySQL's MyISAM engine and SQLite work perfectly with a similar test case. InnoDB and PostgreSQL will fail.
It's very much a hack; if SQLAlchemy ever starts checking to make sure that the objects it's relating have the same metadata object, or that the metadata objects have the same engine, then the game is over. However, it's such a useful "feature" that I'm trying to learn enough of the SQLAlchmey internals to submit a patch that makes it non-hacky and an 'official' use case.
If you're already using a database with referential integrity (which is a good idea for other reasons), not all is lost. As long as you've created the table structure before hand, leaving out the FOREIGN KEY statements, SQLAlchemy will be happy to treat the databases as related, since you still specify an sa.ForeignKey; just don't issue meta.create_all() - make sure that's done beforehand.
Comments
129 spam comments omitted.
I am no longer accepting new comments.
Adam Gomaa
#1236, 2007-12-15T08:17:06Z
Apparently, this is actually called "sharding", and SQLAlchemy does support it natively, at least in 0.4. Check the Session docs for Twophase.