SQLAlchemy: beware of backref

Wow, it’s been a while… Sorry, I’ll try to do better in the future.

I’ve been using SQLAlchemy for quite a while now, and I’m really enjoying it. According to Michael Bayer, the main developer, it’s modeled after Hibernate which is the best ORM I’ve ever used. There are a few things that SQLAlchemy does differently from Hibernate, one of which is “backrefs”, or automatic two-way relationship management. With Hibernate this must be done manually. Here’s a short example:

mapper(User, properties=dict(
    addresses=relation(Address, backref="user")

user = User.get_by(id=4)
address = Address.get_by(id=12)

assert len(user.addresses) == 0

address.user = user
assert len(user.addresses) == 1

The backref="user" causes Address objects to get a user property. And when a user is associated with an address the corresponding user.addresses collection is automatically updated.

However, there’s a subtle gotcha: setting address.user causes the user.addresses collection to be loaded. So if you’re associating a new child to a parent that has many children, then all of those children will be loaded when child.parent is set. Here’s an example:

parent = Parent.get(1)
# assume this parent has 500 children in the database
# since parent.children is a lazy collection they
# will not be loaded until parent.children is requested

child.parent = parent
# this implies parent.children.append(child)
# which causes all 500 of parent's children
# to be loaded from the database...ouch!

That’s annoying, and it causes an extra database hit which is expensive. Luckily there’s a fairly easy workaround. Instead of using the backref feature just define each side of the relationship separately. Here’s an updated version of the original example:

addr_mapper = mapper(Address)
mapper(User, properties=dict(
addr_mapper.add_property("user", relation(User))

user = User.get_by(id=4)
address = Address.get_by(id=12)

assert len(user.addresses) == 0

address.user = user
assert len(user.addresses) == 0

Notice that user.addresses is not updated when address.user is set. That’s because the two relationships are defined independently of eachother.

4 Responses to “SQLAlchemy: beware of backref”

- mike bayer

one thing that hasnt been documented, is that SQLAlchemy’s backrefs actually have a similarity to Hibernate, in that one side of a bi-directional relationship likes to have a flag on it to indicate that fact, exactly in the same manner as Hibernate’s ‘inverse=”true”‘ flag; except the flag is called “is_backref”. normally, when using the regular “backref” keyword option, the “is_backref” flag is set up by the Backref object that is created. but if you are doing a “manual” backref like in this example, youll want to put “is_backref=True” on the side of the bi-directional relationship that youd like “ignored”.

also the issue of “cant append to a list without lazy loading its contents first” has been raised repeatedly on the mailing list; eventually i might get around to figuring something out with this one, although its a little involved (i.e., a list you can append() to, but it doesnt know about the rest of its contents…might go with hibernates “extra-lazy-loading” approach).

- Mack S.

Any idea if this has changed since the original posting to not auto-load all child rows on an append? It would be useful to know before altering all of my 15 or so backrefs.

- Jeff Widman

These days, this isn’t a problem because you can specify the method of loading. Using `lazy=dynamic` in your backref’s is likely what you’re looking for.

- Noumenon72

Wow, thanks for the update in the comments, I was all making note of this and did not notice the “2006” date.

Leave a Reply