SQLAlchemy: beware of backref
Wow, it’s been a while… Sorry, I’ll try to do better in the future.
I’ve been using SQLAlchemy for quite a while now, and I’m really enjoying it. According to Michael Bayer, the main developer, it’s modeled after Hibernate which is the best ORM I’ve ever used. There are a few things that SQLAlchemy does differently from Hibernate, one of which is “backrefs”, or automatic two-way relationship management. With Hibernate this must be done manually. Here’s a short example:
mapper(Address) mapper(User, properties=dict( addresses=relation(Address, backref="user") )) user = User.get_by(id=4) address = Address.get_by(id=12) assert len(user.addresses) == 0 address.user = user assert len(user.addresses) == 1
The backref="user" causes Address objects to get a user property. And when a user is associated with an address the corresponding user.addresses collection is automatically updated.
However, there’s a subtle gotcha: setting address.user causes the user.addresses collection to be loaded. So if you’re associating a new child to a parent that has many children, then all of those children will be loaded when child.parent is set. Here’s an example:
parent = Parent.get(1) # assume this parent has 500 children in the database # since parent.children is a lazy collection they # will not be loaded until parent.children is requested child.parent = parent # this implies parent.children.append(child) # which causes all 500 of parent's children # to be loaded from the database...ouch!
That’s annoying, and it causes an extra database hit which is expensive. Luckily there’s a fairly easy workaround. Instead of using the backref feature just define each side of the relationship separately. Here’s an updated version of the original example:
addr_mapper = mapper(Address) mapper(User, properties=dict( addresses=relation(Address) )) addr_mapper.add_property("user", relation(User)) user = User.get_by(id=4) address = Address.get_by(id=12) assert len(user.addresses) == 0 address.user = user assert len(user.addresses) == 0
Notice that user.addresses is not updated when address.user is set. That’s because the two relationships are defined independently of eachother.
one thing that hasnt been documented, is that SQLAlchemy’s backrefs actually have a similarity to Hibernate, in that one side of a bi-directional relationship likes to have a flag on it to indicate that fact, exactly in the same manner as Hibernate’s ‘inverse=”true”‘ flag; except the flag is called “is_backref”. normally, when using the regular “backref” keyword option, the “is_backref” flag is set up by the Backref object that is created. but if you are doing a “manual” backref like in this example, youll want to put “is_backref=True” on the side of the bi-directional relationship that youd like “ignored”. also the issue of “cant append to a list without lazy loading its contents first” has been raised repeatedly on the mailing list; eventually i might get around to figuring something out with this one, although its a little involved (i.e., a list you can append() to, but it doesnt know about the rest of its contents…might go with hibernates “extra-lazy-loading” approach).