Software Development

Mythical RDBMS Features

Alex Bunardzic wrote an article on database integrity in which he completely dismissed commonly accepted methods of using an RDBMS to enforce data integrity. I strongly disagree with his conclusions. In fact, I think his arguments are terrible.

The argument against surrogate keys (for example) is based on a naive and misguided understanding of how surrogate keys and unique constraints should be used together to enforce data integrity. In the worst case, a surrogate key by itself allows multiple rows to contain the same data (duplication). However, those duplicate records can be merged into a single record if necessary, which provides a feasible cleanup route. The implied alternative (data integrity enforced by other application layers) only leads to worse problems; at best, it is a reinvention of the wheel.
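A minimal sketch of the point, using Python's sqlite3 and a hypothetical users table: the surrogate key merely identifies rows, while a UNIQUE constraint on the natural key is what actually blocks duplicates.

```python
import sqlite3

# Hypothetical schema: a surrogate key (id) plus a UNIQUE constraint on the
# natural key (email). The constraint, not the surrogate key, rejects duplicates.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        email TEXT NOT NULL UNIQUE                -- natural-key uniqueness
    )
""")
conn.execute("INSERT INTO users (email) VALUES ('alex@example.com')")
try:
    # Same natural key again: the database refuses the duplicate row.
    conn.execute("INSERT INTO users (email) VALUES ('alex@example.com')")
except sqlite3.IntegrityError as e:
    print("duplicate rejected:", e)
```

Without the UNIQUE constraint, the second insert would succeed and create the duplication described above, which is exactly why the two should be used together.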

The database is the last line of defense against data corruption. Well-written applications will handle integrity violations in a graceful way that shields the user from nasty database exceptions. However, even if the application framework/code (e.g. Rails) doesn’t catch it, most major data integrity errors will at least be caught if database constraints are used.
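A sketch of that graceful handling (hypothetical table and function names, using Python's sqlite3): the application translates the database's integrity error into a friendly message, but even if it didn't, the foreign-key constraint would still stop the bad data.

```python
import sqlite3

def add_order(conn, customer_id):
    """Insert an order, translating a raw integrity error into a
    user-friendly message (hypothetical application-layer function)."""
    try:
        conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
        return "order created"
    except sqlite3.IntegrityError:
        return "Sorry, that customer does not exist."

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    )
""")
conn.execute("INSERT INTO customers (id) VALUES (1)")
print(add_order(conn, 1))   # valid reference: insert succeeds
print(add_order(conn, 99))  # dangling reference: the constraint catches it
```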

However, I do agree that many RDBMS engines today (Postgres, MySQL, MSSQL) are not good enough to justify storing large amounts of code in them. I have some serious issues with code stored in the database (i.e. stored procedures and functions). Here are those issues, with implied feature ideas for a better RDBMS:

  • Lack of expressiveness: I hate T-SQL! (Aside: Postgres does allow the use of Python to write functions, although I haven’t tried it.)
  • Lack of version control: There’s no way to know whether an entity has been modified, and no way to roll back if it gets messed up.
  • Lack of good editors: Sure, anyone can copy the code into his favorite editor and then copy it back when finished, but then there’s the whole problem of maintaining a local copy in case the system crashes while editing. I’d like to be able to check out a local copy, make my changes (clicking the “Save” button as often as I want) with whatever editor I choose, and then commit them back; this ties into the previous version-control issue.

These issues have caused me to adopt a hybrid approach to using the database to enforce constraints. I maintain my DDL in a file under version control, adding referential integrity and unique constraints as needed (after all, that’s what an RDBMS is made for). I tend to use surrogate keys because ORM frameworks (Hibernate and SQLObject, for example) prefer or outright require them. When updates are needed on a live database (where the DDL cannot be applied directly), I write update scripts and commit them to version control as well. Finally, I avoid stored procedures and functions except where necessary for security or performance optimizations. This minimizes the number of times the database needs to be updated, and consequently provides a simpler upgrade path when releasing new versions.
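The update-script side of this workflow can be sketched as a toy migration runner (hypothetical table names, using Python's sqlite3; in practice each script would be a separate file committed to version control):

```python
import sqlite3

# Incremental update scripts keyed by version number, applied in order to
# bring a live database up to date. Contents here are illustrative only.
MIGRATIONS = {
    1: "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)",
    2: "ALTER TABLE users ADD COLUMN created_at TEXT",
}

def upgrade(conn):
    """Apply any migrations newer than the database's recorded version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0  # empty table means a fresh database
    for version in sorted(MIGRATIONS):
        if version > current:
            conn.execute(MIGRATIONS[version])
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
    conn.commit()

conn = sqlite3.connect(":memory:")
upgrade(conn)  # fresh database: applies both scripts
upgrade(conn)  # already current: applies nothing, so upgrades are repeatable
```

Because each script is applied exactly once and in order, a database at any released version has a well-defined path to the newest schema.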

Design tradeoff: Tier Pressure and Isolationism

This article from The Code Project describes a maintainability tradeoff that is realized when any logical layer in a multi-tier application is allowed to access any other layer. In my experience, this tradeoff is often justified by ease of implementation and getting the job done as quickly as possible. The hidden cost of the decision is deferred until the application must be extended. If the extension is large or complex enough, it is often easier to rewrite a large section of the application, or the entire application, at that future time. And of course that, in the words of Joel, is “the single worst strategic mistake that any software company can make.”
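The isolated alternative can be sketched as follows (all class and method names are hypothetical): each layer talks only to the layer directly below it, so replacing the data layer never forces changes to the UI.

```python
class DataLayer:
    """Storage details; only the business layer is allowed to call this."""
    def __init__(self):
        self._rows = {}

    def save(self, key, value):
        self._rows[key] = value

    def load(self, key):
        return self._rows[key]


class BusinessLayer:
    """Enforces the rules; the sole client of DataLayer."""
    def __init__(self, data):
        self._data = data

    def register(self, username):
        if not username:
            raise ValueError("username required")
        self._data.save(username, {"name": username})


class UILayer:
    """Talks only to BusinessLayer, never to DataLayer directly."""
    def __init__(self, business):
        self._business = business

    def handle_signup(self, username):
        try:
            self._business.register(username)
            return "welcome, " + username
        except ValueError as e:
            return "error: " + str(e)


ui = UILayer(BusinessLayer(DataLayer()))
print(ui.handle_signup("alice"))
print(ui.handle_signup(""))  # rejected by the business layer, not the UI
```

The skip-a-layer shortcut (the UI calling DataLayer directly) would work today, which is exactly why it is tempting; the cost only appears when DataLayer must be swapped out and every shortcut call site has to be rewritten.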