Archive for the ‘Software Development’ Topic

A better git stash

Jon of saintsjd recently wrote a piece on improving the git user interface. This got me thinking about one of my git peeves.

git stash needs another sub-command. Often I want to stash ALL unstaged changes, including new (not yet added) files. This is almost possible with git stash -k, except that it leaves new files behind, and stashing those is essential.

Enter git stash unstaged. So now I’ve stashed the unstaged changes; that is, all the changes I do not want included in the next commit. I can now run my tests as I always do before I commit… That’s what you do too, right?

Oops, the tests fail because I forgot to add a file, which now happens to be in the stash. At this point I do not know of a way to get my stashed changes back exactly as they were before I stashed them, that is, unstaged. It’s especially bad when some files are partially staged (i.e., only certain hunks were staged), because they will create conflicts if I attempt to apply the stash right now. This should be possible: there should be a way to get things back exactly as they were before I stashed them. The missing command is git stash pop unstaged, which applies all of the changes from the stash and puts them in the unstaged (off-stage? pre-stage?) area, without conflicts.

In summary:

  • git stash unstaged – stash all changes that are not currently staged, including new and removed files
  • git stash pop unstaged – pop the top stash as unstaged changes

This would greatly improve my workflow when I get over-zealous and start changing more than I want to commit at one time (happens all the time). It would make it much easier to isolate those changes into individual, logical commits.
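
Something close to the two proposed sub-commands can be approximated with options that stock git already has. The sketch below is a workaround under stated assumptions, not the missing feature. `--keep-index`, `--include-untracked` and `pop --index` are real git stash options (the `push` form assumes git 2.13 or newer), but the stash they create also records the staged changes, and the restore step relies on `git reset --hard`, which discards anything changed in tracked files since the stash was made. The Python function names are hypothetical.

```python
import subprocess

# Approximates the proposed "git stash unstaged": the index is left as it is
# and untracked (new) files are stashed too.
# Assumption: git 2.13 or newer, where "git stash push" is available.
STASH_UNSTAGED = ["git", "stash", "push", "--keep-index", "--include-untracked"]

# Approximates the proposed "git stash pop unstaged".
# WARNING: "git reset --hard" throws away every change made to tracked files
# since the stash was created; only use it if nothing was edited in between.
POP_UNSTAGED = [
    ["git", "reset", "--hard"],          # back to the commit the stash was based on
    ["git", "stash", "pop", "--index"],  # restore working tree AND staged state
]


def stash_unstaged_commands():
    """Return the command list that stashes everything except the index."""
    return [list(STASH_UNSTAGED)]


def pop_unstaged_commands():
    """Return the command list that restores the pre-stash state."""
    return [list(cmd) for cmd in POP_UNSTAGED]


def run(commands, cwd="."):
    """Run each git command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling run(stash_unstaged_commands()) inside a repository performs the stash, and run(pop_unstaged_commands()) undoes it. Because the reset returns to the exact commit the stash was based on, the pop should apply without conflicts, partially staged files included, as long as the tests did not create untracked files of their own.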

SQLAlchemy-Migrate vs Alembic

I made this list while evaluating schema migration tools for SQLAlchemy (currently at 0.7.3) with PostgreSQL and SQLite. This is a very high-level overview of my findings specifically related to the project I’m currently working on. I do not have experience with either SQLAlchemy-Migrate or Alembic, although I do have experience with Django South, so I’m familiar with the issues of schema branching, etc. At the time of this writing I have a working (patched) SQLAlchemy-Migrate 0.7.2 system, but have not yet tried to get Alembic (0.1alphadev) up and running, mostly due to lack of documentation.

SQLAlchemy-Migrate pros:

  • nice documentation
  • seems to be more widely used in the community
  • has (experimental) schema/model comparison tools
  • maybe approximates non-linear (but ordered) versioning with timestamped scripts?
  • more development activity

SQLAlchemy-Migrate cons:

  • requires a patch to get make_update_script_for_model working with TypeDecorator (possibly specific to SA 0.7.3)

Alembic pros:

  • may be simpler than SQLAlchemy-Migrate
  • written and recommended by Mike Bayer
  • non-linear versioning (looks nice, but do I really need it?)

Alembic cons:

  • no documentation (yet). Update: documentation was published within a day of writing this… Mike Bayer is amazing.
  • no SQLite ALTER support, must build own dump/import system for SQLite
  • lacks schema/model comparison tools

Takeaway: Alembic looks promising, and maybe it will be the right choice sometime in the future. The lack of documentation and the missing support for SQLite ALTER make it less appealing to me at this point.
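
To give an idea of what a home-grown dump/import step for SQLite might involve: SQLite implements only part of ALTER TABLE, so changes it cannot express are commonly made by rebuilding the table. The following is a minimal sketch using Python’s standard sqlite3 module. The person table and the function names are made up for illustration, and this is not how either migration tool does it.

```python
import sqlite3

# isolation_level=None leaves transaction control to the explicit
# BEGIN/COMMIT/ROLLBACK statements below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, nickname TEXT)"
)
conn.execute("INSERT INTO person (name, nickname) VALUES ('Ada', 'ada')")
conn.execute("INSERT INTO person (name, nickname) VALUES ('Grace', 'amazing')")


def drop_nickname_by_rebuild(conn):
    """Remove person.nickname by copying the table instead of altering it."""
    conn.execute("BEGIN")
    try:
        # 1. Create the table in its new shape under a temporary name.
        conn.execute("CREATE TABLE person_new (id INTEGER PRIMARY KEY, name TEXT)")
        # 2. Copy over the columns that survive the change.
        conn.execute("INSERT INTO person_new (id, name) SELECT id, name FROM person")
        # 3. Swap the new table in for the old one.
        conn.execute("DROP TABLE person")
        conn.execute("ALTER TABLE person_new RENAME TO person")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def column_names(conn, table):
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


drop_nickname_by_rebuild(conn)
```

A real migration would also have to recreate indexes, triggers and foreign keys that referred to the old table, which is part of why having to build this by hand counts as a con.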

The schema/model comparison features look like they could be very handy if they are reliable. They are currently experimental (and need a patch) in SQLAlchemy-Migrate, but non-existent in Alembic.
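
For readers unfamiliar with the idea, schema/model comparison means diffing what the code expects against what the database actually contains. The sketch below illustrates the concept with the standard sqlite3 module only. It is not SQLAlchemy-Migrate’s implementation, and the MODEL dictionary and diff_table function are hypothetical stand-ins for real model metadata.

```python
import sqlite3

# Hypothetical "model": the columns the application code expects.
MODEL = {"person": {"id": "INTEGER", "name": "TEXT", "email": "TEXT"}}


def diff_table(conn, table, model_columns):
    """Compare the expected columns with what the database really has."""
    # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
    # The table name is interpolated directly, so it must come from trusted code.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    db_columns = {row[1]: row[2].upper() for row in rows}
    shared = set(model_columns) & set(db_columns)
    return {
        "missing_in_db": sorted(set(model_columns) - set(db_columns)),
        "extra_in_db": sorted(set(db_columns) - set(model_columns)),
        "type_changed": sorted(
            col for col in shared if model_columns[col].upper() != db_columns[col]
        ),
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, nickname TEXT)")
report = diff_table(conn, "person", MODEL["person"])
```

Here the report flags email as missing from the database and nickname as present in the database but unknown to the model, which is the raw material an update script would be generated from.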

Verbose is not Explicit

This is a response to “Verbosity”, published at lesscode.org.

From my big fat “compact” Random House dictionary:

verbosity: the state or quality of being verbose; superfluity of words, wordiness
Syn: …redundancy, turgidity

and

verbose: characterized by the use of many or too many words; wordy
Syn: …tedious, inflated, turgid…

In fact, my dictionary is verbose, and come to think of it, programmers (and lawyers) tend to be verbose as well. We need to qualify every statement (as if you didn’t already know it), to make sure everyone gets our point exactly as we intended it to be interpreted, so there is zero ambiguity. This tendency toward verbosity can have the effect of causing our readers to fall asleep… WAKE UP! I’m not finished :)

Some programming languages are more verbose than others. As Matt Good pointed out, Java is more verbose than Python. That type of verbosity is a bad thing for readability. It’s redundant, tedious, inflated. It causes me to go to sleep. It slows me down. It’s tiring.

The confusion is between verbosity and explicitness. Again from Random House:

explicit: fully and clearly expressed or demonstrated; leaving nothing merely implied…
Syn: express, definite, precise, exact, unambiguous…

Explicitness is when I come right out and say it, rather than doing it with black magic behind the scenes. Verbosity, on the other hand, is extra, unneeded, superfluous verbiage–more than is required. As programmers, we should be striving to DO NO MORE THAN IS REQUIRED. Verbosity is bad, explicitness is good.
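
A small Python illustration of the difference, using made-up function names. The first version is verbose, the second is explicit without being verbose, and the third is short but implicit because it leans on hidden state.

```python
# Verbose: every step is spelled out, yet the reader learns nothing extra.
def total_price_verbose(prices, tax_rate):
    total = 0
    index = 0
    while index < len(prices):
        current_price = prices[index]
        total = total + current_price
        index = index + 1
    tax = total * tax_rate
    total_with_tax = total + tax
    return total_with_tax


# Explicit but not verbose: everything the result depends on is named in
# the signature, and nothing more than required is written down.
def total_price_explicit(prices, tax_rate):
    return sum(prices) * (1 + tax_rate)


# Short but implicit: the tax rate is "black magic" living in module state,
# so the call site does not show what the result depends on.
_TAX_RATE = 0.5


def total_price_implicit(prices):
    return sum(prices) * (1 + _TAX_RATE)
```

All three return the same number for the same inputs; only the second states everything it needs and nothing it does not.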

Alex quoted “the other fellow” as saying (emphasis added):

And verbosity makes code easier to read. I don’t have to know anything to understand the code. Everything I need to know is there in front of me.

That’s just plain wrong, especially if he really meant “verbosity”. I think he was confused and actually meant to say “explicitness”. Verbosity does not guarantee that “everything I need to know is right there”. Instead, it usually means that there is more there than I need, which makes it harder to find what I want. Explicitness, by definition, means that everything is there: “leaving nothing merely implied”.

Mythical RDBMS Features

Alex Bunardzic wrote an article on database integrity in which he completely dismissed commonly accepted methods of using an RDBMS to enforce data integrity. I strongly disagree with his conclusions. In fact, I think his arguments are terrible.

The argument against surrogate keys (for example) is based on a naive and misguided understanding of how surrogate keys and unique constraints should be used to enforce data integrity. In the worst case, a surrogate key by itself allows multiple rows to contain the same data (duplication). However, these duplicate records can be merged into a single record if necessary, thus providing a feasible cleanup route. The implied alternative (data integrity enforced by other application layers) only leads to worse problems–at best reinvention of the wheel.

The database is the last line of defense against data corruption. Well-written applications will handle integrity violations in a graceful way that shields the user from nasty database exceptions. However, even if the application framework/code (e.g. Rails) doesn’t catch it, at least most major data integrity errors will be caught if database constraints are used.
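
As a minimal sketch of that division of labor, here is a surrogate key paired with a unique constraint, plus application code that turns the resulting database error into a friendly message. It uses Python’s standard sqlite3 module; the account table and the register function are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE account (
        id    INTEGER PRIMARY KEY,   -- surrogate key
        email TEXT NOT NULL UNIQUE   -- natural key, enforced by the database
    )
    """
)


def register(conn, email):
    """Insert an account, shielding the caller from the raw database error."""
    try:
        # "with conn" commits on success and rolls back on an exception.
        with conn:
            conn.execute("INSERT INTO account (email) VALUES (?)", (email,))
    except sqlite3.IntegrityError:
        # The constraint rejected the duplicate even though this code
        # never checked for it beforehand.
        return f"{email} is already registered"
    return f"registered {email}"


first = register(conn, "ada@example.com")
second = register(conn, "ada@example.com")
```

Even if register forgot to catch the exception, the duplicate row would still never reach the table, which is the point of keeping the constraint in the database.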

However, I do agree that many RDBMS engines today (Postgres, MySQL, MSSQL) are not good enough to justify storing large amounts of code in them. I have some serious issues with code stored in the database (i.e. stored procedures and functions). The following is a list of those issues, with implied feature ideas for a better RDBMS:

  • Lack of expressiveness: I hate T-SQL! (aside: Postgres does allow the use of Python to write functions, although I haven’t tried it)
  • Lack of version control: There’s no way to know if an entity has been modified, and no way to roll back if it gets messed up
  • Lack of good editors: Sure, anyone can copy the code into his favorite editor and then copy it back when finished, but then there’s the whole problem of maintaining a local copy in case the system crashes while editing. I’d like to be able to check out a local copy, make my changes (clicking the “Save” button as often as I want) with whatever editor I choose, and then commit them back–this ties into the previous version control issue.

These issues have caused me to adopt a hybrid approach to using the database to enforce constraints. I maintain my DDL in a file in version control–adding referential integrity and unique constraints as needed (after all, that’s what an RDBMS is made for). I tend to use surrogate keys because ORM frameworks (Hibernate and SQLObject) usually prefer or require them. When updates are needed on a live database (where the DDL cannot be applied directly), I write update scripts and commit them to version control as well. Finally, I avoid stored procedures and functions except where necessary for security or performance optimizations. This minimizes the number of times that the database needs to be updated, and consequently provides a simpler upgrade path when releasing new versions.
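
The update-script part of this approach can be sketched in a few lines. This is only an illustration under assumptions: it uses Python’s standard sqlite3 module and SQLite’s user_version pragma to remember which scripts have run, and the UPDATES list stands in for script files that would really live in version control.

```python
import sqlite3

# Hypothetical update scripts, in the order they were committed.
# In practice each would be its own file under version control.
UPDATES = [
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "CREATE UNIQUE INDEX customer_name_uq ON customer (name)",
    "ALTER TABLE customer ADD COLUMN email TEXT",
]


def upgrade(conn):
    """Apply every update newer than the version recorded in the database."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(UPDATES, start=1):
        if version > current:
            conn.execute(statement)
            # Record progress so a re-run skips this script.
            conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


conn = sqlite3.connect(":memory:")
first_run = upgrade(conn)   # applies all three scripts
second_run = upgrade(conn)  # nothing left to apply
```

Running upgrade a second time is a no-op, so the same command works for a fresh install and for a live database that is a few versions behind.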

Design tradeoff: Tier Pressure and Isolationism

This article from The Code Project describes a maintainability tradeoff that is realized when any logical layer in a multi-tier application is allowed to access any other layer. In my experience, this tradeoff is often justified by ease of implementation and getting the job done as quickly as possible. The hidden cost of the decision is deferred until the application must be extended. If the extension is large or complex enough, it is often easier to rewrite a large section of the application, or all of it, at that future time. And of course that, in the words of Joel, is “the single worst strategic mistake that any software company can make.”
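
A minimal sketch of the isolationist side of the tradeoff, with hypothetical class names: each layer holds a reference only to the layer directly below it, so the rule about paid orders lives in exactly one place.

```python
class DataLayer:
    """Only this layer knows how orders are stored."""

    def __init__(self):
        self._orders = {1: {"amount": 40, "paid": False}}

    def load_order(self, order_id):
        return dict(self._orders[order_id])

    def save_order(self, order_id, order):
        self._orders[order_id] = dict(order)


class BusinessLayer:
    """Holds the rules; talks to the data layer and nothing else."""

    def __init__(self, data):
        self._data = data

    def pay(self, order_id):
        order = self._data.load_order(order_id)
        if order["paid"]:
            raise ValueError("order already paid")
        order["paid"] = True
        self._data.save_order(order_id, order)
        return order


class PresentationLayer:
    """Talks to the business layer only; it never sees the data layer."""

    def __init__(self, business):
        self._business = business

    def pay_button_clicked(self, order_id):
        try:
            order = self._business.pay(order_id)
        except ValueError as exc:
            return f"Error: {exc}"
        return f"Paid {order['amount']}"


ui = PresentationLayer(BusinessLayer(DataLayer()))
first_click = ui.pay_button_clicked(1)
second_click = ui.pay_button_clicked(1)
```

The shortcut the article warns about would be the presentation layer setting paid directly in the data layer. It saves a method today, but the double-payment check then has to be duplicated everywhere that shortcut is taken.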