RE: Peek-a-boo I CC You

In his recent post “Peek-a-boo I CC You”, John Maeda mentioned that he had “[resolved] to never use bcc”. While I completely agree with the sentiment that the CC feature in email clients is an historical artifact that should be discontinued (both in use and implementation), I respectfully disagree with his opinion of BCC. To begin, let me define the meanings of the common email addressing fields as I understand them:

  • TO: This message is especially meant for your eyes.
  • CC: This message is not really meant for you, but you might find it informative, interesting, or just want to file it for future reference. WARNING: MAY BE IRRELEVANT
  • BCC: I have protected your identity from other recipients by placing your name in the BCC field. You will not know if anyone other than yourself has been BCC’d.

First, an additional comment on the CC field. In my opinion, both TO and CC fulfill the same general role: send this email and allow each recipient to see others to whom it was sent. The CC field also carries the additional meaning of “this is not really meant for you, but you might find it interesting, etc.” This additional meaning sends an unintended message to the CC’d recipient(s): you are not important enough to be placed in the TO field. Often the sender is simply being lazy, leaving it up to the CC’d recipient(s) to glean the obviously useful information and discard the rest. On the surface this appears to save time, and therefore it seems to be justifiable. However, the CC cost is deferred, not avoided. It forces each CC’d recipient to filter through possibly irrelevant email only to glean limited information (maybe not even what the original sender intended them to glean). It would be more efficient for the original sender to send a short summary directly TO those “less important people” than to force them to glean their own summary by placing them in the CC field.

Now, to my real point: the importance of BCC. The BCC field actually provides a distinct, useful feature: send this email without compromising the entire list of recipients. This is a very important feature, and if it were used everywhere it should be used, spammers would have at least a few fewer opportunities to get their hands on huge lists of valid email addresses. BCC is useful when sending bulk email. I know, that triggers a red flag, but bear with me. You might be thinking that the BCC line was invented for spammers. But sending spam is an issue of politeness, not something that could be prevented by removing the BCC feature. Bulk email can be very powerful when used correctly. For example, I use the BCC field when I send a newsletter to a group of supporters every two or three months. BCC is perfect for this type of legitimate correspondence.

From the original post:

The danger of bcc is getting people caught into the poisonous reply-to-all on the receiving end of a bcc. I quickly delete any bcc‘ed email I receive, and also write to the sender that I am not in the practice of bcc-ing anyone.

That sounds more like a reason to avoid CC than BCC. I don’t understand how BCC causes a problem with reply-to-all. [WARNING jargon abuse ahead] When you reply-to-all an email in which you were BCC’d, you will only reply to those who were explicitly TO’d or CC’d. Normally (when used correctly) BCC’d email is only sent TO a single address, and in that case reply-to-all is the same as plain reply.
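
To make the reply-to-all point concrete, here is a minimal sketch using Python’s standard email package. The addresses and the helper name reply_all_recipients are made up for illustration, and real mail clients differ in the details; the point is only that a reply-to-all is built from the visible From, To and Cc headers.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def reply_all_recipients(msg, me):
    """Return the addresses a reply-to-all would go to.

    Only From, To and Cc are consulted: the copy a BCC'd recipient
    receives does not list the other BCC'd addresses.
    """
    headers = msg.get_all("From", []) + msg.get_all("To", []) + msg.get_all("Cc", [])
    addresses = [addr for _name, addr in getaddresses([str(h) for h in headers])]
    # Drop our own address and duplicates, keeping the original order.
    seen = set()
    result = []
    for addr in addresses:
        if addr != me and addr not in seen:
            seen.add(addr)
            result.append(addr)
    return result


# A newsletter as a BCC'd supporter receives it: the sender placed
# their own address in TO and everyone else in BCC.
received = EmailMessage()
received["From"] = "sender@example.org"
received["To"] = "sender@example.org"
received["Subject"] = "Newsletter"
received.set_content("Hello, supporters!")

print(reply_all_recipients(received, me="supporter@example.org"))
```

With a single TO address that is also the sender, the reply-to-all list collapses to that one address, which is the same as a plain reply.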

The common implementation of BCC could be enhanced by allowing addresses in the BCC field without requiring at least one address in the TO field. Then again, that’s trivial to work around by placing the FROM address in the TO field. It would be even better to have a single TO field (no CC or BCC) and a checkbox labeled “Disclose recipients” (checked by default). That checkbox, when unchecked, would change the meaning of the TO field to be like our current BCC field. That would cover at least 80% of the email-recipient use cases while eliminating 66% of the address fields!
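
The single-field idea might look roughly like this in code. This is only a sketch under my own assumptions: build_message and its disclose flag are hypothetical names standing in for the checkbox, and the addresses are placeholders.

```python
from email.message import EmailMessage


def build_message(sender, recipients, subject, body, disclose=True):
    """Build a message plus its envelope recipient list.

    `disclose` plays the role of the hypothetical checkbox: when it is
    False the recipients are left out of the headers, and the sender's
    own address fills the TO header instead.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients) if disclose else sender
    msg["Subject"] = subject
    msg.set_content(body)
    # The envelope list is what would be handed to the mail server,
    # e.g. as the to_addrs argument of smtplib.SMTP.send_message().
    return msg, list(recipients)


msg, envelope = build_message(
    "me@example.org",
    ["a@example.org", "b@example.org"],
    "Newsletter",
    "Hello!",
    disclose=False,
)
print(msg["To"])
print(envelope)
```

Either way every recipient gets the message; the flag only decides whether their addresses appear in the headers that everyone can read.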

ThunderBayes 0.1.1

Download the latest version from the Extension Mirror (I’m still waiting for it to be approved on Mozilla Update).

This release is a minimal update: ThunderBayes now displays a progress bar while messages are being processed.

However, I will note that I am very pleased with the results of using the extension. Before I started using ThunderBayes I got 10 to 15 spam emails in my inbox per day. Now I get one or two! All due to a simple and convenient method of training SpamBayes.

ThunderBayes 0.1

SpamBayes integration for Thunderbird

Download the extension from the Extension Mirror.

This extension was born from my frustration with Thunderbird’s built-in spam filtering features. While I enjoyed the ability to classify email as spam/ham within Thunderbird, the built-in spam filter was not effective enough to make it usable on a long-term basis. Having past experience with SpamBayes, I immediately looked for the best way to integrate the two. Unfortunately, aside from an old rumor I found nothing. So I settled for the pop3 proxy distributed with SpamBayes. While this provided good spam classification results, it left room for improvement on the training procedure. The browser-based training mechanism, while effective, was not convenient and essentially required me to classify most spam that made it to my inbox twice: once to move it to the Junk folder within Thunderbird and once to classify it as spam in the SpamBayes web-interface. Hopefully this will put an end to the frustration.

What does it do?

It provides a toolbar button similar to Thunderbird’s Junk button to classify email as Spam or Ham. Clicking the button causes two things to happen: (1) it sends the source of the selected messages to SpamBayes to be classified and (2) it optionally moves the messages to a folder of your choice (this can be configured in the extension options–it defaults to Junk on Local Folders). It actually does one more minor thing; it flips the junk status of the email. This last thing allows a single button to classify spam and ham. Give it a try, you’ll see how it works.

What it does not do:

  • It does not install SpamBayes (at this point the SpamBayes proxy must be installed and configured externally from Thunderbird). A future version of ThunderBayes may install SpamBayes automatically.
  • It does not filter mail classified by SpamBayes (Thunderbird’s powerful built-in filters can be configured to move spam classified by SpamBayes to whatever folder you choose and/or change the junk status of incoming email). Automatic filtering is planned for a future release.

Prerequisites:

  • Install and configure the SpamBayes proxy. Windows users can go to the Windows page and look at the section entitled Non Outlook Solutions.
  • Configure your email account(s) in Thunderbird to use the proxy.

Thunderbird configuration recommendations (do this for each account that uses the proxy):

  • Disable the built-in junk mail controls (uncheck “Enable adaptive junk mail detection” on Adaptive Filter tab of Junk Mail Controls dialog).
  • Create a new message filter named “ThunderBayes-spam”
    Match any of the following
    “X-SpamBayes-Classification” is “spam” (use “Customize…” to add the new header)
    Set Junk Status to Junk
    Move Message to Junk on Local Folders
    Mark As Read (if you’re feeling confident)
  • Create a new folder named “Unsure” (this is for messages classified by SpamBayes as “unsure”)
  • Create a new message filter named “ThunderBayes-unsure”
    Match any of the following
    “X-SpamBayes-Classification” is “unsure”
    Move Message to Unsure (the new folder)
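
For readers who think better in code, the two filters above amount to roughly the following. This is a sketch of the rules, not anything Thunderbird or ThunderBayes actually runs; the function name target_folder and the “Inbox” fallback are my own assumptions.

```python
from email import message_from_string

# Folder names follow the filters above; "Inbox" for everything else
# is an assumption of this sketch.
FOLDER_BY_CLASSIFICATION = {
    "spam": "Junk",
    "unsure": "Unsure",
}


def target_folder(raw_message):
    """Pick a folder from the X-SpamBayes-Classification header."""
    msg = message_from_string(raw_message)
    classification = (msg.get("X-SpamBayes-Classification") or "").strip()
    # Exact match only, mirroring the "is" condition in the filters.
    return FOLDER_BY_CLASSIFICATION.get(classification, "Inbox")


raw = "X-SpamBayes-Classification: unsure\nSubject: hello\n\nbody text\n"
print(target_folder(raw))
```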

Install ThunderBayes and add the “Spam” button to the toolbar. To add the button to the toolbar, right-click the toolbar and select “Customize…” Then drag the “Spam” icon into your toolbar and click “OK” on the “Customize Toolbar” dialog.

Send bug reports and feature requests to . Include the word “ThunderBayes” in the subject line. I will respond to messages as I deem appropriate and as time permits. Impolite or inappropriate feedback will be ignored. I reserve the right to define impolite and inappropriate in this context.

THIS SOFTWARE HAS NOT BEEN TESTED THOROUGHLY AND MAY CAUSE ANY OR ALL OF YOUR EMAIL TO DISAPPEAR WITHOUT WARNING OR GOOD REASON. ALWAYS MAKE REGULAR BACKUPS OF IMPORTANT INFORMATION. NO GUARANTEES OR WARRANTIES WHATSOEVER ARE PROVIDED WITH THIS PRODUCT. USE AT YOUR OWN RISK!

SQLAlchemy: beware of backref

Wow, it’s been a while… Sorry, I’ll try to do better in the future.

I’ve been using SQLAlchemy for quite a while now, and I’m really enjoying it. According to Michael Bayer, the main developer, it’s modeled after Hibernate which is the best ORM I’ve ever used. There are a few things that SQLAlchemy does differently from Hibernate, one of which is “backrefs”, or automatic two-way relationship management. With Hibernate this must be done manually. Here’s a short example:

mapper(Address)
mapper(User, properties=dict(
    addresses=relation(Address, backref="user")
))

user = User.get_by(id=4)
address = Address.get_by(id=12)

assert len(user.addresses) == 0

address.user = user
assert len(user.addresses) == 1

The backref="user" causes Address objects to get a user property. And when a user is associated with an address the corresponding user.addresses collection is automatically updated.

However, there’s a subtle gotcha: setting address.user causes the user.addresses collection to be loaded. So if you’re associating a new child to a parent that has many children, then all of those children will be loaded when child.parent is set. Here’s an example:

parent = Parent.get(1)
# assume this parent has 500 children in the database
# since parent.children is a lazy collection they
# will not be loaded until parent.children is requested

child = Child()  # a new child (assumes a mapped Child class)
child.parent = parent
# this implies parent.children.append(child)
# which causes all 500 of parent's children
# to be loaded from the database...ouch!

That’s annoying, and it causes an extra database hit which is expensive. Luckily there’s a fairly easy workaround. Instead of using the backref feature just define each side of the relationship separately. Here’s an updated version of the original example:

addr_mapper = mapper(Address)
mapper(User, properties=dict(
    addresses=relation(Address)
))
addr_mapper.add_property("user", relation(User))

user = User.get_by(id=4)
address = Address.get_by(id=12)

assert len(user.addresses) == 0

address.user = user
assert len(user.addresses) == 0

Notice that user.addresses is not updated when address.user is set. That’s because the two relationships are defined independently of each other.
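
The difference between the two setups can be mimicked without a database. The following is a toy model only: it does not use SQLAlchemy and says nothing about its internals, and every class and function name in it is made up. It just shows why an automatic append on the parent side forces a lazy collection to load, while an independent relationship does not.

```python
class LazyChildren:
    """A toy lazy collection that loads its rows on first access."""

    def __init__(self, loader):
        self._loader = loader
        self._items = None
        self.load_count = 0

    def _ensure_loaded(self):
        if self._items is None:
            self._items = list(self._loader())
            self.load_count += 1

    def append(self, item):
        # Appending needs the full list, so it triggers the load.
        self._ensure_loaded()
        self._items.append(item)

    def __len__(self):
        self._ensure_loaded()
        return len(self._items)


class Parent:
    def __init__(self, loader):
        self.children = LazyChildren(loader)


class BackrefChild:
    """Setting .parent also appends to parent.children (like a backref)."""

    def __init__(self):
        self._parent = None

    @property
    def parent(self):
        return self._parent

    @parent.setter
    def parent(self, parent):
        self._parent = parent
        parent.children.append(self)


class PlainChild:
    """Setting .parent touches nothing on the parent side."""

    def __init__(self):
        self.parent = None


def fake_query():
    # Stands in for a query that fetches the parent's 500 children.
    return ["child-%d" % i for i in range(500)]


parent = Parent(fake_query)

PlainChild().parent = parent
print(parent.children.load_count)  # 0: nothing was loaded

BackrefChild().parent = parent
print(parent.children.load_count)  # 1: the whole collection was loaded
print(len(parent.children))  # 501
```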

Verbose is not Explicit

This is a response to Verbosity, published at lesscode.org. From my big fat “compact” Random House dictionary:
verbosity: the state or quality of being verbose; superfluity of words, wordiness
Syn: …redundancy, turgidity
and
verbose: characterized by the use of many or too many words; wordy
Syn: …tedious, inflated, turgid…
In fact, my dictionary is verbose, and come to think of it, programmers (and lawyers) tend to be verbose as well. We need to qualify every statement (as if you didn’t already know it), to make sure everyone gets our point exactly as we intended it to be interpreted, so there is zero ambiguity. This tendency toward verbosity can have the effect of causing our readers to fall asleep… WAKE UP! I’m not finished :) Some programming languages are more verbose than others. As Matt Good pointed out, Java is more verbose than Python. That type of verbosity is a bad thing for readability. It’s redundant, tedious, inflated. It causes me to go to sleep. It slows me down. It’s tiring. The confusion is between verbosity and explicitness. Again from Random House:
explicit: fully and clearly expressed or demonstrated; leaving nothing merely implied…
Syn: express, definite, precise, exact, unambiguous…
Explicitness is when I come right out and say it, rather than doing it with black magic behind the scenes. Verbosity, on the other hand, is extra, unneeded, superfluous verbiage–more than is required. As programmers, we should be striving to DO NO MORE THAN IS REQUIRED. Verbosity is bad, explicitness is good. Alex quoted “the other fellow” as saying (emphasis added):
And verbosity makes code easier to read. I don’t have to know anything to understand the code. Everything I need to know is there in front of me.
That’s just plain wrong. Especially if he meant to say “verbosity”, but I think he was confused and actually meant to say “explicitness”. Verbosity does not guarantee that “everything I need to know is right there”. Instead, it usually means that there is more there than I need, which makes it harder to find what I want. Explicitness, by definition, means that everything is there: “leaving nothing merely implied”.
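
A small illustration of the distinction, with made-up function names. Both versions are explicit in the sense that nothing happens behind the scenes; only the first is verbose.

```python
def total_verbose(prices):
    # Verbose: more words than the task requires.
    running_total = 0
    index = 0
    number_of_prices = len(prices)
    while index < number_of_prices:
        current_price = prices[index]
        running_total = running_total + current_price
        index = index + 1
    return running_total


def total_explicit(prices):
    # Explicit: everything needed is stated, and nothing more.
    return sum(prices)


print(total_verbose([1, 2, 3]) == total_explicit([1, 2, 3]))
```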

Mythical RDBMS Features

Alex Bunardzic wrote an article on database integrity in which he completely dismissed commonly accepted methods of using an RDBMS to enforce data integrity. I strongly disagree with his conclusions. In fact, I think his arguments are terrible.

The argument against surrogate keys (for example) is based on a naive and misguided understanding of how surrogate keys and unique constraints should be used to enforce data integrity. In the worst case, a surrogate key by itself allows multiple rows to contain the same data (duplication). However, these duplicate records can be merged into a single record if necessary, thus providing a feasible cleanup route. The implied alternative (data integrity enforced by other application layers) only leads to worse problems–at best reinvention of the wheel.

The database is the last line of defense against data corruption. Well written applications will handle integrity violations in a graceful way that shields the user from nasty database exceptions. However, even if the application framework/code (i.e. Rails) doesn’t catch it, at least most major data integrity errors will be caught if database constraints are used.
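
As a minimal sketch of what I mean, here is a surrogate key combined with a unique constraint, using Python’s built-in sqlite3 module. The table, the column names and the add_user helper are invented for the example; the same idea applies to any RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,      -- surrogate key
        email TEXT NOT NULL UNIQUE   -- natural key, enforced by the database
    )
    """
)


def add_user(email):
    """Insert a user; return the new id, or None if the email already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            cursor = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return cursor.lastrowid
    except sqlite3.IntegrityError:
        # The application handles the violation gracefully instead of
        # showing the user a raw database exception.
        return None


print(add_user("a@example.org"))  # 1
print(add_user("a@example.org"))  # None
```

The surrogate key alone would have accepted the duplicate row; the unique constraint is what stops it, even if the application code forgets to check first.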

However, I do agree that many RDBMS engines today (Postgres, MySQL, MSSQL) are not good enough to justify storing large amounts of code in them. I have some serious issues with code stored in the database (i.e. stored procedures and functions). The following is a list of those issues, with implied feature ideas for a better RDBMS:

  • Lack of expressiveness: I hate T-SQL! (aside: Postgres does allow the use of Python to write functions, although I haven’t tried it)
  • Lack of version control: There’s no way to know if an entity has been modified, and no way to roll back if it gets messed up
  • Lack of good editors: Sure anyone can copy the code into his favorite editor and then copy it back when finished, but then there’s the whole problem of maintaining a local copy in case the system crashes while editing. I’d like to be able to check out a local copy, make my changes (clicking the “Save” button as often as I want) with whatever editor I choose, and then commit them back–this ties into the previous version control issue.

These issues have caused me to adopt a hybrid approach to using the database to enforce constraints. I maintain my DDL in a file in version control–adding referential integrity and unique constraints as needed (after all, that’s what an RDBMS is made for). I tend to use surrogate keys in most cases because ORM frameworks (Hibernate and SQLObject) like and/or require them in most cases. When updates are needed on a live database (where the DDL cannot be applied directly), I write update scripts and commit them to version control as well. Finally, I avoid stored procedures and functions except where necessary for security or performance optimizations. This minimizes the number of times that the database needs to be updated, and consequently provides a simpler upgrade path when releasing new versions.

TurboGears - First Impression

The good

To be fair, I really only read the 20 Minute Wiki tutorial and Getting Started with TurboGears.

TurboGears has a great set of components. This framework has promise. I’ve used SQLObject and FormEncode–both very nice. CherryPy is my favorite web sub-framework. I say sub-framework because IMHO it provides a good base for a web framework, but not much more. I’ve also looked at Kid and think it has a lot of potential.

I really like how easy it is to install. This should help dramatically with acceptance. I also look forward to keeping it up-to-date with easy_install -U.

Not only is it easy to install, but it’s very simple to get a base project up and running. I’ve heard about how Rails does this, but never experienced it in real life (that’s a different subject). Anyway, back to the quickstart feature. I can see this being very helpful, especially in creating a common structure for each project. Consistency enhances maintainability.

The documentation looks excellent. The writers have gone above and beyond the call of duty by not only documenting the new APIs introduced by TurboGears, but also by providing enough on each sub-component to get those unfamiliar with the sub-component(s) up to speed without leaving the TurboGears website. This is especially helpful in gaining an understanding of how each component fits into the overall framework.

Finally, it’s really simple to learn, and there’s no substitute for hitting the ground running!

The mediocre

Redirects are used heavily for control flow. I’m not very fond of this approach. I’m coming from Springframework where redirects were discouraged except when absolutely necessary. The reasoning behind this is that it’s better to simply render a different template than to invoke a roundtrip to the client and back just to get a different view. After all, the job of the controller is to choose the correct model and view for a given request. In fairness, this is only a development style, not a requirement of TG.

I’m not in love with the turbogears.flash mechanism of displaying a message on a redirected page–using a cookie. I’m not saying it’s bad, I’m just not sure it’s good to rely on something that some people have disabled to transmit messages within your site.

It doesn’t support Python 2.3 (yet), which means OS X users can’t use it without upgrading to Python 2.4 (come on Apple, get with the program).

The bad

UPDATE: The following no longer applies. A fix was committed within 12 hours of when this review was originally posted - great work guys.

Automatic JSON output via /page/path?tg_format=json looks cool, but it could be a security vulnerability. Sometimes I send objects of which I don’t intend the user to see every detail to my page templates. With TurboGears, the user only has to add ?tg_format=json to the end of any TG controlled URL to get the objects returned by the controller method. The JSON output should be disabled by default. It could then be selectively enabled by some declarative procedure such as @turbogears.expose(json=True). I would expect most sites to expose only a subset of their URLs as an AJAX API.

The ugly

raise cherrypy.HTTPRedirect(turbogears.url("/redirect/here"))

<rant>
Exceptions are for exceptional situations and should not be used for application control flow

A redirect is NOT an exceptional situation. Therefore: thou shalt not raise an exception to redirect to a new page. A redirect is a tool to prevent the user from refreshing a submitted page (and thereby resubmitting the data, i.e. after a save operation) or to change the URL in the client browser window. It should be used sparingly, but it does NOT have “exceptional” status.
</rant>

A better way to handle this would be to place a turbogears.url object in the dictionary returned by the controller like this:
return dict(tg_redirect=turbogears.url("/page"))
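
To show how a framework could act on such a return value, here is a toy dispatcher. None of this is TurboGears code: dispatch and render are hypothetical stand-ins, and the tg_redirect key is only the proposal from the line above, not an existing feature.

```python
def render(template, data):
    # Stand-in for a real template engine.
    return "200 OK: %s %r" % (template, data)


def dispatch(controller_method, template):
    """Call a controller method and turn its result into a response."""
    result = controller_method()
    redirect_to = result.get("tg_redirect")
    if redirect_to is not None:
        # An ordinary return value asked for the redirect; no exception needed.
        return "302 Found: Location: %s" % redirect_to
    return render(template, result)


def save():
    # ... save the submitted data, then ask for a redirect ...
    return dict(tg_redirect="/page")


def show():
    return dict(title="Front page")


print(dispatch(save, "edit"))
print(dispatch(show, "page"))
```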

Questions and Answers

These are questions I thought of while reading the 20 Minute Tutorial. By the time I was half-way through the Getting Started page they were answered.

Conclusion

Out of the box, TurboGears provides everything needed to create a complex web application. One simple easy_install command and you’ve got it all. With a little more effort I think TurboGears could be one of the best web frameworks available for Python.

Maine Vacation

Natalie and I went on vacation in Maine earlier this month (Oct. 1-9). Armed with a new digital camera, I took what I thought was a LOT of pictures–about 700. That is, until I talked to my friend Jerry who took about three times as many on his vacation of about the same length. Anyway, without further ado go and see.

Safety Language?

“The popular safety languages are C++, Java, C#, VB and Delphi.” - Kevin Barnes

I disagree with one part of that: VB is not a safety language. Rather, I would say “Visual Basic is a danger language.” It has the verbosity and cumbersomeness of a safety language, but does not give the one important benefit of a safety language: safety. It’s just plain dangerous. Hello Variant–can we say “not type-safe?” By definition, a safety language should enforce type safety, which VB does not do. To be fair, I have not used VB.Net, but who would when they can use C# (a true safety language)? Personally, I prefer freedom.

Microsoft fails UI 101 … again

Go read this: Award for the Silliest User Interface: Windows Search (it’s short if you don’t read the comments). Now that you’re back. You read some of the comments didn’t you? Did you see Moooooooogle? Go back and read some more (after you’re done here). Moving on… I actually remember subconsciously being very frustrated with this when I first tried to use the search functionality in WinXP. What makes it even more wrong is the obvious “search everything” (that is “All files and folders”) is right in the middle–you’ve actually got to slow down and read all the options to find it. You’d think they’d have learned their lesson with “How should we index your help file” … I don’t freaking care, I just want HELP!!! Not like you’d find anything useful there anyway with suggestions like “is your computer plugged in?” YES, OTHERWISE I WOULDN’T BE READING THIS…!