So, today I spent roughly two hours hacking the MySQL driver for Squeak to add the DESCRIBE command and adding MySQL support to ROE. ROE is an awesome package that lets you write queries directly in relational algebra in your Smalltalk code. It generates all of the SQL to actually execute the query and delays execution until as late as possible. The effect is very nice. I’m currently working on a tutorial for how to use ROE in Seaside, and as part of that, I thought it’d be nice to add MySQL support. Until MySQL 4.1, that would have been impossible, as ROE relies heavily on subselects to implement its logic, but since MySQL 4.1 now supports them, I thought I’d give it a shot.

Now, the first of those two tasks, once I had read through the code for the MySQL driver and gotten to understand it decently, only took about 60 seconds. DESCRIBE ends up just being a normal query, so all I had to do was modify the driver to treat DESCRIBE like SELECT. But the second one took a long time, and ultimately underscored to me yet again both why MySQL is simply a Bad Idea and why a little bit of up-front research can save you hours of frustration later on.

The first problems were “tiny” annoyances. MySQL 4.1 changed the authentication algorithm from the one used in 4.0, which wouldn’t be a big problem except that the new communications protocol isn’t publicly documented anywhere. (If you Google, some people have published descriptions of it from analyzing the source code, but the official documentation has not yet been updated to provide a good description.) There’s a toggle to downgrade MySQL’s authentication to 4.0, but it’s not retroactive, so you have to re-encode the passwords. It’d have been nice if they had, you know, documented that somewhere, but whatever. Once I got that sorted out, I added some tables, filled them with dummy data, and started writing an adaptor for ROE.

Now, in my book, when it comes to breaking standards, there are a lot of different ways to do it. There’s breaking the standard by adding features in odd ways that no one else supports, but everyone needs (NVARCHAR and friends in SQL Server, and arguably AUTO_INCREMENT in MySQL); there’s breaking the standard by not implementing features everyone else has (sequences, views, triggers, etc. in MySQL); and then there’s just totally going off and doing your own thing even though the standard already solved the problem you’re having. I refer here to the way that MySQL treats quotes. In every other database I’ve ever used, single ticks (') mark strings, double ticks (") mark identifiers with otherwise illegal characters, and back-ticks (`) are illegal. In MySQL, single ticks mark strings, double ticks also mark strings, and back-ticks mark identifiers. Why’s that matter? Because it means that I had to add in additional abstraction functionality for identifier quoting in ROE just to support MySQL, which refused to accept queries such as SELECT "id", "title", "body" FROM posts that execute fine in at least PostgreSQL 7 and 8, DB2, and SQLite 2 and 3. That’s impressive, fellas. What exactly was wrong with SQL92 there? Too plebeian?

Now, ROE depends heavily on subselects to write its queries. PostgreSQL optimizes them fine. Even SQLite optimizes them fine (although it took Avi Bryant and me awhile to chew through its rather cryptic EXPLAIN output and verify that).

MySQL generates temporary tables for every subselect.

Technically, I suppose that that does qualify as having subselect support, but just…wow.

So basically I wasted several hours of my day today making ROE run on a database that can’t support it decently. I’m keeping that particular Squeak image around in case MySQL 5 has better optimization, but the lesson here is that you should always check beforehand whether what you’re doing makes sense.

That, and stay away from MySQL.