Tuesday, March 12, 2013

MySQL vs NoSQL vs Postgres vs SQL Server, Cage Match

Tonight I went to compare the usual suspects for a set of slides I was doing and thought I would share the results.  

I do not believe anyone should be surprised by the results. MySQL became less interesting once it was acquired, and the number of Postgres deployments is greater than what all of the NoSQL market combined adds up to. When looking at emerging technologies, press time given to a single software product has no relation to what its actual market share is (which isn't as obvious as one might think to everyone).

When I get asked what my current opinion about NoSQL vs SQL is, I say that I continue to see NoSQL as a spotlight highlighting the failures in the design of the current relational database software vendors(*). Which should not damn the concept of the relational database, but somehow it happens anyways.

The two biggest issues that stand out to me are that current leading relational databases never solved scale out very well, and online operations are too expensive.

Would a document model really matter if schemas could be altered online? You would still have the weak vs strongly typed arguments that can exist in schema/table design, but if the operation to add additional characteristics to a table (object) was online or automatic I believe a lot of the arguments you see for schema less design would go away.

The other giant failure was that the vendors didn't approach scale out as a strong requirement for future designs. Going back a few years you can see where the vendors would talk about shared nothing designs, but only in the scope of support handful of nodes in a cluster. A hundred or so at best, which is small potatoes for where some of the NoSQL vendors who focus on scale out shoot for.

Its funny to think about how much has been written on this topic that has focused on the SQL language, when no new language has emerged from the NoSQL market that has any size-able adoption. At this point most of the NoSQL vendors now talk about how they have, or are, adding a SQL interface to their products.

The innovation hasn't been in the language, but in the design of database engines themselves.

The other bit of humor in this, is that if you take apart how most of the major vendors have implemented their storage engines, you would find that in the inner design few are strongly typed, and few track relationships in the core of their design. Most of the major vendors could do more operational changes online if they were willing to put reasonable scoping on what can be altered. The Innodb row format is inflexible, but the addition of some version information and a catalog documenting the versions would fix that.

To sum this up, the need to preserve backwards compatibility is what has kept the relational database vendors from solving the needs that caused the NoSQL solutions to come to market. I don't see any of them tackling this anytime soon; the margins on their current products doesn't dictate the need for them to do this.

A final thought about the state of the current nosql databases.

One of the "hard things" that almost(?) none of the current NoSQL vendors has tackled is the JOIN, i.e. the fundamental feature which is required to do anything beyond simple key/pair or basic search. Join optimizers are hard to write(*).  Network aware join optimizers are particularly difficult to write mainly because direct attached storage, AKA a hard drive, is a lot more reliable on a second to second basis than the network.

*) I also get asked which one I would pick if I had to pick one. Currently I suggest Cassandra, though more often then not a table with a key and a blob would work as well.

*) One of the nice things about MySQL's design was that someone designing a new engine could focus on the datastore and punt on the rest (well, sort of,... the pluggable storage engine interface never matured to the point of really playing nicely with other people's designs).