Thursday, March 17, 2011

Drizzle goes GA, From "What If", to "What has"

Not quite three years ago I wrote an article called “What If?”.

What I wanted to do was go back and rethink decisions we had made during the years, especially decisions that we made for MySQL 5.0.

5.0 exists because of the MySQL/SAP alliance. SAP wanted to replace Oracle with MySQL, and to do that MySQL was going to need to run SAP R3 in order to do it. We didn’t just pay lip service to SAP, there was an effort to make this happen. Somewhere in the middle of all this there was also a very odd “we were going to adopt SAPDB as the next MySQL”. Which of course was never going to happen. There were countless meetings over this, and attempts to somehow sprinkle even an ounce of the SAPDB code into MySQL, but that never happened.

As far as making R3 work on MySQL? That was incredibly unlikely, and it was damaging to the product in the end. We ended up with a lot of features that the database was never designed to have. We created an unrealistic set of expectations. We had a source base which had too little testing.

So part of the goal with Drizzle was to cut it back to the core and build modules that we could then create better testing for. So for that reason Stored Procedures, Views, and Triggers were out. None of them were well designed, and all of them had/have major bugs. We tossed out the monolithic kernel design and moved to microkernel design.

MySQL 5.1 made an attempt to patch the replication system that had been written a decade ago. MySQL replication works, but it works with a lot of exceptions. Anyone who has ever put it into production is aware of these. The good thing about MySQL replication is that it mostly works out of the box, and that is something that was a bit of a revolution when it was created. Today? With the notable exception of SQL Server, the rest of the major databases still have replication systems which are difficult to use, install, or deploy.

We initially looked at using 5.1’s replication. We were only going to refactor it such that we were going to beef up its file format and switch to just using the row based replication that was added in 5.1. We were unsuccessful in refactoring it. About 9 months in we figured this out, and we began a rewrite.

The rewrite was the right answer. The original code had too little testing for us to ever know whether or not a change we made created bugs or not.

A big lesson learned, if you are going to refactor code, make sure you have plenty of testing up front.

Internally we have “new code” and “old code”. If we want to make a change to “new code” we can typically do it very rapidly. The rate at which we can extend it is pretty amazing. The MySQL code base is not friendly to anyone who knows C++. Pretty much all of the warnings have been disabled and there are a lot of tricky bits.

We have fixed all the warnings in Drizzle. This is something that isn’t sexy work, and the only way it is justified is because cleaning up warnings fixes bugs. If you are starting a new code base let me implore upon on you the necessity of doing this from the beginning.

Today our replication is pretty spiffy, and it answers a couple of the big “What If” statements I have wanted answered:

1) We use an entirely open message format.
2) We store our replication records directly in Innodb.

The open message format comes with a penalty, it is more verbose than a native format. It takes up more space than if we just shipped the block records created in the transactional engine. Running a point in time recovery on block records is tricky and very limiting. You can’t take the data from one database and push it to another. ETL? Forget it.

We used Google’s Protobuffer for the message format. There are other libraries available but they were either license incompatible or were not widely known. At the time we hadn’t made a decision to go with boost so using its serialization library wasn’t an option. The disadvantage has been that the Google library created a dependency for installing Drizzle. Dependencies are a pain, and when we started Drizzle I had thought that the different Linux distributions had a good handle on this, I don’t really believe this any longer. Avoid dependencies.

Storing the records in Innodb has always seemed to be an easy win in my mind. It solves a lot of the two phase commit problems that plague users and it gives you instant recovery. Storing the log in a separate file can possibly give you a win in that you can do some tricks with IO, but in the end it just complicates everything.

With MySQL you always need to keep in mind the question of “What would MyISAM do?”

MyISAM’s design, and limitations, are scattered throughout the program. In all cases MySQL has to ask “how will this be handled if we need to store data in a storage engine that can’t handle failure, handles all of its own IO, and needs to be locked at the Table level?”.

We dropped MyISAM support about a year into our work, and relegated it to a support only roll for temporary tables. We didn’t hide it completely before we GA’ed Drizzle, but we won’t support it long term. I’ve heard users say “but I want its performance!”. Trading performance for reliability works out for some people, but certainly not everyone. What I find is that when someone wants this, what they really want is a different sort of database all together. Typically it is some sort of analytics problem which creates this need.

Which gets us to the storage engine interface. It was within MySQL the first attempt to create an interface that we could plug different solutions in. I had proposed it in MySQL because I had written different engines and knew what a nightmare it was to make it work.

That engine interface has generated millions of dollars. When I wanted to make it available at MySQL the backlash was significant. Some of sales freaked out, some of marketing thought we were going to let others take over the product, and alliances wanted to know how we were going to limit it to “select partners”. On top of that, half of engineering wanted to go and re-engingeer it immediately.

In Drizzle we have spent a significant amount of time reworking the interface, but it is far from perfect. We redesigned it so that engines now own their own meta data and federate that data to the microkernel. We also designed the interface to require that all new engines have ACID like qualities, know how to handle their own recovery, and can handle failure gracefully. Our core engine is Innodb. We have had others propose new engines, and we have even supported other engines, but at the end of the day we know people want a transactional engine mainly because they don’t want to find that their data has been trashed.

Our Innodb is a little different. We have more views into the state of the engine, and we fixed our version to compile with a C++ compiler. We cleaned up warnings and fixed the bugs that popped up from that. We have begun to refactor it so that it is more integrated with Drizzle’s thread scheduler.

Innodb would have been the default engine for MySQL long ago if not for some “not invented here” mentality, mixed with a flopped buy out attempt. Heikki, the inventor of Innodb, came out quite well in all of this. Good for him.

I don’t believe we will spend much more energy on the storage engine interface going forward. It is a dead business, and while there are a couple of companies that have built brand and product enough to make a go of the business I don’t expect any additional ones will show up. The storage engine business made money for MySQL, but it was a big distraction. While with Drizzle is easier to integrate an engine, I’m not sure that a business exists for storage engine vendors with it. I’ll write more about this at a later date.

Speaking of dates, Drizzle’s internal format for timestamps is 64bit. There is still some work to be done to allow to use all 64bits, but you won’t need to recompile or change your disk format for them. Right now we need to fix some tests, and make sure a couple of functions will handle the formatting, but we store your data such that going forward, or backwards, you are in good shape. Unlike MySQL we store time as GMT, so there is no screwing around with “well I stored my data in my local time zone, but we had the machine set too…”. I have personally spent over a month of time just fixing bugs in that code.

We have spent a lot of time fixing bugs. We get a big collective smile on our faces when we read about new bugs that have been discovered in MySQL, when we discover that we don’t have the bug. We have spent a lot of time fixing bugs, and a considerable amount of our time has been spent on finding new ways to test Drizzle. I am sure plenty of bugs exist to be found.

We also support storing/comparing/displaying time with microseconds. We also have a real BOOL type, which I have been told is handy for the SQL Alchemy folks, and a native ISO UUID type. The UUID is interesting in that it stores time as well as being unique. It isn’t as fast as “please give me the next number”, but I believe it will be useful for a lot of applications. We have refactored all of the types, and the only one that was not size related that we dropped was SET. If you wonder why we dropped it read the section in the MySQL manual about its limitations and bugs.

Why do we allow only DELETE against a single table at a time, like pretty much every other RDBMS? Beyond the conceptual issue that few can wrap their heads around how to form, let alone feel like they know what the query will do, we hit the problem of the “multiple execution path”. There were a lot of one off execution paths in MySQL. In a lot of cases I know these were dead refactoring projects that were never completed. The “multiple execution path” problem is particularly disturbing when you think about fixing a bug. If you fix a bug in DELETE you need to know that there is an execution path for a single table that is different then the path used for multiple tables. This leads to odd behaviors, and a much richer set of bugs that exist.

SQL modes? Those are gone. If you wonder about what sorts of problems they create inside of the server, I’d suggest reading about the “Legend of the Ambalappuzha Paal Paayasam”.

In general in Drizzle we have tried to get rid of Gotchas that we have found. Things like declaring a column NOT NULL and discovering that somehow the database still stored a NULL is gone. Altering a table and adding a field that would violate the structure of the table? That is gone.

It is amazing that ALTER TABLE works, as the code there is Byzantine. We have made some effort to clean it up, but it is still way too tricky. I wish we could have done more there, but it is what it is. Are you using partitions? Make sure you back up your data before doing an ALTER TABLE. Wrapping partitions into the system in the way that was done at the time was simple, but it is far from robust.

I had hoped that with 5.1 we would have created a single logging API, but instead we ended up with multiple logging API internally in the server. With Drizzle we ripped all of them out and installed a single API. It is crazy simple to write a new logging plugin.

Which gets into the philosophy of plugins in general. Writing plugins should be low hanging fruit. Whenever possible we have tried to make it the case.

We have an entirely new INFORMATION_SCHEMA in Drizzle. It is based in table functions, which is a new concept in Drizzle. We keep a separate schema called DATA_DICTIONARY, in it we put whatever we like. Our INFORMATION_SCHEMA is only what the SQL standard has specified. We do zero vendor modification to it. Another hats off to SQL Server, their INFORMATION_SCHEMA is the closest to complying with the standard.

Drizzle’s drivers are BSD. They were written outside of Sun, and Sun signed off on contributions to them under a BSD license. They speak Drizzle and MySQL’s protocol. There is a JDBC version that was written. Their adoption is becoming wide spread. Licensing clarity around them is a big win for us, and for ISV who want to integrate. MySQL’s licensing mess was related to a lot of hand waving involving them. Recently I noticed that MongoDB had written up a clear licensing policy with regards to their own drivers. Awesome.

We never got to finish all, or really much of any, of what we wanted to do with the Drizzle protocol. I believe this is an area where we will see change in the near future. Internally inside of drizzle we have a C++ interface that resembles JDBC that lets us execute queries. We will be doing a lot more with that interface going forward.

What about performance? With Drizzle we began doing benchmarks early on, using a few different benchmarks. The benchmark generated by sysbench has always been the benchmark we have used as our bellwether. Unlike a lot of databases we test Drizzle with up to 1024 concurrent executing queries. Most of the benchmarks I see people run are for far less connections. We have chosen time and time again to favor performance gains at the high end, over gains on the low end. We are roughly double in performance from where we began. We could still do a lot better. MySQL 5.5 has a new meta data locking system which should do well in a number of situations, we could do a bit better in some of these cases. Our lack of a MyISAM would make it simple for us to move forward in this direction if we want too.

There has been a lot asked about our claim on scaling with lots of cores. Our process there is simple, eliminate locks, favor performance gains when we find them that favor of additional CPU, and try whenever possible to remove strong ownership that require waits for locks. MySQL relies on MyISAM, and MyISAM has significant locks, especially around the keycache, we got rid of those by freeing ourselves from MyISAM. We have had some gains with our new scheduler and we have done some to improve how IO is handled.

I am sure we have a lot of tuning still to do. We won’t be publishing benchmarks which compare us to others though. I’ve yet to see a comparison benchmark which wasn’t completely flawed, and even when they are not, few people really understand them. They fall into the classic “how many angles does it take to dance on the head of pin” conversations.

Our authentication system is modular, and we need to iron out more of the authorization system.

I’ve seen someone say that Drizzle is designed for Google and Facebook. This is not the case at all. We built it so that the next Facebook, Google, etc would have a platform to build on. Facebook and Google have their own forks of MySQL, they aren’t going to be using Drizzle. The pieces are there for the next company who needs to innovate, it is just a matter of someone making use of them. We speak the MySQL protocol, so the typical MySQL application runs just fine on Drizzle without change. We designed Drizzle to work as a piece of someone’s current infrastructure, not be yet another application which has a costly integration. We have a NoSQL sort of solution via the blob streaming module, but we are first and foremost a relational database.

What will the next Google or Facebook find? A much more friendly platform than what MySQL provided to develop on and with. The big success for Drizzle has been in the people that have been involved. We are without a doubt the descendant of MySQL that has the largest contributor base, and we have long passed MySQL with regards to contributors. We are well into the hundreds when it comes to developers who have contributed code. We have had more then 921 commits in the last month across 20 people. Our numbers go up and down, but we are consistently more then double anyone else in size. If you just walked out of college, or skipped it all together, you are going to have a much easier time adjusting Drizzle to your needs . At least we believe this :)

The codebase is C++, we make use of Boost, and while we are cautious, we tend to favor more forward thought in how we code. Readability is the key to creating code that others will use. Because in the end? We can scale silicon, but carbon? People are much harder to scale.

The people to thank for the code:
Brian Aker
Monty Taylor
Stewart Smith
Lee Bieber
Jay Pipes
Padraig O’Sullivan
Andrew Hutchings
Marko Mäkelä
Joe Daly
Olaf van der Spek
Vijay Samuel
Patrick Crews
Toru Maesaka
David Shrewsbury
Eric Day
Zimin
Marisa Plumb
Joseph Daly
Barry Leslie
Asil Dimov
Mark Atwood
Tim Penhey
Jimmy Yang
Paul McCullagh
Nathan Williams
Paweł Blokus
Sunny Bains
Andy Lester
Hartmut Holzgraefe
Trond Norbye

Other people to thank?

David Douglas who at Sun supported us initially, and when we didn’t think our internal support initially at Sun could get any better? Bob Brewen worked with us till the end came for Sun. An extra mention should be made for Lee Bieber, he has been working with the project from nearly the beginning as well. He has handled project management, done code refactoring, made flyers, organized dinners, and did everything else in between.

Mike Shadle for getting us machines, and making sure everything runs. Adrian Otto at Rackspace should be thanked (along with a number of other people as well).

A thank you should go to Chris Dibona for the Google Summer of Code project. We have a number of students who now work on databases for a living thanks to that program. While with MySQL we constantly failed at getting student’s code into the server, with Drizzle we have had a lot of success.

There is an entire channel of people who have been involved with Drizzle on Freenode in #drizzle who should be thanked as well. IRC is how we communicate.

There are a lot of other people I am forgetting to thank, sorry about that.

So what next? There is a lot more to Drizzle then what I have written above. Having worked on this for years I often forget what the differences are anymore. There are lots of new features, plenty of new enhancements, and new bugs just waiting to be found. I’m giving a talk at Web 2.0 Expo in a couple of weeks in San Francisco were I will talk about some of what we have done and are doing for virtualization.

I will being giving a keynote at the O’Reilly MySQL Conference & Expo, and there are a handful of talks there on Drizzle as well. The MySQL Ecosystem is a radically different place then what it was a year ago, I’ll be commenting on it in the future online and at the conference.

About a week ago Monty Taylor and I sat down and talked about what we wanted to do with Drizzle going forward. Monty has been working on this since the beginning with me, and he has been a lot of fun to work with. One conclusion that we both came to was that we want to see where people will take Drizzle before we determine too much about its future. It is easy to get caught up in new features, and we are interested in seeing how others use it before too many decisions are made about what to do next.


No comments:

Post a Comment