Tuesday, November 1, 2011
Tuesday, October 18, 2011
Contributor Agreements, Open Stack's Contributor
I’ve never felt like this is a very honest exchange. With MySQL you were handing over copyright to a company that was making money off your work. With the FSF I have always been bothered by Richard’s insistence that FSF has the right then to take that code and relicense it. His stance on dual licensing under commercial licenses is my issue with this.
The Open Stack contributor’s agreement is a bit different than much of what we have seen thus far. It basically states, and please keep firmly in mind that I am not a lawyer, that you have the right to submit the code you are submitting. It states directly that you can do whatever you want in the future with the code you wrote.
It is not Apache specific. I don’t see any reason why it couldn’t be used with a GPL project as well.
It is worded such that the group that the code is contributed to couldn’t just take your code and then hold the contribution hostage. By hostage I mean that the contribution would sit in a limbo where you couldn’t do anything with it, and would therefore be at the mercy of the new owner of the code publishing it.
The GPL, BSD, Apache, MIT, and other licenses were a watershed in their time. Lawyers learned the licenses, and each of these licenses has been “debugged”.
Every time a company sees a new license, or a new legal agreement, there is a huge bar that must be met before it can be signed.
If you are an engineer, think of lawyers as a picky C++ compiler. Some lawyers issue better warnings than others. Some organizations turn the compiler flag “all warnings to error” on. Other organizations not only do this, but they add in -Wextra for good measure.
We have software licenses, the OSI stamped out quite a few of these.
What is missing then?
- Agreements for developers who are on advisory boards.
- Contractor agreements that carve out open source projects such that they don’t become entangled with “work for hire”.
- Contribution agreements.
Where is the advantage in using it? We already have a long list of companies who signed it for Open Stack. It has been debugged, and a number of large companies are willing to sign it.
Friday, September 30, 2011
Thursday, September 29, 2011
Spoon full of Sugar, Oracle and the Open Core Model
From the 451 Group:
“MySQL flirted with the open core licensing model in early 2008 with plans to introduce new features into Enterprise Edition that would not be available under an open source license.”
MySQL didn’t flirt with it; it was going to do it.
Why? Because we were asking the question, “how do we pull in customers to make more money”.
MySQL was going to put the new backup API, which never materialized, into an Enterprise branch.
It was a lousy idea for the following reasons:
1) There was no internal API in the server for this, so the engineering was going to be messy and expensive.
2) We didn’t own the technology that was needed to even do this (Oracle owned Hot Backup)
3) Percona has an awesome tool for doing this, that is Open Source (http://www.percona.com/software/percona-xtrabackup/)
4) Backup is a core feature everyone needs, and some of those “everyones” are the folks who manufacture tools that you want to have work with your product.
5) When we were going to announce it, we hadn’t even written it/completed it. It was vaporware.
It would have been a horrible move, and would have caused chaos for no particular reason. It was dead on arrival: at the time it was to be announced as a strategy, it didn’t even exist.
Let’s look at Oracle’s move. Both the authentication module and the Thread Pool come into the MySQL server as plugins. If the engineering of the MySQL server continues in the current direction (which is somewhat flattering to Drizzle I might add), then they are on a good path (if I can find my blog entry where I talked about this as a good strategy, I’ll link back to it here).
Much of the hubbub around Open Source, Community, etc., in regards to this is a bit inflated, I feel. They haven’t touched the core product, and they are creating API. Are they possibly hurting themselves in regards to ubiquity?
Doubtful.
Would I pick those two pieces? No, but they aren’t the last two I would pick either. If Sun had continued as a company? Something similar to this would have been done as well.
From an engineering and usage standpoint?
The first person who sniffs at the authentication mechanism who knows anything about security is going to freak.
The Thread Pool can only be used by a very limited number of users (and there are some restrictions on what can be done in the server while it is in use). MySQL’s IO was never designed for the Thread Pool, and there is a lot of engineering work that would need to be done to make it work.
Still? People will use both, and I am betting some customers will want them badly enough to pay.
If they are really badly needed? Well then someone will write an open source version of both.
I have no great love of Oracle, but this is really not a big deal at all. The original GPL’ing of the Public Domain/LGPL clients was a much bigger deal.
Thursday, September 22, 2011
memcached_exist()
New in version 0.53 (which yes, I really should renumber into 1.X at some point in the near future) is memcached_exist().
Ever wanted to find out if a key existed but didn’t want to have to fetch the object?
Well now you can do this. It works by seeing if an add can be done on the key (the add though is dated in the past, so any write afterward will expire it).
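The add trick described above can be sketched with a toy in-memory cache. This is only an illustration of the idea, not libmemcached code: ToyCache and its set(), add(), and exist() members are hypothetical names made up for this example, and a real server tracks expiration in its own way.

```cpp
#include <cassert>
#include <ctime>
#include <map>
#include <string>

// Toy in-memory stand-in for a cache server (hypothetical, NOT libmemcached).
class ToyCache
{
public:
  // Unconditional write; expires_at == 0 means "never expires".
  void set(const std::string &key, const std::string &value, std::time_t expires_at= 0)
  {
    items_[key]= Item{value, expires_at};
  }

  // add() only stores when no live item exists for the key.
  bool add(const std::string &key, const std::string &value, std::time_t expires_at)
  {
    if (is_live(key))
    {
      return false;
    }
    items_[key]= Item{value, expires_at};
    return true;
  }

  // Existence check built on add(): a failed add means the key is there.
  // The placeholder is dated in the past, so it is never seen as a live
  // item and never blocks or outlives a later write.
  bool exist(const std::string &key)
  {
    const std::time_t in_the_past= std::time(nullptr) - 1;
    return !add(key, "", in_the_past);
  }

private:
  struct Item
  {
    std::string value;
    std::time_t expires_at;
  };

  bool is_live(const std::string &key) const
  {
    std::map<std::string, Item>::const_iterator it= items_.find(key);
    if (it == items_.end())
    {
      return false;
    }
    return it->second.expires_at == 0 || it->second.expires_at > std::time(nullptr);
  }

  std::map<std::string, Item> items_;
};
```

In this sketch, asking exist() about a missing key leaves behind only an already expired placeholder, so asking again still reports the key as missing, and no object is ever fetched.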
You can currently grab the code via bzr on Launchpad.
Have fun!
Monday, September 12, 2011
First commute discovery of the day? My orca card is empty.
Friday, August 5, 2011
Thursday, August 4, 2011
C/C++, #Ifdef, working around the need for them
Consider the all too common code:
#ifdef HAVE_SOMETHING
something();
#endif
#ifdefs are problematic because any number of combinations of them can lead to multiple code paths that the compiler will never see. PIA
The above assumes only that HAVE_SOMETHING will either be defined or not.
Another option is the following:
if (HAVE_SOMETHING)
{
something();
}
The significance in the above?
Any good compiler will evaluate HAVE_SOMETHING, and if the value is zero then the code path will be optimized out.
Awesome.
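A minimal sketch of the if() form follows. It rests on two assumptions that the snippets above leave implicit: HAVE_SOMETHING must always be defined, as 0 or 1, and something() must be declared on every platform, because the disabled branch is still parsed and type-checked. The run() wrapper and the returned strings are made up for the illustration.

```cpp
#include <cassert>
#include <string>

// Assumption: the build system (for example a generated config.h) defines
// HAVE_SOMETHING as 1 when the feature exists. If it is missing we
// normalize it to 0 so that it can be used in an ordinary expression.
#ifndef HAVE_SOMETHING
# define HAVE_SOMETHING 0
#endif

// Hypothetical feature function. It has to be declared everywhere,
// since the compiler checks the branch below even when it is disabled.
inline std::string something()
{
  return "something() ran";
}

inline std::string run()
{
  if (HAVE_SOMETHING)
  {
    // Always compiled, so this path cannot silently rot. When
    // HAVE_SOMETHING is 0 an optimizing compiler drops it as dead code.
    return something();
  }

  return "feature disabled";
}
```

Building once with -DHAVE_SOMETHING=1 and once without shows the same source taking both paths, with every line seen by the compiler in both builds.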
Thursday, July 21, 2011
Dear Lazyweb, how secure is Tomato?
So today I noticed on one of my internal servers the following:
Jul 17 23:53:13 localhost sshd[31847]: Invalid user sales from 123.196.113.11
Jul 17 23:53:13 localhost sshd[31848]: input_userauth_request: invalid user sales
And I also see….
Jul 17 23:47:11 localhost sshd[31690]: reverse mapping checking getaddrinfo for 42.ac.84ae.static.theplanet.com [174.132.172.66] failed - POSSIBLE BREAK-IN ATTEMPT!
Also?
Jul 20 14:56:01 localhost ¿<28>fail2ban.actions: WARNING [ssh-iptables] Ban 121.88.250.208
Huh? Nothing is port forwarded, and the only thing that could be connecting to the box is a Linksys running 1.28 Tomato.
So I am wondering, is Tomato secure right now?
Thursday, July 14, 2011
MySQL, Enum, skip the if()
There are a number of different, and very valid patterns for handling objects of different types. This is not about that, this is about how to not mix a pattern.
A very, very common bit of code that is in MySQL (and can therefore be found in Drizzle):
if ((cached_result_type == DECIMAL_RESULT) or (cached_result_type == INT_RESULT))
{
do_something();
}
else
{
do_something_else();
}
DECIMAL_RESULT and INT_RESULT are each possible result types.
Are there more?
Why yes there are. In the above bit of code the original author thought about two cases, and assumed all other cases could just be lumped into the else.
I’ve fixed dozens of bugs over the last few years based on similar assumptions.
What assumptions?
1) That no one would ever add another result type.
2) That no other bug fix might create a case where the else no longer held true.
3) That the else was ever correct in the first place.
Without changing the entire design, what would be better?
Use a switch and make a case for each enum. That way, if a new enum value is added, you will catch every place in the code where logic is required based on the enum when you compile (assuming you have your warning flags turned up in your compiler).
Also? Skip “default”. Unless you are taking something off the wire/file/etc you can skip default because you aren’t going to end up with an invalid enum. If you are doing one of these actions?
Sanitize the data first, don’t just cast it.
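Here is a small sketch of both halves of that advice. The ResultType enum, its numeric wire values, and the describe() and from_wire() helpers are hypothetical stand-ins chosen for the example; they are not the actual MySQL or Drizzle definitions.

```cpp
#include <cassert>
#include <string>

// Hypothetical result types, modeled on the names used above.
enum class ResultType
{
  STRING_RESULT,
  REAL_RESULT,
  INT_RESULT,
  DECIMAL_RESULT
};

inline std::string describe(ResultType type)
{
  // No default: with GCC or Clang and -Wall (which enables -Wswitch),
  // adding a new enumerator makes this switch warn until it is handled.
  switch (type)
  {
  case ResultType::INT_RESULT:
  case ResultType::DECIMAL_RESULT:
    return "exact numeric";

  case ResultType::REAL_RESULT:
    return "approximate numeric";

  case ResultType::STRING_RESULT:
    return "string";
  }

  // Only reachable if someone cast an invalid value into the enum.
  return "invalid";
}

// Data from the wire or a file is sanitized instead of being cast.
// Here a default is the whole point: unknown values are rejected.
inline bool from_wire(int raw, ResultType &out)
{
  switch (raw)
  {
  case 0: out= ResultType::STRING_RESULT; return true;
  case 1: out= ResultType::REAL_RESULT; return true;
  case 2: out= ResultType::INT_RESULT; return true;
  case 3: out= ResultType::DECIMAL_RESULT; return true;
  default: return false;
  }
}
```

The design choice is that only from_wire() ever sees a value that might be invalid, so every other switch in the code can drop default and let the compiler police new enumerators.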
Thursday, July 7, 2011
Syncing, Google vs OSX, iPhoto Ate my Father's Wedding Pictures
I don’t really love OSX, as much as I happen to be a UNIX bigot.
Minus the long grey beard.
Why do I like OSX?
- Terminal always works.
- WiFi always works.
- I love Toasters
Toasters are awesome. You put bread in, you push the bar down and you get toast.
That is until the toaster starts to burn bread because either the dial has been turned all the way up, or the toaster has become so old that the springs are worn out.
iPhoto? iPhoto burns a lot of bread, I mean, it eats a lot of photos.
It is really irritating to lose photos, especially in the manner that happens with iPhoto. With iPhoto you can see the icon it made of the photo, but the original? It is long gone. I haven’t done an exhaustive search of all of the meta data, but you certainly can’t export or even view the photos.
A lot of my photos I upload to flickr when I want to store/share them longterm, but I haven’t always done that.
And when I went to show someone a photo from my father’s wedding? I discovered that it had once again eaten all of the photos from the wedding.
So what to do about it? I’m going to go with Picasa. I recently acquired a new NAS (I upgraded from my NV+ Readynas, to a Ultra6 ReadyNas). I have been writing scripts that have been extracting all of the pictures from all my computers. It is opening up tarballs of old home directories and pulling images from them and then storing the images to the NAS. If I could figure out how to deal with Spam I would extract all of the images from my email as well.
Picasa has been running for a couple of days. It has found ~15K headshots. I thought that iPhoto’s face recognition was pretty gimmicky. The Google one though? It is sharp. It is finding friends’ photos that I didn’t know that I had (so much for anonymity during the Fremont Solstice Parade!). My only real complaint with it so far is that I wish I could share the facial recognition information with friends so that we could collectively parse photos.
Downside? Picasa image display is not that awesome. It’s slow, and for some reason someone thought it would be brilliant to include all of the headshots in one window. Which means I have to do a bunch of scrolling to approve photos that it is finding.
Like all programs Picasa needs a kill file.
Another downside to Picasa? It is a walled garden. I like flickr, I am going to continue to use flickr. It’s annoying that I can’t sync between the two (maybe Google will buy it?).
At the very least you would think that Gmail would be able to extract photos from email, or make it easy to share photos between my computers.
iPhoto did an ok job at editing photos, Picasa is really lacking when it comes to this. I’ve been meaning to make more use of Lightroom, I guess this will give me a reason.
Next on my list of problems to solve?
Contacts.
Friday, July 1, 2011
Can you package up that library for us?
One thing that we, Data Differential, have is a lot of code. We get requests, infrequently, about packaging up one or more libraries that we use in our products.
Our test harness, libtest (uTest), is one that comes up frequently. We use it for all of our products, and there are a handful of open source projects out there that use it as well.
Why have our own, why not just use GoogleTest, or the one built into Boost?
- We have different features. libtest can start, stop, kill, etc a number of types of servers with different options.
- Integration with tools. If you are in vim and type “:make”, if an assertion in a test occurs, you go directly to the error in vim.
- It is always around, because we ship it in our code.
- It can rerun collections of tests over and over, with different flags/options.
- Extended testing with valgrind.
- It gives us a regression report for performance for each test.
- Does C/C++ libraries (which is our bread and butter).
Does it lack things?
- Documentation. Everyone who uses it today, uses it because they have worked for me at some point (or…). It is the network effect.
- CLI applications. It cannot test these at all.
What is the big win?
We have 50K lines of test cases at the moment that we alone maintain. There is an example for just about anything. The framework has a number of ways it can be extended, so it is not hard to find an example to show someone (and we receive a couple of test cases a week from users, so we know that the average developer can pick it up rapidly).
So should we package it up?
Probably not.
Is having it be open source a win for us?
Yes.
We can distribute it, we can have our customers distribute it, and we can ship it with each download. We get the benefit of having all of our users install and run it. Because we maintain it, developed it, and are further developing it, we make testing a core competency.
So why not package up everything we do? There isn’t the time, and there really isn’t a reason.
ABI compliance? We don’t require it for libtest.
More developers? There is this myth in open source where people believe that if you open source something, i.e. throw it over the wall, people will come. That is not the case at all.
Help with Development? Maybe, but it could also just become a time sink for us.
We have gotten similar requests for other libraries, like libhashkit. We haven’t bothered there either (though we do ship it, provide ABI, and install it). In its case we have had other companies fund the work, and they just happened to hit us with the requests when we had some free cycles (which is rare for us at the moment).
So will we package up more libraries?
I think the better question is, do we have a compelling reason to package up a library?
Monday, May 23, 2011
Tuesday, May 10, 2011
Sunday, May 8, 2011
Sunday, May 1, 2011
Thursday, April 21, 2011
Tuesday, April 19, 2011
MySQL, State of the Ecosystem 2011
A number of years ago I coined the term “the mysql ecosystem”. I did it at the time to express a view that MySQL had moved beyond being just what MySQL AB defined “MySQL” as being.
It was a radical thought at the time. In part because when I expressed it, I did it not only outwardly to the world, but inwardly to the company as well. Many at the time thought that the ecosystem danced at the whim of the MySQL AB entity. When Peter Zaitsev left to form Percona I remember very clearly a management meeting where there was a hubris that his business would amount to nothing, and that he was missing his opportunity to be a part of something greater. History is of course writing a very different story.
So how is the ecosystem?
It turns out it is pretty healthy.
I wasn’t sure if that was the case up until a couple of weeks ago. I was having lunch with Moshe Shadmon of ScaleDB and I asked him “Do you think the market is collapsing?”
His response to me was one of enthusiasm. He pointed out to me the obvious indicators. The growth Amazon has created with its relational database service, and the continued growth in applications that support the MySQL interface.
The conversation put me into a really positive mind set about the community. What did I find at the O’Reilly MySQL Conference?
I found a lot of happy people. I saw adoption numbers which show positive growth.
What didn’t I find? The overwhelming negativism of the previous two years that I have sensed in the community was not to be found. It has at times made me question not only my involvement, but the involvement of Drizzle* in the ecosystem. I personally don’t wake up everyday wanting to welcome that into my life.
But what was the vibe of the community this year, and that of the conference?
This year the negative vibe was seen not only as ugly, but as an aberration. An evolutionary path that the ecosystem seems to not be taking. That is pretty awesome.
What are the big questions facing the Ecosystem?
Oracle. I watch the MySQL trees, and I see that they are having an overall positive influence on the codebase. They are making good decisions, none of which appear to be malicious in nature. I hear from people who are using it, and I get an overall positive view of the work.
The people I ask?
They aren’t the shills that are trying to gain favor with Oracle, these are people who have 24x7 needs, who don’t have the time to write blog entries, and who see MySQL as just one piece of their overall architecture.
If you are using MySQL today, and you need a solid path forward on it as a platform?
I’d stick with what Oracle is creating.
Oracle will be Oracle though. They have a giant marketing machine that will not want, and by policy not allow, events to occur which favor a product like MySQL over other products. Oracle Open World will not be a MySQL conference. MySQL will be a track in that conference, a booth at best. Oracle will push for venues that they control. Oracle will push for users to adopt their stack, and MySQL is just another cog in their system. A vector to attack Microsoft? A product to keep at bay the growth of an open source database?
It might be all of that and more, but it will not be a crown jewel. The company is too large to focus its attention on MySQL, and the money that it obtains from MySQL is not enough for it to ever take center stage.
In the end?
The attention span of large companies is quite small, and at some point it will fade.
Will Oracle have a MySQL Sunday again this year at Oracle World? If it does, will it have one the year after?
There is nothing wrong with this, it is just the nature of large companies.
Percona. Percona is impressive. They do excellent work based on an excellent reputation which they have grown by doing the right thing. I’ve been asked before if they will become the next MySQL. I don’t believe they will. Percona looks to be the next Electronic Data Systems.
Do they have a server product? Yes. Will Percona Server be the next MySQL server? No. Is that because it is inferior? No. It is because Percona Server is about delivering on their ability to be the best at MySQL consulting. It is not going to go away, but I will be surprised if Percona decides that it is their one and only product that they service. Percona Server is an asset for them, but they show no evidence of being singularly focused on their own product.
SkySQL. SkySQL has a great feeling to it. It has the exciting feel that MySQL once had, but I see no signs of the baggage that MySQL AB gained in later years. The people they are hiring are excellent. In the MySQL world they could very easily take the dominant position in the next year.
Monty Program. I don’t feel like I can really say much here, but I don’t want to say anything by leaving it off the list either.
Amazon. They were a sponsor of the conference this year. They are certainly a player in the ecosystem, though for the most part a silent one. From an engineering standpoint I believe they have one hell of a challenge. How do they continue to provide MySQL services without a deep technical bench and a roadmap that will allow them to adopt new versions of MySQL? They don’t shape the MySQL universe in the ways that others do. They do not provide code, and they do not influence the direction of the product in any manner that reaches beyond the scope of their own service.
Their service though? Amazon could be setting a stage where we see the MySQL interface solidified. If a large portion of MySQL apps are shaped by the question “will this app work in the Amazon cloud?” then they will have their say.
Are there others? There are plenty of others. Canonical and Redhat will shape the Linux distributions, and that in turn will shape what users first see. There are players like Infobright who will shape the analytical market.
Postgres continues to make progress. When I ask folks who study the market how they see Postgres I never get a response that it is on their radar. But when I ask operations folks? There I hear about its growth. At some point an application is going to come along that will change the view of the market.
The MySQL codebase? It is GPL. Nothing has changed about that, and nothing that we are seeing, or that is talked about in private conversation, leads me to believe that is changing. There was some hubbub at the conference about Oracle removing the FLOSS exception from the codebase. There was talk that this created a situation where at any moment Oracle could change the exception and squeeze someone via a license gotcha.
When it was brought up it made me suspicious as well.
The thing is? It’s up on the website still, and the page has been recently updated. It has also been cached and stored by Google. Removing it from the source code doesn’t mean much.
It’s good to be suspicious, but I suspect that the removal was just a simple mistake made by a blanket policy about communication. Oracle’s open source behavior, its table manners, are haphazard. I don’t believe you can expect anything else.
In the end?
The MySQL Ecosystem is doing just fine. There are challenges, but there have always been challenges.
*Drizzle: I leave Drizzle out of the discussion both because I feel it is inappropriate to mention it given my own involvement, and because I actively debate our involvement in the MySQL Ecosystem. I’d rather push for our own environment.
Monday, April 18, 2011
A reminder...
The new blog can be found at http://blog.krow.net/
The recent DoS attacks against LJ mean that my attempts to keep both in sync are failing, so be sure to update to the new link if you are curious…
Libmemcached 0.49 Released
We have released version 0.49 of libmemcached. This version has part one of two of the new configuration work.
You can find an example of it here: http://docs.libmemcached.org/libmemcached_examples.html
There have been a lot of updates to the documentation, http://docs.libmemcached.org/, and there was a big improvement made to the code: we are finding that we get better performance for large objects on Linux.
Configuration now is much simpler, and is no longer different on a per language basis (ie once the downstream drivers pick up this version you can configure any language in the same manner). There is native virtual bucket support, and I have a developer assigned to looking through the changes in 1.6 so that we can support it based on server responses.
From talking to Alan (ie Dormando) he has some ideas on how this will work in the server, and I am excited to see what will happen here.
On the microcontroller side we will be publishing a port of the micro client to the ARM STM32 (which follows on the work we did for the ATmega328).
memslap has been split off into the older, very portable, version, and the less portable memaslap.
There is a new error message system, first appearing in this version, which will give you a lot better info on what is going wrong when something goes wrong.
The pool interface has been updated as well, which means if you don’t want to run a proxy, you don’t need to. This was an idea Trond had a couple of years ago, and we have updated it so that you can reduce the complexity of your environments.
Friday, April 15, 2011
Friday, April 8, 2011
Thursday, April 7, 2011
Additions to the Gearman API
There are a number of new things which are coming to the Gearman API that I thought I would blog about (and before anyone asks, all of the old API is available as well).
Here is an example of usage:
gearman_function_st *function= gearman_function_create(gearman_literal_param(WORKER_FUNCTION_NAME));
Functions are now a type. There is a lot being done to make functions more powerful by having context and meaning within the server. This begins to expose some of that. gearman_function_t itself being a type allows us to add characteristics to the function. Functions can also now be written to use any valid UTF-8 as a name, which was something that was not really possible in the past (it worked in almost all cases).
gearman_workload_t workload= gearman_workload_make(gearman_literal_param("test load"));
gearman_workload_set_background(&workload, true);
This is an example of generating a workload that will be passed into the server. The workload type is very lightweight; it essentially wraps the characteristics of the work. It does zero memory allocation of its own when creating the type, so it is simple to generate them and not be concerned about having to clean them up. Things like background, priority, scheduled time, and a few other characteristics can all be specified. Just like function, it is not tied to any single client connection, so it can be resent to multiple clients without worrying about the lifetime of any single client.
gearman_unique_t unique= gearman_unique_make(gearman_literal_param("my id"));
Unique keys are also now given their own value. You can continue to have the server generate one as needed, or you can create your own.
gearman_status_t status= gearman_client_execute(client,
function,
&unique,
&workload);
gearman_client_execute() is now the new workhorse of all methods. By having characteristics on functions and workload we remove the need to have dozens of commands.
if (gearman_status_is_successful(status))
{
gearman_task_st *task= gearman_status_task(status);
}
Status types now give you information on what has occurred with a job. If appropriate, you can also have access to the task that was generated from the execution of a job. All of this is now available in the build trees for Gearman. Have fun with it!
Friday, April 1, 2011
libmemcached configure language
In the next version we are rolling out the new configuration language that we developed a while ago.
Configuration is hard, and trying to determine all of the little options can be time consuming. A while ago for a couple of customers we wrote a configuration language and a couple of other utilities to simplify the process.
In the next version of libmemcached we will be rolling out the API for it. Here is an example:
memcached_parse_configuration(memc, "--DISTRIBUTION=consistent,MD5 --servers=localhost:11221,localhost:11222,localhost:11223,localhost:11224,localhost:11225 --CONNECT_TIMEOUT=456 --NUMBER_OF_REPLICAS=2");
It can either parse via a string you pass to it, or it can read everything from a file.
There are a number of options to the language: RESET, which will take a configuration and reset it at the point that the keyword is found; END, which will allow you to stop parsing; and INCLUDE, which will let you include files that will be parsed as well.
There are a number of other bits to the language as well.
The code is now in lp:libmemcached, I am hoping to publish a new version sometime in the next few days.
Have fun with it!
Thursday, March 17, 2011
Drizzle goes GA, From "What If", to "What has"
Not quite three years ago I wrote an article called “What If?”.
What I wanted to do was go back and rethink decisions we had made during the years, especially decisions that we made for MySQL 5.0.
5.0 exists because of the MySQL/SAP alliance. SAP wanted to replace Oracle with MySQL, and to do that MySQL was going to need to run SAP R3. We didn’t just pay lip service to SAP, there was an effort to make this happen. Somewhere in the middle of all this there was also a very odd “we are going to adopt SAPDB as the next MySQL” plan, which of course was never going to happen. There were countless meetings over this, and attempts to somehow sprinkle even an ounce of the SAPDB code into MySQL, but that never happened.
As far as making R3 work on MySQL? That was incredibly unlikely, and it was damaging to the product in the end. We ended up with a lot of features that the database was never designed to have. We created an unrealistic set of expectations. We had a source base which had too little testing.
So part of the goal with Drizzle was to cut it back to the core and build modules that we could then create better testing for. So for that reason Stored Procedures, Views, and Triggers were out. None of them were well designed, and all of them had/have major bugs. We tossed out the monolithic kernel design and moved to microkernel design.
MySQL 5.1 made an attempt to patch the replication system that had been written a decade ago. MySQL replication works, but it works with a lot of exceptions. Anyone who has ever put it into production is aware of these. The good thing about MySQL replication is that it mostly works out of the box, and that is something that was a bit of a revolution when it was created. Today? With the notable exception of SQL Server, the rest of the major databases still have replication systems which are difficult to use, install, or deploy.
We initially looked at using 5.1’s replication. We were only going to refactor it such that we were going to beef up its file format and switch to just using the row based replication that was added in 5.1. We were unsuccessful in refactoring it. About 9 months in we figured this out, and we began a rewrite.
The rewrite was the right answer. The original code had too little testing for us to ever know whether or not a change we made created bugs or not.
A big lesson learned, if you are going to refactor code, make sure you have plenty of testing up front.
Internally we have “new code” and “old code”. If we want to make a change to “new code” we can typically do it very rapidly. The rate at which we can extend it is pretty amazing. The MySQL code base is not friendly to anyone who knows C++. Pretty much all of the warnings have been disabled and there are a lot of tricky bits.
We have fixed all the warnings in Drizzle. This is something that isn’t sexy work, and the only way it is justified is because cleaning up warnings fixes bugs. If you are starting a new code base let me impress upon you the necessity of doing this from the beginning.
Today our replication is pretty spiffy, and it answers a couple of the big “What If” statements I have wanted answered:
1) We use an entirely open message format.
2) We store our replication records directly in Innodb.
The open message format comes with a penalty, it is more verbose than a native format. It takes up more space than if we just shipped the block records created in the transactional engine. Running a point in time recovery on block records is tricky and very limiting. You can’t take the data from one database and push it to another. ETL? Forget it.
We used Google’s Protocol Buffers for the message format. There are other libraries available but they were either license incompatible or were not widely known. At the time we hadn’t made a decision to go with boost so using its serialization library wasn’t an option. The disadvantage has been that the Google library created a dependency for installing Drizzle. Dependencies are a pain. When we started Drizzle I had thought that the different Linux distributions had a good handle on this; I don’t really believe this any longer. Avoid dependencies.
Storing the records in Innodb has always seemed to be an easy win in my mind. It solves a lot of the two phase commit problems that plague users and it gives you instant recovery. Storing the log in a separate file can possibly give you a win in that you can do some tricks with IO, but in the end it just complicates everything.
With MySQL you always need to keep in mind the question of “What would MyISAM do?”
MyISAM’s design, and limitations, are scattered throughout the program. In all cases MySQL has to ask “how will this be handled if we need to store data in a storage engine that can’t handle failure, handles all of its own IO, and needs to be locked at the Table level?”.
We dropped MyISAM support about a year into our work, and relegated it to a support-only role for temporary tables. We didn’t hide it completely before we GA’ed Drizzle, but we won’t support it long term. I’ve heard users say “but I want its performance!”. Trading reliability for performance works out for some people, but certainly not everyone. What I find is that when someone wants this, what they really want is a different sort of database altogether. Typically it is some sort of analytics problem which creates this need.
Which gets us to the storage engine interface. Within MySQL it was the first attempt to create an interface that we could plug different solutions into. I had proposed it in MySQL because I had written different engines and knew what a nightmare it was to make them work.
That engine interface has generated millions of dollars. When I wanted to make it available at MySQL the backlash was significant. Some of sales freaked out, some of marketing thought we were going to let others take over the product, and alliances wanted to know how we were going to limit it to “select partners”. On top of that, half of engineering wanted to go and re-engineer it immediately.
In Drizzle we have spent a significant amount of time reworking the interface, but it is far from perfect. We redesigned it so that engines now own their own metadata and federate that data to the microkernel. We also designed the interface to require that all new engines have ACID-like qualities, know how to handle their own recovery, and can handle failure gracefully. Our core engine is Innodb. We have had others propose new engines, and we have even supported other engines, but at the end of the day we know people want a transactional engine mainly because they don’t want to find that their data has been trashed.
Our Innodb is a little different. We have more views into the state of the engine, and we fixed our version to compile with a C++ compiler. We cleaned up warnings and fixed the bugs that popped up from that. We have begun to refactor it so that it is more integrated with Drizzle’s thread scheduler.
Innodb would have been the default engine for MySQL long ago if not for some “not invented here” mentality, mixed with a flopped buy out attempt. Heikki, the inventor of Innodb, came out quite well in all of this. Good for him.
I don’t believe we will spend much more energy on the storage engine interface going forward. It is a dead business, and while there are a couple of companies that have built brand and product enough to make a go of the business, I don’t expect any additional ones will show up. The storage engine business made money for MySQL, but it was a big distraction. While with Drizzle it is easier to integrate an engine, I’m not sure that a business exists for storage engine vendors with it. I’ll write more about this at a later date.
Speaking of dates, Drizzle’s internal format for timestamps is 64 bit. There is still some work to be done to allow the use of all 64 bits, but you won’t need to recompile or change your disk format for them. Right now we need to fix some tests, and make sure a couple of functions will handle the formatting, but we store your data such that going forward, or backwards, you are in good shape. Unlike MySQL we store time as GMT, so there is no screwing around with “well I stored my data in my local time zone, but we had the machine set to…”. I have personally spent over a month of time just fixing bugs in that code.
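Here is a small sketch of the two ideas in that paragraph: normalize to GMT/UTC before storing, and keep the value in a signed 64 bit integer. It is not Drizzle's actual on-disk format, which the text above does not describe; encode_timestamp, decode_timestamp, and fits_in_32_bits are hypothetical helpers, and seconds since the Unix epoch is an assumed unit.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_timestamp(dt):
    """Normalize an aware datetime to UTC and pack it as a signed 64 bit integer."""
    if dt.tzinfo is None:
        raise ValueError("refusing a naive datetime: its time zone would be a guess")
    seconds = int(dt.astimezone(timezone.utc).timestamp())
    return struct.pack("<q", seconds)


def decode_timestamp(raw):
    """Unpack the 64 bit value and return an aware datetime in UTC."""
    (seconds,) = struct.unpack("<q", raw)
    return EPOCH + timedelta(seconds=seconds)


def fits_in_32_bits(dt):
    """True if the same value would still fit a signed 32 bit field."""
    try:
        struct.pack("<i", int(dt.timestamp()))
        return True
    except struct.error:
        return False


# The same instant, written by a machine on Pacific time and one on UTC.
pacific = timezone(timedelta(hours=-8))
a = datetime(2011, 3, 15, 9, 0, tzinfo=pacific)
b = datetime(2011, 3, 15, 17, 0, tzinfo=timezone.utc)
same_bytes = encode_timestamp(a) == encode_timestamp(b)

# A date past January 2038 round-trips in 64 bits but not in 32.
far = datetime(2100, 1, 1, tzinfo=timezone.utc)
round_trip = decode_timestamp(encode_timestamp(far))
```

Because the conversion to local time happens only when a value is displayed, the stored bytes do not depend on how any one machine's clock or zone was configured.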
We have spent a lot of time fixing bugs. We get a big collective smile on our faces when we read about a new bug that has been discovered in MySQL and find that we don’t have it. A considerable amount of our time has also been spent on finding new ways to test Drizzle. I am sure plenty of bugs exist to be found.
We also support storing/comparing/displaying time with microseconds. We also have a real BOOL type, which I have been told is handy for the SQLAlchemy folks, and a native ISO UUID type. The UUID is interesting in that it stores time as well as being unique. It isn’t as fast as “please give me the next number”, but I believe it will be useful for a lot of applications. We have refactored all of the types, and the only one that was not size related that we dropped was SET. If you wonder why we dropped it, read the section in the MySQL manual about its limitations and bugs.
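To show what “stores time as well as being unique” can mean, here is a sketch using Python's standard uuid module. It assumes a time-based (version 1) UUID as defined in RFC 4122; the text above does not say which UUID version Drizzle generates, and uuid_time is a hypothetical helper, not a Drizzle function.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01), per RFC 4122.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def uuid_time(u):
    """Recover the creation time embedded in a version 1 UUID (microsecond precision)."""
    if u.version != 1:
        raise ValueError("only version 1 UUIDs embed a timestamp")
    microseconds = (u.time - UUID_EPOCH_OFFSET) // 10
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=microseconds)


first = uuid.uuid1()
second = uuid.uuid1()

print(first, uuid_time(first))
print(second, uuid_time(second))
```

The two values are distinct even when generated back to back, and each one can tell you roughly when it was created without a separate timestamp column.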
Why do we only allow DELETE against a single table at a time, like pretty much every other RDBMS? Beyond the conceptual issue that few people can wrap their heads around how to form a multi-table DELETE, let alone feel like they know what the query will do, we hit the problem of the “multiple execution path”. There were a lot of one off execution paths in MySQL. In a lot of cases I know these were dead refactoring projects that were never completed. The “multiple execution path” problem is particularly disturbing when you think about fixing a bug. If you fix a bug in DELETE you need to know that there is an execution path for a single table that is different than the path used for multiple tables. This leads to odd behaviors, and a much richer set of bugs.
SQL modes? Those are gone. If you wonder about what sorts of problems they create inside of the server, I’d suggest reading about the “Legend of the Ambalappuzha Paal Paayasam”.
In general in Drizzle we have tried to get rid of the Gotchas that we have found. Things like declaring a column NOT NULL and discovering that somehow the database still stored a NULL are gone. Altering a table and adding a field that would violate the structure of the table? That is gone.
It is amazing that ALTER TABLE works, as the code there is Byzantine. We have made some effort to clean it up, but it is still way too tricky. I wish we could have done more there, but it is what it is. Are you using partitions? Make sure you back up your data before doing an ALTER TABLE. Wrapping partitions into the system in the way that was done at the time was simple, but it is far from robust.
I had hoped that with 5.1 we would have created a single logging API, but instead we ended up with multiple logging APIs internally in the server. With Drizzle we ripped all of them out and installed a single API. It is crazy simple to write a new logging plugin.
Which gets into the philosophy of plugins in general. Writing plugins should be low hanging fruit. Whenever possible we have tried to make it the case.
We have an entirely new INFORMATION_SCHEMA in Drizzle. It is based on table functions, which are a new concept in Drizzle. We keep a separate schema called DATA_DICTIONARY, and in it we put whatever we like. Our INFORMATION_SCHEMA is only what the SQL standard has specified. We do zero vendor modification to it. Another hats off to SQL Server: their INFORMATION_SCHEMA is the closest to complying with the standard.
Drizzle’s drivers are BSD. They were written outside of Sun, and Sun signed off on contributions to them under a BSD license. They speak both Drizzle’s and MySQL’s protocols. There is a JDBC version that was written. Their adoption is becoming widespread. Licensing clarity around them is a big win for us, and for ISVs who want to integrate. MySQL’s licensing mess was related to a lot of hand waving involving them. Recently I noticed that MongoDB had written up a clear licensing policy with regards to their own drivers. Awesome.
We never got to finish all, or really much of any, of what we wanted to do with the Drizzle protocol. I believe this is an area where we will see change in the near future. Inside of Drizzle we have a C++ interface that resembles JDBC that lets us execute queries. We will be doing a lot more with that interface going forward.
What about performance? With Drizzle we began doing benchmarks early on, using a few different benchmarks. The benchmark generated by sysbench has always been the one we have used as our bellwether. Unlike a lot of databases we test Drizzle with up to 1024 concurrently executing queries. Most of the benchmarks I see people run are for far fewer connections. We have chosen time and time again to favor performance gains at the high end over gains on the low end. We are roughly double in performance from where we began. We could still do a lot better. MySQL 5.5 has a new metadata locking system which should do well in a number of situations, and we could do a bit better in some of these cases. Our lack of MyISAM would make it simple for us to move forward in this direction if we want to.
There has been a lot asked about our claim on scaling with lots of cores. Our process there is simple: eliminate locks, favor performance gains that make use of additional CPUs when we find them, and whenever possible remove strong ownership that requires waits for locks. MySQL relies on MyISAM, and MyISAM has significant locks, especially around the keycache. We got rid of those by freeing ourselves from MyISAM. We have had some gains with our new scheduler, and we have done some work to improve how IO is handled.
I am sure we have a lot of tuning still to do. We won’t be publishing benchmarks which compare us to others though. I’ve yet to see a comparison benchmark which wasn’t completely flawed, and even when they are not, few people really understand them. They fall into the classic “how many angels can dance on the head of a pin” conversations.
Our authentication system is modular, and we need to iron out more of the authorization system.
I’ve seen someone say that Drizzle is designed for Google and Facebook. This is not the case at all. We built it so that the next Facebook, Google, etc would have a platform to build on. Facebook and Google have their own forks of MySQL, they aren’t going to be using Drizzle. The pieces are there for the next company who needs to innovate, it is just a matter of someone making use of them. We speak the MySQL protocol, so the typical MySQL application runs just fine on Drizzle without change. We designed Drizzle to work as a piece of someone’s current infrastructure, not be yet another application which has a costly integration. We have a NoSQL sort of solution via the blob streaming module, but we are first and foremost a relational database.
What will the next Google or Facebook find? A much more friendly platform than what MySQL provided to develop on and with. The big success for Drizzle has been in the people that have been involved. We are without a doubt the descendant of MySQL that has the largest contributor base, and we have long passed MySQL with regards to contributors. We are well into the hundreds when it comes to developers who have contributed code. We have had more than 921 commits in the last month across 20 people. Our numbers go up and down, but we are consistently more than double anyone else in size. If you just walked out of college, or skipped it altogether, you are going to have a much easier time adjusting Drizzle to your needs. At least we believe this :)
The codebase is C++, we make use of Boost, and while we are cautious, we tend to favor more forward thought in how we code. Readability is the key to creating code that others will use. Because in the end? We can scale silicon, but carbon? People are much harder to scale.
The people to thank for the code:
Brian Aker
Monty Taylor
Stewart Smith
Lee Bieber
Jay Pipes
Padraig O’Sullivan
Andrew Hutchings
Marko Mäkelä
Joe Daly
Olaf van der Spek
Vijay Samuel
Patrick Crews
Toru Maesaka
David Shrewsbury
Eric Day
Zimin
Marisa Plumb
Joseph Daly
Barry Leslie
Asil Dimov
Mark Atwood
Tim Penhey
Jimmy Yang
Paul McCullagh
Nathan Williams
Paweł Blokus
Sunny Bains
Andy Lester
Hartmut Holzgraefe
Trond Norbye
Other people to thank?
David Douglas who at Sun supported us initially, and when we didn’t think our internal support initially at Sun could get any better? Bob Brewen worked with us till the end came for Sun. An extra mention should be made for Lee Bieber, he has been working with the project from nearly the beginning as well. He has handled project management, done code refactoring, made flyers, organized dinners, and did everything else in between.
Mike Shadle for getting us machines, and making sure everything runs. Adrian Otto at Rackspace should be thanked (along with a number of other people as well).
A thank you should go to Chris DiBona for the Google Summer of Code project. We have a number of students who now work on databases for a living thanks to that program. While with MySQL we constantly failed at getting students’ code into the server, with Drizzle we have had a lot of success.
There is an entire channel of people who have been involved with Drizzle on Freenode in #drizzle who should be thanked as well. IRC is how we communicate.
There are a lot of other people I am forgetting to thank, sorry about that.
So what next? There is a lot more to Drizzle than what I have written above. Having worked on this for years I often forget what the differences are anymore. There are lots of new features, plenty of new enhancements, and new bugs just waiting to be found. I’m giving a talk at Web 2.0 Expo in a couple of weeks in San Francisco where I will talk about some of what we have done and are doing for virtualization.
I will be giving a keynote at the O’Reilly MySQL Conference & Expo, and there are a handful of talks there on Drizzle as well. The MySQL Ecosystem is a radically different place than what it was a year ago. I’ll be commenting on it in the future online and at the conference.
About a week ago Monty Taylor and I sat down and talked about what we wanted to do with Drizzle going forward. Monty has been working on this since the beginning with me, and he has been a lot of fun to work with. One conclusion that we both came to was that we want to see where people will take Drizzle before we determine too much about its future. It is easy to get caught up in new features, and we are interested in seeing how others use it before too many decisions are made about what to do next.
Saturday, March 12, 2011
Wikipedia, Mornings, and Danger of an iPad
Wakeup at 7:08
The movie I was watching last night, a semi-documentary on the early punk music scene, leads me to remember that there was a character called the “Brood” in the Marvel Universe.
So I lookup “Brood”.
Next I read up on the Acanti. Who doesn’t like space traveling whales that do harmony?
Which leads me to read up on Acheron Empire because of a slight reference to the Acanti.
This of course leads me to read up on the Hyborian Age, and read the entry on Robert E. Howard. I briefly read the entry on Red Sonja, which leads me to the entry on the film, and makes me wonder what was up with the copyright on Conan at the time. I also wonder what the ex-governor of California is up to, and wonder if the movement to change who can become president is moving forward at all. Which makes me wonder if he wouldn’t make for a somewhat palatable Republican candidate. I can only imagine him on stage with Palin.
I then retreat from this entire tangent.
I look to see whatever happened to Brigitte Nielsen and discover that celebrity drug rehab TV exists, and then have to look up Jaimee Foxworth because I have no idea who this child actress was. If you aren’t Gary Coleman or Drew Barrymore, I have no idea who you are.
Sometime in the last year I’ve read the article on Drew Barrymore, so I can skip that.
Jumping back I read the entry on “Kull of Atlantis”. I do not get the appeal of barbarian fiction.
The apeman article is just a jumping off point for a number of topics. I read up on the concept of “Person” and read about the Humanzee. Parahumans? Check.
Did you know that there is a movement called the “Great Ape Project”?
Give rights to our fellow apes.
The DNA difference is quite small. I then read about the Soviet project to breed human hybrids. I know the Germans had one as well, but I don’t notice it linked into any of the articles. The story of “Oliver (chimpanzee)” is pretty sad. We humans suck. I read up on Karl Pinkington, and from there…
Ancient kingdom of the Picts, which requires me to understand Bede the historian.
The modern movement for different countries in the United Kingdom to gain some level of independence is fascinating (especially since in the end it disenfranchises the English; serves you right…).
Transhumance has nothing to do with transhumanity. I did briefly see an article on that, but I have heard enough about it in life. On the other hand reading up on transhumance gives me a better picture of the legal structure of a nation.
Want to start a movement to do away with counties in Washington?
I find myself reading up on Wales which leads to articles on the End of Roman Rule in Britain. Hadrian’s Wall? Sounds like it was a taxing structure. Built in seven years and 80 miles long!
I have doubts that we could do that today. It was historically saved by a plan of purchasing land around it, using that land for sheep, and using the proceeds to buy more land.
Devolution, Padania, and Nunavut all follow. I hadn’t realized how much the structure of Canada had changed in the last couple of decades.
Go Nunavut!
BTW Canada? You suck for shipping eight families off to the great white north and not letting them come back south when they realized that you had sold them a bill of goods. That is just awful. Who would have thought Canadians did this shit?
Grise Fiord? At least let them rename it to a language that is spoken in Canada.
The Welsh at least have “Snowdonia”. Way more pleasant name.
Time spent on the above? About two hours.
Tuesday, March 8, 2011
Do It Tomorrow, Simple Notepads are the Solution to Stress
I was glancing through Boing Boing tonight and noticed an article about an iPhone Application which is yet another “this is how to organize your life”.
There was a point in my life where every time I had an idea I would open up a window, put in a little bit of code, save, and move on. Despite the small amount of effort that this took I found it stressful.
I had all of these directories scattered around that had ideas in them.
Sometimes, if I had a window already open, I would just go work out the idea if it was small enough in the current code I had opened. Most of the time this was ok, but only ok. I’m not religious about code when it comes to a patch being only one thing (I rollup patches when they are related so there isn’t a lot of difference). It is not the best practice but it isn’t the worst either.
The problem was, sometimes this would go wrong. The idea I had required more work than I thought, or the idea was a distraction from what I was working on.
So how did I solve this?
I just keep a simple note in my email where I tack on new ideas as I have them. There is no organization, no tags, and I just simply delete a line once I have completed it (and I doubt I will ever complete all of it). I scan it from time to time to see if I can remove anything from it, but for the most part I just leave it alone.
Once I have added an idea to the note I find that all of my stress goes away. I’ve recorded the thought, it will be there later to look at.
I find that most of the stress I have is not about completing what I need to complete, but it is about losing the knowledge of what I might like to complete.
Monday, March 7, 2011
Google Summer of Code! Have an idea for Drizzle?
We have submitted our application for Google Summer of Code, and have a wiki page up for projects.
Many of our students have gone on to get jobs in the database industry after GSOC, and we have a high rate of “you wrote the code, it will end up in the main release”.
Are you a student and databases are not your thing? I’d go look at other projects which have been successful with GSOC and see if there is anything that interests you. Open source is an awesome way to get real world experience developing software.
Wednesday, March 2, 2011
Tuesday, March 1, 2011
Ignite MySQL 2011!
Ignite MySQL! As a reminder, we are still taking talks for the Ignite show at the MySQL Conference.
Friday, February 25, 2011
Since 2009, Mr. Heicklen has stood there and at courthouse entrances elsewhere and handed out pamphlets encouraging jurors to ignore the law if they disagree with it, and to render verdicts based on conscience.
http://www.nytimes.com/2011/02/26/nyregion/26jury.html?hp <— jury nullification advocate.
Wednesday, February 23, 2011
In Honduras, for example, executives at the American-affiliated chamber expressed support for the June 2009 coup d’état that forced out President José Manuel Zelaya, the State Department cables say. After leaders in the group applied pressure on the Obama administration, American officials retreated from their initial demands that Mr. Zelaya be allowed to return to power.
http://www.nytimes.com/2011/02/24/world/americas/24chamber.html?hp So why is this interesting? Obama came out against the action in Honduras publicly, even though it was not clear whether or not the courts in Honduras had the power to make the decision to remove the sitting president.
Gearman 0.16 Released
- Fixed cases where silent failure of server for queues would not cause tests to fail.
- Fix for failure when setsockopt() failed on new connection.
- Fixed silent exit in client library based on signals.
- Error log now reports failure location in compiled code for ERROR conditions.
- Fix for failover.
- Fixed issue in client where it would silently die based on signal issues.
- “verbose” has been added to the string protocol. It allows you to see what the verbose setting is.
You can find the latest version on Launchpad.
Wednesday, February 9, 2011
Upgrading, Hudson to Jenkins
Yesterday we finished upgrading Drizzle’s Hudson servers to Jenkins. We are a long time Hudson user, and we love it. It has been one of the most top notch pieces of open source software I have seen delivered in the last few years.
The authors, especially Kohsuke Kawaguchi, have done an amazing job.
So why move from Hudson to Jenkins? Because the authors have moved. The people we trust and respect have been required to change the name from Hudson to Jenkins. We will bet our money and time on the folks who have earned it. These are the people who have been delivering time and time again.
In the last couple of years I have sold a dozen or more companies on Hudson (and a fair number of open source projects).
Over the years I have used a dozen or so continuous integration systems. Some were home grown, and some were purchased. In the software business, there are two things that differentiate the professionals from the rest. How you make use of revision control, and how you test your software. I have often wished that we would have had a piece of software like this for MySQL, and I love that we have had it for Drizzle.
Jenkins is just as important as git, bzr, and subversion. I am really happy to see the direction the authors are taking it in.
Tuesday, February 8, 2011
Drizzle, Speaking Engagements, Out and about...
Want to come hear me talk about Drizzle? The conference season is just beginning to heat up and there are several opportunities to meet up with myself and others about the latest and greatest in Drizzle.
MySQL Meetup in Seattle, March 1st, http://www.meetup.com/seattlemysql/events/16489408/
Web 2.0 Expo in San Francisco, March 28th
O’Reilly MySQL Conference and Expo, April 11th-14th
http://en.oreilly.com/mysql2011/
Drizzle Developer Day! April 15th,
http://blog.drizzle.org/2011/01/17/drizzle-developer-day-2011/