10 July 2011

A Note on YCSB

Recently we had to benchmark a number of the in-memory databases available, mainly open source ones. I didn't know about YCSB until my architect told me about it.
YCSB = Yahoo! Cloud Serving Benchmark
It didn't impress me at first because it was from Yahoo!. No offense, but Yahoo! still expects us to pay for its email POP3 access (Yahoo! Plus); they haven't learned anything from GMail. Immaturity at its best. Nevertheless, we started our benchmarking with Oracle and MongoDB. I know neither of them is an in-memory database, but we liked MongoDB's concept of memory-mapped data.

I wrote the Oracle client for YCSB; the MongoDB client was included with the benchmark code (thanks to Yen Pai). Writing a client for YCSB is fairly simple, and that's what impressed me. But my impressions were washed away by the horrible glitches I found in the included drivers as well as in the YCSB code itself. There are a number of forks (including mine, which is a dead one by the way) that provide a lot of patches to the original YCSB code and include many new clients as well, but the owner of the project, Brian Frank Cooper, has shown very little interest in reviewing them.

I ran the first benchmark on 100,000 data sets for all the workloads provided with YCSB. The default workloads are not sufficient to test all the operations properly, which forced me to create my own workload configuration. It turned out that MongoDB was just 2-4 times faster than Oracle, and that didn't impress us much. So we considered Gemfire and Hazelcast as well, both "real" in-memory databases, one open source (Hazelcast) and the other commercial (Gemfire, a 60-day trial in this case).
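For illustration, a custom workload file for YCSB's CoreWorkload might look roughly like the sketch below. The property names are the standard CoreWorkload ones; the proportions and the distribution are assumptions made up for this example, not the configuration I actually used.

```properties
# Hypothetical custom workload: the operation mix below is illustrative only.
workload=com.yahoo.ycsb.workloads.CoreWorkload

# Number of records loaded and number of operations run against them.
recordcount=100000
operationcount=100000

# Operation mix; the four proportions should add up to 1.0.
readproportion=0.5
updateproportion=0.3
insertproportion=0.1
scanproportion=0.1

# How keys are picked for each operation (assumed value).
requestdistribution=zipfian
```

A mix like this exercises reads, updates, inserts and scans in a single run, which none of the default workloads does on its own.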

Again I had to write the clients for both new DBs, and it turned out to be a piece of cake. I have to admit YCSB has great pluggability: plugging in a client for any DB just requires the driver libs plus some 20 lines of code, and you are done. YCSB can also run on multiple machines. It offers a great platform for benchmarking any kind of database out there, and Yahoo! or Brian Cooper should realize that and put some more effort into its development.
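To give a feel for how little a client needs, here is a minimal sketch of the idea. It is not the real YCSB API: `BenchmarkClient` is a hypothetical stand-in for the base class a client extends, `InMemoryClient` is a made-up name, a `ConcurrentHashMap` plays the role of the database driver, and the return codes (0 for success, 1 for failure) are an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical base class: a stand-in for the class a benchmark client
// extends. It is NOT the real YCSB API, only a similar shape.
abstract class BenchmarkClient {
    public abstract void init();
    public abstract int insert(String table, String key, Map<String, String> values);
    public abstract int read(String table, String key, Map<String, String> result);
    public abstract int update(String table, String key, Map<String, String> values);
    public abstract int delete(String table, String key);
}

// Example client. A ConcurrentHashMap stands in for the database driver;
// a real client would forward each call to the driver instead.
class InMemoryClient extends BenchmarkClient {
    private Map<String, Map<String, String>> store;

    // Build one storage key out of the table name and the record key.
    private static String id(String table, String key) {
        return table + ":" + key;
    }

    @Override
    public void init() {
        store = new ConcurrentHashMap<>();
    }

    @Override
    public int insert(String table, String key, Map<String, String> values) {
        store.put(id(table, key), new HashMap<>(values));
        return 0; // assumed convention: 0 means success
    }

    @Override
    public int read(String table, String key, Map<String, String> result) {
        Map<String, String> row = store.get(id(table, key));
        if (row == null) {
            return 1; // assumed convention: non-zero means failure
        }
        result.putAll(row);
        return 0;
    }

    @Override
    public int update(String table, String key, Map<String, String> values) {
        // Merge the new fields into a copy of the row, atomically per key.
        Map<String, String> updated = store.computeIfPresent(id(table, key), (k, row) -> {
            Map<String, String> merged = new HashMap<>(row);
            merged.putAll(values);
            return merged;
        });
        return updated != null ? 0 : 1;
    }

    @Override
    public int delete(String table, String key) {
        return store.remove(id(table, key)) != null ? 0 : 1;
    }
}
```

Swapping the map for calls into the Gemfire or Hazelcast driver is essentially all that changes from one client to the next, which is why each new client stays so small.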

Here are the results of the MongoDB, Gemfire and Hazelcast benchmarks on 100,000 data sets:

[Table: throughput (operations/sec) of each DB for write and read operations on 100,000 data sets]

MongoDB turns out to be the winner. The reason I can think of is that both Gemfire and Hazelcast run on the JVM, whereas MongoDB leaves everything to the OS by mapping the data into memory.

More about YCSB can be found here and on the wiki