10 July 2011

A Note on YCSB

Recently we had to benchmark a number of in-memory databases, mainly open source ones. I didn't know about YCSB until my architect told me about it.
YCSB = Yahoo! Cloud Serving Benchmark
It didn't impress me at first because it came from Yahoo!. No offense, but Yahoo! still expects us to pay for POP3 access to its email (Yahoo! Plus); they haven't learned anything from GMail. Immaturity at its best. Nevertheless, we started our benchmarking with Oracle and MongoDB. I know neither of them is an in-memory database, but we liked MongoDB's concept of memory-mapped data.

I wrote the Oracle client for YCSB; the MongoDB client was included with the benchmark code (thanks to Yen Pai). Writing a client for YCSB is fairly simple, and that's what impressed me. But my impressions were washed away by the horrible glitches I found in the included drivers as well as in the YCSB code itself. There are a number of forks (including mine, which is a dead one, by the way) that provide a lot of patches to the original YCSB code and include many new clients as well, but the owner of the project, Brian Frank Cooper, has shown very little interest in reviewing them.

I ran the first benchmark on 100,000 records for all the workloads provided with YCSB. The default workloads are not sufficient to test all the operations properly, which forced me to create my own workload configuration. It turned out that MongoDB was just 2-4 times faster than Oracle, and that didn't impress us much. So we considered Gemfire and Hazelcast as well, both "real" in-memory databases: Hazelcast is open source and Gemfire is commercial (a 60 day trial in this case).
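
For anyone wondering what a custom workload looks like: a YCSB workload file is an ordinary Java properties file read by the core workload class. The sketch below is not my actual file. It is a minimal read-only example using the standard CoreWorkload property names, with the record count and the roughly 1KB record size (10 fields of 100 bytes) filled in as assumptions. The WorkloadCheck helper is made up for illustration and only checks that the operation proportions add up to 1.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative helper, not part of YCSB: parses workload text and sanity-checks it.
class WorkloadCheck {
    // Assumed read-only run phase. The values are illustrative, not the original file.
    static final String READ_ONLY = String.join("\n",
        "workload=com.yahoo.ycsb.workloads.CoreWorkload",
        "recordcount=100000",
        "operationcount=100000",
        "fieldcount=10",
        "fieldlength=100",
        "readproportion=1.0",
        "updateproportion=0",
        "scanproportion=0",
        "insertproportion=0",
        "requestdistribution=uniform");

    // Parse properties-format text into a Properties object.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // Sum of the four operation proportions; this sketch expects it to be 1.0.
    static double proportionSum(Properties p) {
        String[] keys = {"readproportion", "updateproportion",
                         "scanproportion", "insertproportion"};
        double sum = 0;
        for (String k : keys) {
            sum += Double.parseDouble(p.getProperty(k, "0"));
        }
        return sum;
    }

    // Approximate payload per record: number of fields times bytes per field.
    static int recordBytes(Properties p) {
        return Integer.parseInt(p.getProperty("fieldcount", "10"))
             * Integer.parseInt(p.getProperty("fieldlength", "100"));
    }

    public static void main(String[] args) {
        Properties p = parse(READ_ONLY);
        System.out.println("proportion sum = " + proportionSum(p));
        System.out.println("bytes per record = " + recordBytes(p));
    }
}
```

The properties text is what would go into the file passed to YCSB with -P. A write-only variant would flip the proportions, or simply rely on the load phase, which only inserts.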

Again I had to write the clients for both new DBs, and it turned out to be a piece of cake. I have to admit YCSB is very pluggable: plugging in a client for any DB just requires the driver libs plus some 20 lines of code, and you are done. YCSB can also run on multiple machines. It offers a great platform for benchmarking any kind of database out there, and Yahoo! or Brian Cooper should realize that and put some more effort into its development.
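
To give a feel for how small such a client is, here is a rough sketch of its shape. It is not one of the clients I wrote. A real binding extends YCSB's DB class and calls the database driver, and the exact method signatures and return types differ between YCSB versions. The MapBackedClient class, its return codes and the in-memory map standing in for the database are all made up so the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a YCSB client. A real one would extend YCSB's DB
// class and forward each call to the database driver instead of a map.
class MapBackedClient {
    static final int OK = 0;         // assumed success code
    static final int NOT_FOUND = 1;  // assumed "no such record" code

    // The "database": one map of field name to value per record key.
    private final Map<String, Map<String, String>> store = new ConcurrentHashMap<>();

    private static String rowKey(String table, String key) {
        return table + ":" + key;
    }

    // Insert a record, replacing any existing record with the same key.
    int insert(String table, String key, Map<String, String> values) {
        store.put(rowKey(table, key), new HashMap<>(values));
        return OK;
    }

    // Read a record into result; in this sketch a null field set means "all fields".
    int read(String table, String key, Set<String> fields, Map<String, String> result) {
        Map<String, String> row = store.get(rowKey(table, key));
        if (row == null) {
            return NOT_FOUND;
        }
        for (Map.Entry<String, String> e : row.entrySet()) {
            if (fields == null || fields.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return OK;
    }

    // Update overwrites only the given fields of an existing record.
    int update(String table, String key, Map<String, String> values) {
        Map<String, String> row = store.get(rowKey(table, key));
        if (row == null) {
            return NOT_FOUND;
        }
        row.putAll(values);
        return OK;
    }

    // Delete a record by key.
    int delete(String table, String key) {
        return store.remove(rowKey(table, key)) != null ? OK : NOT_FOUND;
    }
}
```

Swap the map for driver calls, add whatever connection setup the database needs, and that is most of a client.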

Here are the results of the MongoDB, Gemfire and Hazelcast benchmarks on 100,000 records:

Throughput in operations/sec (100,000 operations)

Operation    Gemfire     MongoDB     Hazelcast
Write        3032.324    5123.475    3709.336
Read         7634.170    7825.338    4315.367

MongoDB turns out to be the winner. The reason I can think of is that both Gemfire and Hazelcast run on the JVM, whereas MongoDB leaves everything to the OS by mapping the data into memory.

More about YCSB can be found here and on the wiki

4 comments:

  1. There has been a contribution for a plugin for GemFire in YCSB now so that should be easier. I am curious if you can share more details on the YCSB workload file (threads, data size, read/update ratio etc) and the GemFire configuration you tested and report in this post (heap size, number of nodes, actual hardware etc).
    (disclaimer: I currently work for VMware on GemFire and related technologies)

  2. @Alex
    Server specs: Xeon, 2 quad CPUs, 2.53 GHz, RAM 8
    Data set size:1KB
    Ratio: 100% writes/ 100% reads
    Threads:1
    Nodes:1
    Heapsize:2GB

    Hope this helps!
    -vik

  3. Nice. Is there a good tutorial on how to add a new database to the YCSB framework?

  4. @Ano....
    Yes, there is a tutorial about adding a new DB in YCSB:
    https://github.com/brianfrankcooper/YCSB/wiki/Adding-a-Database
