Introduction to Citrusleaf
Citrusleaf is the vendor of yet another short-request/NoSQL database management system, conveniently named Citrusleaf. Highlights for Citrusleaf the company include:
- 8 employees.
- $2 million in recently acquired venture capital.
- 1 1/2 – 2 1/2 years of total company history, depending on how you count.
- An undisclosed but nonzero number of paying customers, concentrated in the real-time advertising market, with a typical application being cookie management.
Citrusleaf the product is a kind of key-value store; however, the values are in the form of rows, so what you really look up is (key, field name, value) triples. Right now only the keys are indexed; planned future features include indexing on the individual fields, so as to support some basic analytics. SQL support is an eventual goal. Other Citrusleaf buzzword basics include:
- ACID-compliant.
- Log-structured.
- Tunable consistency model.
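To make the (key, field name, value) idea concrete, here is a toy sketch in Python. It is not Citrusleaf's API; the class name `RowStore`, its methods, and the sample cookie key are all hypothetical. It only illustrates a store in which rows are found by key, and in which a lookup by field value has to scan every row because no secondary index exists.

```python
from typing import Any, Dict, List, Optional


class RowStore:
    """Toy key-value store whose values are rows of named fields (hypothetical)."""

    def __init__(self) -> None:
        # Only the primary key is indexed: one dict entry per key.
        self._rows: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, field: str, value: Any) -> None:
        # Store one (key, field name, value) triple.
        self._rows.setdefault(key, {})[field] = value

    def get(self, key: str, field: str) -> Optional[Any]:
        # Keyed lookup: find the row by key, then pick the field.
        return self._rows.get(key, {}).get(field)

    def find_keys_by_field(self, field: str, value: Any) -> List[str]:
        # Without a secondary index, this is a full scan over all rows.
        return [k for k, row in self._rows.items() if row.get(field) == value]


store = RowStore()
store.put("cookie:abc123", "segment", "sports")
store.put("cookie:abc123", "last_seen", 1290000000)
store.put("cookie:def456", "segment", "travel")
```

Indexing on individual fields would, in effect, replace the scan in `find_keys_by_field` with a direct lookup.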
To date, Citrusleaf customers have focused on sub-millisecond data retrieval, preferably 0.2-0.3 milliseconds. Accordingly, none has chosen to put the primary Citrusleaf data store on disk. Rather:
- Citrusleaf indexes are always in RAM. (Citrusleaf forces this, actually.)
- You can keep data in RAM and copy it to disk.
- You can keep data on solid-state drives. (Just A Bunch Of Flash or Fusion-io.)
I don’t have a good grasp on what the data structure for those indexes is.
Citrusleaf characterizes its customers as firms that have “a couple of KB” of data on “every” person in North America. Naively, that sounds like a terabyte or less to me, but Citrusleaf says 1-3 terabytes is most common. Or to quote the press release, “The most common deployments for Citrusleaf 2.0 are terabytes of data, billions of objects, and 200K plus transactions per second per node, with sub-millisecond latency.” 4-8 nodes seems to be typical for Citrusleaf databases (all figures pre-replication). I didn’t ask what kind of hardware is at each node.
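The naive arithmetic can be spelled out as follows. The inputs are my assumptions, not Citrusleaf's figures: roughly 500 million people in North America, 2,000 bytes as "a couple of KB", and a hypothetical `profiles_per_person` multiplier for the case where one person shows up as more than one record.

```python
def estimate_terabytes(people: int, bytes_per_profile: int,
                       profiles_per_person: float = 1.0) -> float:
    """Back-of-the-envelope data volume in decimal terabytes (10**12 bytes)."""
    total_bytes = people * bytes_per_profile * profiles_per_person
    return total_bytes / 1e12


# Assumed inputs: ~500 million people, ~2,000 bytes each, one profile per person.
naive = estimate_terabytes(500_000_000, 2_000)

# Same inputs, but with a hypothetical three profiles per person.
inflated = estimate_terabytes(500_000_000, 2_000, profiles_per_person=3)

print(f"naive: {naive:.1f} TB, inflated: {inflated:.1f} TB")
```

Under these assumptions the naive figure comes out at about 1 terabyte, and any multiplier between 1 and 3 records per person lands in the 1-3 terabyte range Citrusleaf cites.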
Citrusleaf data distribution features include:
- More logical nodes than physical ones, so that adding physical nodes and redistributing the data is relatively straightforward.
- Something unspecified that helps redistribute indexes pretty easily too.
- The aforementioned tunable consistency, and …
- … a tunable replication factor, which nobody ever sets to more than 2, and which can be dropped to 1 on-the-fly in the face of capacity/performance problems such as cascading failures.
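The "more logical nodes than physical ones" idea can be sketched roughly as below. This is a generic illustration of fixed logical partitions mapped onto physical nodes, not Citrusleaf's actual algorithm; the partition count of 64, the SHA-1 hash, the node names, and the rebalancing rule are all assumptions, and replication is not modeled. The point is that a key's logical partition never changes, so adding a physical node only means handing some whole partitions to it.

```python
import hashlib
from typing import Dict, List

NUM_PARTITIONS = 64  # hypothetical number of logical nodes


def partition_for(key: str) -> int:
    """Map a key to a logical partition; independent of the physical nodes."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def initial_assignment(nodes: List[str]) -> Dict[int, str]:
    """Spread the logical partitions round-robin over the physical nodes."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def add_node(assignment: Dict[int, str], new_node: str) -> Dict[int, str]:
    """Give the new node a fair share by taking partitions from the fullest nodes."""
    result = dict(assignment)
    by_node: Dict[str, List[int]] = {}
    for p, node in result.items():
        by_node.setdefault(node, []).append(p)
    # Fair share for the newcomer once it joins the existing nodes.
    target = NUM_PARTITIONS // (len(by_node) + 1)
    for _ in range(target):
        donor = max(by_node, key=lambda n: len(by_node[n]))
        result[by_node[donor].pop()] = new_node
    return result


before = initial_assignment(["node-a", "node-b", "node-c", "node-d"])
after = add_node(before, "node-e")
moved = [p for p in before if before[p] != after[p]]
print(f"{len(moved)} of {NUM_PARTITIONS} partitions moved to node-e")
```

In this sketch only the partitions handed to the new node move; every other partition stays where it was, which is what makes redistribution relatively straightforward.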
Like dbShards, Citrusleaf includes some client-side code to eliminate the need for a proxy.
Citrusleaf doesn’t publish a price list, but says it uses MongoDB support contracts as a pricing benchmark. Self-service downloadable Citrusleaf free trials are available.
Comments
6 Responses to “Introduction to Citrusleaf”
Citrusleaf’s in-memory index structure is a “hash of red-black trees”. We have found RB trees to be surprisingly high performance for this application. We have also researched Cliff Click’s lock free hash table, and did not find it worth the complexity.
The typical hardware our data center customers use is Dell R600/R700 with or without Samsung SS805 disks, or the equivalent HP DL380s. One customer uses Intel X25-M drives. These are typically dual Xeon 55xx CPUs, although we have seen higher performance with some single-chip installations. We’ve also run in virtualized environments (EC2, etc.) where the exact hardware type is unknown to us.
Advertising applications hold more data than a back-of-the-envelope calculation suggests because people use multiple clients and machines, and because cookies get cleared.
Thanks for the in-depth post.
Thanks, Brian!
I didn’t include that data structure bit in the post myself because, um, I don’t know what a red-black tree is. 🙂
[…] Citrusleaf has released an add-on product called Citrusleaf RTA (Real-Time Attribution). It’s to be used when: […]
[…] the AeroSpike product story is as I described in two posts last year. At the highest […]