Aerospike 3
My clients at Aerospike are coming out with their Version 3 and, as several of my clients do, have encouraged me to front-run what would otherwise be the Monday embargo.
I encourage such behavior with arguments including:
- “Nobody else is going to write in such technical detail anyway, so they won’t mind.”
- “I’ve done this before. Other writers haven’t complained.”
- “In fact, some other writers like having me go first, so that they can learn from and/or point to what I say.”
- “Hey, I don’t ask for much in the way of exclusives, but I’d be pleased if you threw me this bone.”
Aerospike 2’s value proposition, let us recall, was:
… performance, consistent performance, and uninterrupted operations …
- Aerospike’s consistent performance claims are along the lines of sub-millisecond latency, with 99.9% of responses being within 5 milliseconds, and even a node outage only borking performance for some 10s of milliseconds.
- Uninterrupted operation is a core Aerospike design goal, and the company says that to date, no Aerospike production cluster has ever gone down.
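The “99.9% of responses within 5 milliseconds” claim is easier to read as a concrete check. Below is a minimal Python sketch that tests a batch of measured latencies against that kind of SLA. The function names and sample figures are hypothetical. This is not Aerospike tooling, only the arithmetic behind the claim.

```python
def fraction_within(latencies_ms: list[float], threshold_ms: float) -> float:
    """Fraction of requests that completed within threshold_ms."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    within = sum(1 for t in latencies_ms if t <= threshold_ms)
    return within / len(latencies_ms)


def meets_latency_sla(latencies_ms: list[float],
                      threshold_ms: float = 5.0,
                      target: float = 0.999) -> bool:
    """True if at least `target` of requests finished within threshold_ms.

    Hypothetical helper; the defaults encode "99.9% within 5 ms".
    """
    return fraction_within(latencies_ms, threshold_ms) >= target


# Hypothetical samples: 10,000 requests, 5 of them slow.
samples = [0.8] * 9995 + [12.0] * 5
print(meets_latency_sla(samples))  # True: 99.95% of requests are within 5 ms
```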
The major support for such claims is Aerospike’s success in selling to the digital advertising market, which is probably second only to high-frequency trading in its low-latency demands. For example, Aerospike’s CMO Monica Pal sent along a link to what apparently is:
- a video by a customer named Brightroll …
- … who enjoy SLAs (Service Level Agreements) such as those cited above (they actually mentioned five 9s)* …
- … at peak loads of 10-12 million requests/minute.
*I haven’t watched the video, but Monica helpfully included a small amount of transcript.
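Converting the quoted peak load to per-second terms gives a better sense of scale. It takes only a division by 60. The helper name is just for illustration.

```python
def per_second(requests_per_minute: float) -> float:
    """Convert a per-minute request rate to a per-second rate."""
    return requests_per_minute / 60


# The 10-12 million requests/minute peak range cited above.
for rpm in (10_000_000, 12_000_000):
    print(f"{rpm:,} requests/minute ≈ {per_second(rpm):,.0f} requests/second")
# 10,000,000 requests/minute ≈ 166,667 requests/second
# 12,000,000 requests/minute ≈ 200,000 requests/second
```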
Monica also updated Aerospike’s business highlights, as follows (with some editing by me):
- Headcount – 50 and hiring.
- # of production customers – fairly high double digits, all paying.
- Biggest database – 30TB and growing.
- Most customers are at 1-4TB of unique data; most replicate 2x; many also replicate across data centers.
- Pricing – Free Community Edition with 2 servers and 200GB of data. Enterprise Edition priced per terabyte and per data center, with unlimited nodes per cluster and an unlimited number of clusters; pay only for unique data, not replicas. Most start at $50k.
However, Aerospike 2 didn’t have much in the way of data manipulation options. Aerospike described its eponymous product as a key-value store, although I gather it was possible to look into the values up to a point; specifically:
- Aerospike has always had integer and string datatypes whose values can be extracted.
- Aerospike has always had “bins”, which are like columns but don’t require consistent datatypes from one record to the next … or indeed any datatypes at all.
- By “bins … are like columns” I mean that you can retrieve a projection just on a bin.
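Here is a toy Python model of bins. It is not the Aerospike client API. Each record is a primary key mapped to a dict of bin names and values. Nothing forces two records to share bins or datatypes, and a projection returns just one bin. The record keys, bin names and `project` helper are hypothetical.

```python
from typing import Any

# Hypothetical stand-in for a set of records: primary key -> {bin name: value}.
records: dict[str, dict[str, Any]] = {
    "user:1": {"name": "alice", "visits": 42},
    "user:2": {"name": "bob", "visits": "many"},  # same bin, different datatype
    "user:3": {"segment": 7},                     # no "name" or "visits" bin at all
}


def project(records: dict[str, dict[str, Any]], bin_name: str) -> dict[str, Any]:
    """Return {key: value} for a single bin, skipping records that lack it."""
    return {key: bins[bin_name] for key, bins in records.items() if bin_name in bins}


print(project(records, "visits"))  # {'user:1': 42, 'user:2': 'many'}
```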
Aerospike 3 adds more data manipulation features. Notes on that start:
- Aerospike 3 adds more datatypes, with a strong emphasis on nesting.
- Aerospike 3 adds secondary indexes.
- Aerospike 3 adds Lua UDFs (User-Defined Functions).
- Aerospike assures me that its DBMS-experienced development team consists of a lot more than the one-man Russell Sullivan acqui-hire. 🙂
Secondary indexes in Aerospike 3 just work with strings and integers, but Aerospike doesn’t dispute my opinion that it would be nice to index the (values inside the) new datatypes as well.
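A secondary index maps bin values back to primary keys. The minimal Python sketch below assumes an in-memory dict of records and mimics the restriction described above: only integer and string values get indexed, and values of the new complex types are skipped. All names and records are hypothetical. This says nothing about Aerospike's actual index structures.

```python
from collections import defaultdict
from typing import Any

# Hypothetical records: primary key -> {bin name: value}.
records: dict[str, dict[str, Any]] = {
    "user:1": {"country": "US", "age": 30},
    "user:2": {"country": "DE", "age": 30},
    "user:3": {"country": ["US", "CA"]},  # a list value: not indexable here
    "user:4": {"age": 41},                 # no "country" bin at all
}


def build_secondary_index(records: dict[str, dict[str, Any]],
                          bin_name: str) -> dict[Any, set[str]]:
    """Map each indexable value of bin_name to the primary keys that hold it.

    Only int and str values are indexed; lists, maps, missing bins, etc.
    are skipped.
    """
    index: defaultdict[Any, set[str]] = defaultdict(set)
    for key, bins in records.items():
        value = bins.get(bin_name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, str)) and not isinstance(value, bool):
            index[value].add(key)
    return dict(index)


def query_equal(index: dict[Any, set[str]], value: Any) -> set[str]:
    """Equality lookup: primary keys whose indexed bin equals value."""
    return index.get(value, set())


age_index = build_secondary_index(records, "age")
print(sorted(query_equal(age_index, 30)))  # ['user:1', 'user:2']
```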
Specifically, Aerospike 3 adds four new datatypes, which it calls “complex”:
- Key-value pairs, for some reason called “maps”.
- Sets, of data of any datatype. As you might imagine, sets are unordered and have unique values.
- Lists, of data of any datatype. As you might imagine, lists are ordered.
- Stacks, of data of any datatype. These are lists with much better performance, but with the update functionality limitations you’d imagine from the name “stack”.
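A toy Python stack shows the kind of limitation the name implies. Pushes are cheap and reads come back newest-first. There is no inserting in the middle and no in-place editing. The class and method names are hypothetical, not Aerospike's API. The sketch makes no claim about how Aerospike gets its stack performance.

```python
from typing import Any


class Stack:
    """Toy stack: push onto the top, read the most recent items first."""

    def __init__(self) -> None:
        self._items: list[Any] = []

    def push(self, item: Any) -> None:
        # Appending to the end of a Python list is amortized O(1).
        self._items.append(item)

    def peek(self, n: int = 1) -> list[Any]:
        """Return up to n most recently pushed items, newest first."""
        if n <= 0:
            return []
        return self._items[-n:][::-1]

    def __len__(self) -> int:
        return len(self._items)


s = Stack()
for event in ("click", "view", "purchase"):
    s.push(event)
print(s.peek(2))  # ['purchase', 'view']
```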
Aerospike also added what it calls “large” datatypes, which are a pointer-like way to link blocks. The point of those is to work around what is otherwise the record size limit (typically 128 KB), so as to tie together all the information on, say, a single user.* Aerospike gives the impression that customers have been custom-building datatypes of these kinds all along, with stacks being especially popular.
*When I heard “collect all a user’s interaction data in one place”, the first thing I thought of was WibiData.
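The general idea behind the large datatypes can be sketched under stated assumptions. A long collection is split into chunk records that each stay under the size limit, and the chunks are chained together by key, like a linked list. The key scheme `base_key#i`, the use of JSON as the size measure and all function names are hypothetical. This is not Aerospike's actual storage layout.

```python
import json
from typing import Any

RECORD_LIMIT_BYTES = 128 * 1024  # the "typically 128 KB" record size limit


def encoded_size(record: dict[str, Any]) -> int:
    """Size of a record once serialized (JSON here, as a stand-in format)."""
    return len(json.dumps(record).encode("utf-8"))


def store_linked(items: list[Any], base_key: str,
                 limit: int = RECORD_LIMIT_BYTES) -> dict[str, dict[str, Any]]:
    """Pack items into chunk records of at most `limit` bytes, chained by key.

    Chunk i is stored under f"{base_key}#{i}" as {"items": [...], "next": key
    of chunk i + 1, or None for the last chunk}.
    """
    # Measure candidates with the longest "next" key any chunk could need,
    # so every real record is guaranteed to fit.
    worst_next = f"{base_key}#{len(items)}"

    def fits(chunk: list[Any]) -> bool:
        return encoded_size({"items": chunk, "next": worst_next}) <= limit

    chunks: list[list[Any]] = []
    current: list[Any] = []
    for item in items:
        if not fits([item]):
            raise ValueError("a single item exceeds the record size limit")
        if fits(current + [item]):
            current.append(item)
        else:
            chunks.append(current)
            current = [item]
    chunks.append(current)

    store: dict[str, dict[str, Any]] = {}
    for i, chunk in enumerate(chunks):
        nxt = f"{base_key}#{i + 1}" if i + 1 < len(chunks) else None
        store[f"{base_key}#{i}"] = {"items": chunk, "next": nxt}
    return store


def read_linked(store: dict[str, dict[str, Any]], base_key: str) -> list[Any]:
    """Follow the chain of "next" pointers from the first chunk."""
    out: list[Any] = []
    key = f"{base_key}#0"
    while key is not None:
        record = store[key]
        out.extend(record["items"])
        key = record["next"]
    return out


events = [f"event-{i}" for i in range(50)]
store = store_linked(events, "user:1", limit=200)  # tiny limit to force chaining
assert read_linked(store, "user:1") == events
```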
Notes on the Lua UDFs include:
- They are heavily pipelined. Even so …
- … in the interest of speed, there is no real node-to-node data movement. Aggregations get finished on the client.
- Hence, even though there are primitives called map() and reduce(), which mean about what you’d think they would …
- … Aerospike was mercifully easy to persuade not to call this a form of MapReduce. 🙂
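Aerospike's UDFs are written in Lua. The Python sketch below models only the data flow described above, under some assumptions. Each node runs the map step and a partial reduce over its own records. The client then merges the partial results, so no data moves between nodes. The shard layout, field names and functions are hypothetical, not Aerospike's UDF API.

```python
from collections import Counter

# Hypothetical cluster: each inner list is the shard of records held by one node.
nodes = [
    [{"campaign": "a", "clicks": 3}, {"campaign": "b", "clicks": 1}],
    [{"campaign": "a", "clicks": 2}],
    [{"campaign": "b", "clicks": 5}, {"campaign": "c", "clicks": 4}],
]


def node_side(records: list[dict]) -> Counter:
    """Runs on one node: map each record to (campaign, clicks), reduce locally."""
    partial: Counter = Counter()
    for campaign, clicks in ((r["campaign"], r["clicks"]) for r in records):
        partial[campaign] += clicks
    return partial


def client_side(partials) -> Counter:
    """Runs on the client: merge per-node partials. Nodes never exchange data."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # update() adds counts for matching keys
    return total


totals = client_side(node_side(shard) for shard in nodes)
print(dict(totals))  # {'a': 5, 'b': 6, 'c': 4}
```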
So yes — Aerospike 3 may be regarded as Aerospike’s version of support for real-time analytics.
Comments
3 Responses to “Aerospike 3”
This is a bold claim, or maybe I don’t understand what the words mean (from http://www.aerospike.com/features/). I guess they should also add, “no bugs”.
“We’re also reliable — no outages, zero downtime, no data loss:”
And then http://www.aerospike.com/performance/ goes on to describe why they never fail.
Hi Mark,
Your point is a good one that merits clarification. The statements are based on internet-scale services that have been running on Aerospike software for over 3 years now – individual nodes can fail due to hardware faults, but the service needs to stay up without data unavailability, complete loss of performance, etc. By focusing on real-time performance, replication, self-management, linear scalability, etc., Aerospike has been able to deliver high availability for RTB and other services in the Digital Advertising Market – the SLAs here are very stringent: 99.9% of requests under a few milliseconds, 24x7 uptime, no service means no revenue, etc.
You may find it interesting to look at our technical paper at the 2011 VLDB conference that explains some of our techniques for delivering real-time performance with immediate consistency: http://www.aerospike.com/wp-content/uploads/2012/07/VLDB-Paper.pdf
Thanks.
— Srini
Just as an update, since we went open source, Aerospike Community Edition has the same nodes-per-cluster limit as Enterprise Edition: 127 nodes per cluster, with no limit on the total number of clusters you can run with the open source edition.