August 17, 2013

Aerospike 3

My clients at Aerospike are coming out with their Version 3 and, as several of my clients do, have encouraged me to front-run what would otherwise be the Monday embargo.

I encourage such behavior with arguments including:

“Nobody else is going to write in such technical detail anyway, so they won’t mind.”
“I’ve done this before. Other writers haven’t complained.”
“In fact, some other writers like having me go first, so that they can learn from and/or point to what I say.”
“Hey, I don’t ask for much in the way of exclusives, but I’d be pleased if you threw me this bone.”

Aerospike 2’s value proposition, let us recall, was:

… performance, consistent performance, and uninterrupted operations …

Aerospike’s consistent performance claims are along the lines of sub-millisecond latency, with 99.9% of responses within 5 milliseconds, and even a node outage only borking performance for some tens of milliseconds.
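To make the “99.9% within 5 milliseconds” figure concrete, here is a minimal Python sketch of checking a batch of measured latencies against that kind of SLA. The function names and sample data are hypothetical, and nothing here is Aerospike code. Exact fractions are used so that the 99.9% comparison isn't thrown off by floating-point rounding.

```python
from fractions import Fraction


def fraction_within(samples_ms, limit_ms):
    """Exact fraction of latency samples (in ms) at or below limit_ms."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    within = sum(1 for s in samples_ms if s <= limit_ms)
    return Fraction(within, len(samples_ms))


def meets_sla(samples_ms, limit_ms=5.0, required=Fraction(999, 1000)):
    """True if at least `required` of responses finished within limit_ms.

    Defaults encode the post's example: 99.9% of responses within 5 ms.
    """
    return fraction_within(samples_ms, limit_ms) >= required


# 1,000 hypothetical requests: one slow outlier is allowed, two are not.
one_outlier = [0.8] * 999 + [40.0]
two_outliers = [0.8] * 998 + [40.0] * 2
print(meets_sla(one_outlier))   # True
print(meets_sla(two_outliers))  # False
```

With 1,000 requests, a 99.9% target tolerates exactly one slow response; a second one breaks it.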

Uninterrupted operation is a core Aerospike design goal, and the company says that to date, no Aerospike production cluster has ever gone down.

The major support for such claims is Aerospike’s success in selling to the digital advertising market, which is probably second only to high-frequency trading in its low-latency demands. For example, Aerospike’s CMO Monica Pal sent along a link to what apparently is:

a video by a customer named BrightRoll …
… who enjoy SLAs (Service Level Agreements) such as those cited above (they actually mentioned five 9s)* …
… at peak loads of 10-12 million requests/minute.

*I haven’t watched the video, but Monica helpfully included a small amount of transcript.

Monica also sent updated business highlights for Aerospike (with some editing by me):

Headcount – 50 and hiring.

# of production customers – fairly high double digits, all paying.

Biggest database – 30TB and growing.

Most customers are at 1-4TB of unique data; most replicate 2x; many also replicate across data centers.

Pricing – Free Community Edition with 2 servers and 200GB data. Enterprise Edition priced per terabyte and per datacenter, unlimited nodes per cluster, unlimited number of clusters, pay only for unique data, not replicas. Most start at $50k.
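The “pay only for unique data, not replicas” point is easiest to see with the kind of deployment described above (a few TB of unique data, replicated 2x). Below is a minimal Python sketch of that arithmetic. It deliberately leaves out dollar figures and the per-datacenter dimension, since the post gives no formula for either, and the function name is hypothetical.

```python
def stored_vs_licensed_tb(unique_tb, replication_factor):
    """Return (TB physically stored in the cluster, TB counted for licensing).

    Assumes, per the post, that licensing counts unique data only, while the
    cluster holds `replication_factor` copies of every record.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    stored = unique_tb * replication_factor
    licensed = unique_tb  # replicas are not billed
    return stored, licensed


# A customer at the top of the typical range: 4 TB unique, replicated 2x.
print(stored_vs_licensed_tb(4, 2))  # (8, 4)
```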

However, Aerospike 2 didn’t have much in the way of data manipulation options. Aerospike described its eponymous product as a key-value store, although I gather it was possible to look into the values up to a point; specifically:

Aerospike has always had integer and string datatypes to extract.
Aerospike has always had “bins”, which are like columns but don’t require consistent datatypes from one record to the next … or indeed any datatypes at all.
By “bins … are like columns” I mean that you can retrieve a projection just on a bin.
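Here is a toy Python model of what “bins” mean in practice: each record is a set of named bins, a bin's type can vary from record to record (or the bin can be missing entirely), and a projection pulls one bin across records. It illustrates the concept only and is not Aerospike's client API; the keys, bin names, and `project` function are made up.

```python
from typing import Any

# Hypothetical records, keyed by primary key; each record is a dict of bins.
store: dict[str, dict[str, Any]] = {
    "user:1": {"name": "Ann", "visits": 12},
    "user:2": {"name": "Bo", "visits": "unknown"},  # same bin, different type
    "user:3": {"segment": "sports"},                # no "name" or "visits" bin
}


def project(records: dict[str, dict[str, Any]], bin_name: str) -> dict[str, Any]:
    """Return {key: value} for every record that has bin_name."""
    return {key: bins[bin_name] for key, bins in records.items() if bin_name in bins}


print(project(store, "visits"))  # {'user:1': 12, 'user:2': 'unknown'}
```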

Aerospike 3 adds more data manipulation features. Notes on that start:

Aerospike 3 adds more datatypes, with a strong emphasis on nesting.
Aerospike 3 adds secondary indexes.
Aerospike 3 adds Lua UDFs (User-Defined Functions).
Aerospike assures me that its DBMS-experienced development team consists of a lot more than the one-man Russell Sullivan acqui-hire. 🙂

Secondary indexes in Aerospike 3 just work with strings and integers, but Aerospike doesn’t dispute my opinion that it would be nice to index the (values inside the) new datatypes as well.
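Conceptually, a secondary index maps bin values back to primary keys. The Python sketch below builds one over a single bin and, to mirror the Aerospike 3 restriction just described, indexes only string and integer values, skipping anything else (such as a list). It is an in-memory illustration with hypothetical names, not a description of how Aerospike actually stores its indexes.

```python
from collections import defaultdict


def build_secondary_index(records, bin_name):
    """Map each string/integer value of bin_name to the set of keys holding it.

    Values of other types (lists, maps, floats, ...) are not indexed.
    """
    index = defaultdict(set)
    for key, bins in records.items():
        value = bins.get(bin_name)
        # bool is a subclass of int in Python; exclude it so only true integers count.
        if isinstance(value, (int, str)) and not isinstance(value, bool):
            index[value].add(key)
    return index


records = {
    "user:1": {"segment": "sports", "age": 34},
    "user:2": {"segment": "news", "age": 34},
    "user:3": {"segment": ["sports", "news"]},  # a list value: not indexable here
}
by_segment = build_secondary_index(records, "segment")
by_age = build_secondary_index(records, "age")
print(sorted(by_segment["sports"]))  # ['user:1']
print(sorted(by_age[34]))            # ['user:1', 'user:2']
```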

Specifically, Aerospike 3 adds four new datatypes, which it calls “complex”:

Key-value pairs, for some reason called “maps”.
Sets, of data of any datatype. As you might imagine, sets are unordered and have unique values.
Lists, of data of any datatype. As you might imagine, lists are ordered.
Stacks, of data of any datatype. These are lists with much better performance, but with the update functionality limitations you’d imagine from the name “stack”.
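The four complex types map naturally onto familiar in-memory structures, and the emphasis on nesting means they can contain one another. The Python sketch below models one record using dicts, sets, and lists, plus a small `Stack` class that only allows pushing and reading the newest items, which is the restricted update surface the name suggests. The class and bin names are hypothetical; the sketch says nothing about Aerospike's actual storage format or API.

```python
class Stack:
    """Hypothetical stack bin: push at the top, read newest-first, no updates in the middle."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def peek(self, n=1):
        """Return up to n items, newest first."""
        if n <= 0:
            return []
        return list(reversed(self._items[-n:]))


record = {
    "profile": {"country": "US", "interests": {"sports", "news"}},  # map nesting a set
    "recent_pages": ["/home", "/scores", "/home"],                 # ordered list, duplicates allowed
    "events": Stack(),                                             # stack of maps
}
record["events"].push({"type": "view", "ad": 7})
record["events"].push({"type": "click", "ad": 7})
print(record["events"].peek(1))  # [{'type': 'click', 'ad': 7}]
```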

Aerospike also added what it calls “large” datatypes, which are a pointer-like way to link blocks. The point of those is to work around what is otherwise the record size limit (typically 128 KB), so as to tie together all the information on, say, a single user.* Aerospike gives the impression that customers have been custom-building datatypes of these kinds all along, with stacks being especially popular.

*When I heard “collect all a user’s interaction data in one place”, the first thing I thought of was WibiData.
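The “large” datatypes can be pictured as a chain of bounded blocks. The Python sketch below models a large stack that way: each block holds at most `block_capacity` items (counted in items rather than bytes, for simplicity), each new block points back to the previous one, and reads follow those pointers. The class, the `blocks` dict standing in for separate sub-records, and the capacity are all assumptions for illustration, not Aerospike's implementation.

```python
import itertools


class LargeStack:
    """Hypothetical sketch: a stack too big for one record, split into linked blocks.

    Assumptions: capacity is counted in items, not bytes; `blocks` stands in
    for separate sub-records in the store, addressed by generated ids.
    """

    def __init__(self, block_capacity=3):
        self.block_capacity = block_capacity
        self.blocks = {}  # block_id -> {"items": [...], "next": older block_id or None}
        self.head = None  # id of the newest block
        self._ids = itertools.count()

    def push(self, item):
        head = self.blocks.get(self.head)
        if head is None or len(head["items"]) >= self.block_capacity:
            # Newest block is full (or none exists yet): start a new one
            # that points back to the previous head.
            new_id = next(self._ids)
            self.blocks[new_id] = {"items": [], "next": self.head}
            self.head = new_id
            head = self.blocks[new_id]
        head["items"].append(item)

    def peek(self, n):
        """Return up to n items, newest first, following block pointers."""
        out = []
        block_id = self.head
        while block_id is not None and len(out) < n:
            block = self.blocks[block_id]
            for item in reversed(block["items"]):
                if len(out) == n:
                    break
                out.append(item)
            block_id = block["next"]
        return out


s = LargeStack(block_capacity=3)
for event in range(7):
    s.push(event)
print(len(s.blocks))  # 3
print(s.peek(4))      # [6, 5, 4, 3]
```

In this sketch a push only ever touches the newest block, so its cost doesn't grow with the total size of the stack.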

Notes on the Lua UDFs include:

They are heavily pipelined. Even so …
… in the interest of speed, there is no real node-to-node data movement. Aggregations get finished on the client.
Hence, even though there are primitives called map() and reduce(), which mean about what you’d think they would …
… Aerospike was mercifully easy to persuade not to call this a form of MapReduce. 🙂
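The division of labor described above (map() and reduce() running on each node, with the final step on the client and no node-to-node data movement) can be sketched in Python. This models the data flow only; it is not Aerospike's Lua UDF interface, and the function names and data are hypothetical. An average is used because it shows why nodes must return mergeable partials (sum, count) rather than their own averages.

```python
from functools import reduce


def add_pairs(a, b):
    """Combine two (sum, count) partial aggregates."""
    return (a[0] + b[0], a[1] + b[1])


def node_aggregate(local_records, bin_name):
    """Runs on one node over its own records only: map, then reduce to (sum, count)."""
    mapped = ((r[bin_name], 1) for r in local_records if bin_name in r)
    return reduce(add_pairs, mapped, (0, 0))


def client_finish(partials):
    """Runs on the client: merge per-node partials and compute the final average."""
    total, count = reduce(add_pairs, partials, (0, 0))
    return total / count if count else None


# Hypothetical data already partitioned across three nodes.
nodes = [
    [{"visits": 2}, {"visits": 4}],
    [{"visits": 6}],
    [{"other": 1}],
]
partials = [node_aggregate(recs, "visits") for recs in nodes]
print(partials)                 # [(6, 2), (6, 1), (0, 0)]
print(client_finish(partials))  # 4.0
```

Averaging the per-node averages instead would give 4.5 here (3 and 6), which is wrong; shipping (sum, count) keeps the client-side finish exact.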

So yes — Aerospike 3 may be regarded as Aerospike’s version of support for real-time analytics.

Categories: Aerospike, Market share and customer counts, Memory-centric data management, NoSQL, Pricing, Web analytics


Comments

3 Responses to “Aerospike 3”

Mark Callaghan on August 18th, 2013 11:45 am

This is a bold claim, or maybe I don’t understand what the words mean (from http://www.aerospike.com/features/). I guess they should also add, “no bugs”.

“We’re also reliable — no outages, zero downtime, no data loss:”

And then http://www.aerospike.com/performance/ goes on to describe why they never fail.

Srini V. Srinivasan on August 18th, 2013 12:57 pm

Hi Mark,

Your point is a good one that merits clarification. The statements are based on internet-scale services running using Aerospike software for over 3 years now – individual nodes can fail due to hardware faults but the service needs to be up without data unavailability, complete loss of performance, etc. By focusing on real-time performance, replication, self-management, linear scalability, etc., Aerospike has been able to deliver high availability for RTB and other services in the Digital Advertising Market – the SLAs here are very stringent – 99.9% of requests under a few milliseconds, 24x7 uptime, no service means no revenue, etc.

You may find it interesting to look at our technical paper in the 2011 VLDB conference that explains some of our techniques for delivering real-time performance with immediate consistency: http://www.aerospike.com/wp-content/uploads/2012/07/VLDB-Paper.pdf

Thanks.
— Srini
Peter Corless on April 21st, 2015 12:58 pm

Just as an update, the nodes-per-cluster limit in Aerospike Community Edition (as well as Enterprise Edition) is now the same since we went open source: 127 nodes per cluster, with no limit on the total number of clusters you can run with the open source edition.


Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

