Where ParAccel is at
Until recently, I was extremely critical of ParAccel’s marketing. But there was an almost-clean sweep of the relevant ParAccel executives, and the specific worst practices I was calling out have for the most part been eliminated. So I was open to talking and working with ParAccel again, and that’s now happening. On my recent California trip, I chatted with three ParAccel folks for a few hours. Based on that and other conversations, here’s the current ParAccel story as I understand it.
I’ve already noted that PADB (ParAccel Analytic DataBase) 3.0 is coming soon, but pending its arrival, ParAccel’s technical story is primarily about query performance. More specifically:
- ParAccel asserts that PADB is much faster than other analytic DBMS — even close competitors such as Vertica — on especially complex queries. “60-way joins” were mentioned. So was the flattening of correlated subqueries.
- ParAccel also claims industry-leading performance on simpler queries, but not by the same (or perhaps even particularly large) margins.
- Mercifully, ParAccel no longer claims to have never, ever lost on performance in a customer evaluation. But it still says that is very close to being true.
- Major reasons ParAccel gives for PADB’s high performance include:
- Like Vertica, Sybase IQ, and others, PADB uses a columnar architecture.
- ParAccel thinks PADB’s newest query optimizer — fondly named Omne — is outstanding.
- ParAccel’s PADB compiles its queries.
- In general, ParAccel is just performance-obsessed.
- One could also mention:
- ParAccel’s PADB runs smoothly in-memory, if that’s what you want.
- ParAccel also offers a Flash option for PADB.
- Like many other analytic DBMS vendors, ParAccel has created a custom networking protocol. (ParAccel has talked about that altogether too much in the past.)
- Like Vertica, ParAccel’s PADB generally decompresses data as late as the particular compression scheme used allows. (Well, actually, that’s not one ParAccel mentions unless asked.)
- ParAccel has long encouraged one to put part of one’s database on direct-attached storage as a kind of persistent cache, plus all of it on a storage-area network, because PADB can optimize its scans to go against both physical stores.
- ParAccel’s PADB does encryption a block at a time, rather than a row at a time, so there’s very little overhead to using the encryption feature.
- ParAccel says that PADB has no indexes, materialized views, etc., notwithstanding that I heard something different from Barry Zane a few years ago. This is the basis for ParAccel’s claim that no tuning (or at least very little) is required, or indeed even possible …
- … and similarly, it is the reason ParAccel encourages prospects to do ad-hoc queries in their POCs (Proofs Of Concept), at least when Vertica is the competitor.
- However, ParAccel’s PADB has rather complex initial set-up. This has been the basis for widespread skepticism about ParAccel’s “no tuning” claim. ParAccel is working to automate that away, but admits to being only part-way through the process.
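The late-decompression point in the list above can be made concrete with a small sketch. This is not ParAccel's code; it is a hypothetical illustration, assuming a run-length-encoded integer column, of how an aggregate or a filter can be answered from the compressed runs without ever expanding them. The function names (`rle_encode`, `sum_on_compressed`, `count_equal_on_compressed`) are invented for the example.

```python
from itertools import groupby
from typing import Iterable, List, Tuple

# A run is (value, run_length). This layout is an assumption for illustration.
Run = Tuple[int, int]


def rle_encode(values: Iterable[int]) -> List[Run]:
    """Run-length encode a column: consecutive equal values collapse into one run."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(runs: Iterable[Run]) -> List[int]:
    """Expand runs back into the original column (the step we try to postpone)."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def sum_on_compressed(runs: Iterable[Run]) -> int:
    """SUM(column) computed per run, touching each run once instead of each row."""
    return sum(value * length for value, length in runs)


def count_equal_on_compressed(runs: Iterable[Run], target: int) -> int:
    """COUNT(*) WHERE column = target, again without decompressing."""
    return sum(length for value, length in runs if value == target)


column = [5, 5, 5, 7, 7, 5, 9, 9, 9, 9]
runs = rle_encode(column)

# Both answers agree with what a scan of the decompressed column would give.
assert sum_on_compressed(runs) == sum(rle_decode(runs))
assert count_equal_on_compressed(runs, 5) == column.count(5)
```

The work done is proportional to the number of runs rather than the number of rows, which is why a columnar engine would want to put off decompression for as long as the encoding permits.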
- Highlights of ParAccel’s data writing strategy include:
- PADB sends data transactionally to disk.
- PADB usually sends data to disk a block at a time, because it is coming in fast enough for that to work out (either due to bulk load or streaming).
- PADB is append-only …
- … so PADB has a garbage-collection mechanism called Vacuum. Right now Vacuum has to be started manually, but doesn’t block reads and writes; full background garbage collection is of course a roadmap feature.
- As is natural for append-only systems, ParAccel’s PADB has MVCC (MultiVersion Concurrency Control) and snapshot isolation.
- Name a compression method, and PADB probably has it — 13 in all by ParAccel’s count, including dictionary/token, run-length encoding, Delta, LZ, and so on.
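To illustrate how the append-only, Vacuum, and snapshot-isolation points in the list above fit together, here is a minimal sketch. It is a toy model, not PADB's implementation: the class `AppendOnlyTable`, its methods, and the use of a simple integer transaction counter are all assumptions made for the example. The idea shown is that row data is never overwritten; a delete only stamps a version as logically dead, an update is a delete plus an insert, readers see the versions that were live at their snapshot, and a vacuum pass reclaims dead versions no reader can still see.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class RowVersion:
    key: str
    value: Any
    created_at: int                   # transaction that appended this version
    deleted_at: Optional[int] = None  # transaction that logically deleted it


class AppendOnlyTable:
    """Toy append-only store with logical deletes (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.versions: List[RowVersion] = []
        self.next_txn = 1

    def _begin(self) -> int:
        txn = self.next_txn
        self.next_txn += 1
        return txn

    def _mark_deleted(self, key: str, txn: int) -> None:
        # Only the deletion stamp changes; the stored value is left untouched.
        for version in self.versions:
            if version.key == key and version.deleted_at is None:
                version.deleted_at = txn

    def insert(self, key: str, value: Any) -> int:
        txn = self._begin()
        self.versions.append(RowVersion(key, value, created_at=txn))
        return txn

    def delete(self, key: str) -> int:
        txn = self._begin()
        self._mark_deleted(key, txn)
        return txn

    def update(self, key: str, value: Any) -> int:
        # An update is a logical delete followed by an append, in one transaction.
        txn = self._begin()
        self._mark_deleted(key, txn)
        self.versions.append(RowVersion(key, value, created_at=txn))
        return txn

    def snapshot(self, as_of: int) -> Dict[str, Any]:
        """Rows visible to a reader whose snapshot includes transactions <= as_of."""
        return {
            v.key: v.value
            for v in self.versions
            if v.created_at <= as_of
            and (v.deleted_at is None or v.deleted_at > as_of)
        }

    def vacuum(self, oldest_active_snapshot: int) -> int:
        """Reclaim versions that no active snapshot can still see; return the count."""
        before = len(self.versions)
        self.versions = [
            v for v in self.versions
            if v.deleted_at is None or v.deleted_at > oldest_active_snapshot
        ]
        return before - len(self.versions)


table = AppendOnlyTable()
t1 = table.insert("a", 1)
t2 = table.insert("b", 2)
t3 = table.update("a", 10)
t4 = table.delete("b")

# An older snapshot still sees the pre-update, pre-delete state.
old_view = table.snapshot(t2)
new_view = table.snapshot(t4)
```

In this model old readers are never blocked by writers, at the cost of dead versions piling up until a vacuum runs, which matches the trade-off described above.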
Tracking ParAccel’s customer success has long been difficult. The 2009 Gartner Magic Quadrant claim of ~20 ParAccel customers seems odd to everybody, including ParAccel. ParAccel’s own reporting of customer wins around then was quite confusing. And ParAccel’s customer count a year before that was extremely low. But ParAccel’s Michael Weir just rounded up some figures for me, namely:
- ParAccel has 30+ revenue-recognized customers, not counting OEMs, OEMs’ customers, or paid POCs.
- 2 ParAccel customers have > 100 TB of user data.
- 7 ParAccel customers have > 10 TB of user data.
- The largest ParAccel cluster is 28 nodes and growing.
Naturally, Michael went on to note that even relatively small databases can have high value.
One last note: ParAccel has approximately 78 employees.
Comments
10 Responses to “Where ParAccel is at”
Software license list price of $100K per TB of user data with 20% annual support seems a bit… you know…. steep, even for best-in-breed performance.
Are unofficial discounts so deep making list prices completely irrelevant?
If you follow the link in my recent post on pricing, you’ll see that somebody said in one particular scenario 70-80% was typical.
Yep, everything is or can be negotiated, although deals of around 1 TB may, up to a point, be exceptions.
Well… I had read it then and I have re-read it now, and I haven’t noticed discounts being quoted by anybody.
However, you quoted Netezza at an $11K per TB list price for both hardware and software. So here is my confusion: how could the two be so different?
But I think I understand now. At $11K/TB, Netezza is presumably not going to offer any discounts, will require a large minimum number of TBs to be purchased at once, and most likely has a few hidden up-sell premiums as well, while ParAccel must have really deep discounts, particularly for multi-TB licenses. That way both prices reconcile to the same ballpark.
Camuel,
Data warehouse appliance pricing is apples to oranges when comparing to software-only data warehouse solutions because:
(1) For the appliance you are paying for hardware as well
(2) For the appliance you are paying for disk space in terms of the maximum capacity of the system (rather than how much data you actually have). Furthermore, as you use more of your allocated space in an appliance, your performance goes down, since the disk bandwidth remains constant no matter how much data you have. Hence, nobody actually fills up their appliance all the way.
Reason (2) is the main reason why you see the appliance prices typically much lower than the software-only solutions.
Either way, for both reasons, it usually doesn’t make much sense to compare appliance pricing with software-only pricing.
Thanks Daniel for the insight. This makes perfect sense for me.
However, at the end one must compare, somehow, different solutions to make an intelligent choice. Exactly as one must compare apples and oranges to make a choice 🙂 if not directly then on utility function.
The direct comparison of per-TB price of software-solution to per-TB price of appliance-solution is completely pointless. Got it. Thanks again.
Dear Curt,
I wanted to make a comment on the append-only portion of your blog… My company is currently undergoing a POC with ParAccel, and I can attest to the fact that PADB is not append-only. The DB is fully ACID-compliant, and updates/deletes are fully supported.
Thanks.
Vitaly,
Append-only does NOT contradict ACID.
Best,
CAM
Hi Curt – let me pile on and clarify a bit re. the Append-Only comment. As Vitaly notes we are fully ACID-compliant and clearly can handle updates and deletes. Our internal implementation for this is based upon a paradigm where an update is essentially a delete followed by an insert. A delete itself is simply a logical operation. As you note, we periodically ‘vacuum’ for garbage collection. This implementation provides both enhanced performance and improved isolation.
Another couple notes:
* To further our ‘obsession’ with performance (nice choice of words, Curt – we absolutely resemble that remark! :-), we have a blog post on “The Will To Design For Performance” that you and your readers may want to take a look at ( http://paraccel.com/the-will-to-design-for-performance/). We believe that analytics-driven decisions are becoming critical across most, if not all, industries and having a high performance analytic database is the foundation for this. The need for agility and velocity as one follows a deductive thought process over vast volumes of data looking for patterns or validating a hypothesis makes performance a core requirement for analytics and not simply a nice to have.
* I want to ensure we are clear regarding the Gartner MQ – in 2009 we had more than enough customers to qualify and for 2010 (survey recently completed), we had over 20 of our customers volunteer to participate. Not sure if or when we exhibited confusion, but it was probably my fault – I do talk pretty fast ;-). As we discussed, we’ve had a great year to date both in revenue and with many key large customer wins – we are on track to continue.
Michael,
Thanks for participating!
My point on customer count was that if you had ~20 a year ago, as Gartner suggests, and ~30 now, you had a pretty weak year in the interim.
My guess is that, as with so much else, the previous marketing regime at ParAccel also inflated customer count.
Best,
CAM