More on Actian/ParAccel/VectorWise/Versant/etc.
My quick reaction to the Actian/ParAccel deal was negative. A few challenges to my views then emerged. They didn’t really change my mind.
Amazon Redshift
Amazon did a deal with ParAccel that amounted to:
- Amazon got a very cheap license to a limited subset of ParAccel’s product …
- … so that it could launch a service called Amazon Redshift.
- Amazon also invested in ParAccel.
Some argue that this is great for ParAccel’s future prospects. I’m not convinced.
No doubt there are and will be Redshift users, evidently including Infor. But so far as I can tell, Redshift uses very standard SQL, so it doesn’t seed a ParAccel market in terms of developer habits. The administration/operation story is similar. So outside of general validation/bragging rights, Redshift is not a big deal for ParAccel.
OEMs and bragging rights
It’s not just Amazon and Infor; there’s also a MicroStrategy deal to OEM ParAccel — I think it’s the real ParAccel software in that case — for a particular service, MicroStrategy Wisdom. But unless I’m terribly mistaken, HP Vertica, Sybase IQ and even Infobright each have a lot more OEMs than ParAccel, just as they have a lot more customers than ParAccel overall.
This OEM success is a great validation for the idea of columnar analytic RDBMS in general, but I don’t see where it’s an advantage for ParAccel vs. the columnar leaders.
Concurrency
As I admitted in the comment thread to my first Actian/ParAccel post, I’m confused about what kind of concurrent usage ParAccel can really support. The data I have, e.g. in the link immediately above, is not conclusive. Googling suggests that VectorWise was at one user per core a couple of years ago, supportive of my hypothesis that it doesn’t have some big concurrency edge on ParAccel. But to repeat — I don’t really know.
DBMS acquisitions in the past
My history blog on DBMS acquisitions yielded more favorable examples than I was expecting. (Of course, I omitted a lot of small and boring failures.) And DBMS conglomerates are the rule more than the exception, with IBM, Sybase, Teradata and Oracle all adopting acquisition-aided multi-DBMS strategies, at least to some extent.
That said, Sybase is the main example of a vendor of a slow-growth DBMS (Adaptive Server Enterprise) doing well with a faster-growing one (Sybase IQ). Perhaps not coincidentally, Actian’s latest management team draws significantly on Sybase. So yes; ParAccel is now owned by a company run by guys who know something about selling columnar DBMS.
But the whole thing would be more convincing if Ingres had shown more life under Actian’s ownership, or indeed at any point in the past 20 years. My bottom line is that Actian was floundering badly in the DBMS market 1 1/2 years ago, and not a lot of favorable news has emerged in the interim — except, quite arguably, for the management changes and acquisitions themselves.
Comments
7 Responses to “More on Actian/ParAccel/VectorWise/Versant/etc.”
I don’t buy the Ingres point – it was a dead cat bouncing on the road to where it is now. The issue with Actian is what were they thinking (and if they think they/the audience think that they are a post-CA life support organization.)
I think Redshift is significant because of the commoditization (and suspect the filtering of the software was not to isolate function but the *smoothly working* subset.)
When I look at the data for BI world, it is clear to me that:
– Column dbms is really only easily/valuably applicable to a small minority of reporting (because of no great benefit and of things like queries that rely on sort order that aren’t its strengths)
– Column dbms is used in < 1% of strong use cases
so there is a deep issue with adoption: organizations/price/PITA vendors/risk.
There clearly is a market that can be made over time for a solid column dbms product with low risk and high reward. The Redshift problem (in being that evangelizer) is that there doesn't seem to be anyone incentivized to popularize the product.
Aaron,
I don’t see why your sort order point is an argument for row stores over column stores. At best, it’s an argument for more indexing rather than less. When we put it that way, Netezza is on the side of the counterexamples.
And by the way, Sybase IQ has a lot of indexes, while Vertica lets you store columns in multiple sort orders.
Curt –
Index access to column DBMS gives up the big performance gains over row stores, because it just becomes B tree access. In Sybase IQ, in many use cases I’ve seen somewhat slower performance than row stores (it’s way better than the non-indexed scan/sort in Sybase IQ for those queries; the slower performance is probably row reconstruction cost.)
This explains why Vertica took the (materialized view) route of constructing the data in different groupings and sort orders. The penalties are the usual cost of load and storage.
Net-net – there are use cases that work best in column stores, such as batch load, scan, small number of columns in queries. There are others that are not so much.
The default tempts practitioners to choose row stores, which lose in certain cases but win in many, and win defensively in terms of future requirements and general-purpose activities.
Aaron,
A row-based vendor who implemented a fairly simple columnar option told me that columnar was faster for queries that returned 40% or so of the columns or fewer. Others have agreed.
Also, if performance depends on lots of indexes, then it will be fast for those use cases for which the indexes are already built. And building them takes work. Maintaining them takes work and equipment. Etc.
I realize that continuing to use row-based warehousing is good for various people’s job security, but I’m skeptical as to any of its other claimed advantages.
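The 40% rule of thumb mentioned above can be sketched as a toy I/O cost model. Everything here is an assumption for illustration, not a measurement from any product: the function names are hypothetical, columns are treated as equal width, and the per-column reconstruction overhead of 1.5 is chosen only so that the break-even lands near 40%.

```python
def row_store_cost(total_columns: int) -> float:
    """A row store reads every column of each row, whatever the query asks for."""
    return float(total_columns)


def column_store_cost(columns_read: int, overhead: float = 1.5) -> float:
    """A column store reads only the requested columns, plus a hypothetical
    per-column overhead for stitching rows back together."""
    return columns_read * (1.0 + overhead)


def breakeven_fraction(overhead: float = 1.5) -> float:
    """Fraction of columns below which the column store is cheaper in this model."""
    return 1.0 / (1.0 + overhead)


if __name__ == "__main__":
    total = 100  # hypothetical table width
    for wanted in (10, 40, 80):
        print(wanted, column_store_cost(wanted), row_store_cost(total))
```

In this model a query touching 10 of 100 columns strongly favors the column store, one touching 80 favors the row store, and the crossover moves as the assumed overhead changes.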
Curt – [this is a serious and important topic]
My perspective is as an agnostic practitioner who has selected/implemented both column and row stores and evaluated many.
Your point does not contradict mine. For column sweet spots, they work much better than row stores: often 10x faster, 10% of IO, and compression often is 2-3x better. BI queries generally are better, and I appreciate their value for that. For other uses, less so.
For a counterexample, look at most implementations of Teradata. It has a relatively recent column store and an effective row store optimized for normalized data. One of the big benefits effective users get is not having to spin out custom datamarts for purpose: you have operational data outside, and a single store of analytic data inside. 99+% of queries over time turn into small queries selecting very small data. The DW serves a range of use, from scoring all customers to serving typically small web queries. Most current customers will use the row store even for big fact tables as a physical design choice, and not for political reasons.
And that is the typical evolution of DW and many DMs. It starts as a modelling and large scale query area, and matures into supporting general analytic and historical data needs. The number of big scans actually decreases over time as the bigger data is better understood, and common scans are often replaced by aggregates. At the same time – demand grows for a very diverse range of smaller queries, and often grows for trickle feeds (something the column vendors are being pushed to support, and the greatest area of DBMS fragility in these implementations.)
Column stores are built using physical design choices and optimized for certain queries (hence the indices, etc.) The problem with these choices is that they often will not support undetermined future needs well.
Ultimately, column stores generally win big in for-purpose dataMARTs, and generally lose in DW and for more general needs. It would be worthwhile for you to talk to users who have had column DBMS for BI for long periods; you will see that they often end up building parallel historical data row stores with the same data or using row store ODS for query, where the column dbms was awkward.
I am not sure that I agree with much of what was said on the business side of your argument. The acquisition of ParAccel makes a lot of sense for Actian. With the other acquisitions they are a 200 mill dollar player in big data overnight. They have a big data product line – from apps to integration to database.
As far as Vectorwise goes, that is an SMB play for customers who will never need more than 5 TB but want to do more than just spreadsheets. It’s a great single-server solution with a low price point.
ParAccel is enterprise class. Amazon and MicroStrategy’s deployments prove it. ParAccel’s customers are Fortune 500 and are monetizing their investments in the product. That critical mass is building. In recent head to head bake-offs ParAccel has soundly beaten Vertica, Greenplum, and Teradata.
I’d keep an eye on this combination. My bet is always on the nimble 200 mill software company when it goes against a slow billion dollar hardware company (HP, EMC, TD).