August 7, 2014

Actian Vector Hadoop Edition

I have a small blacklist of companies I won’t talk with because of their particularly unethical past behavior. Actian is one such; they evidently made stuff up about me that Josh Berkus gullibly posted for them, and I don’t want to have conversations that could be dishonestly used against me.

That said, Peter Boncz isn’t exactly an Actian employee. Rather, he’s the professor who supervised Marcin Zukowski’s PhD thesis that became Vectorwise, and I chatted with Peter by Skype while he was at home in Amsterdam. I believe his assurances that no Actian personnel sat in on the call. 🙂

In other news, Peter is currently working on and optimistic about HyPer. But we literally spent less than a minute talking about that.

Before I get to the substance, there’s been a lot of renaming at Actian. To quote Andrew Brust,

… the ParAccel, Pervasive and Vectorwise technologies are being unified under the Actian Analytics Platform brand. Specifically, the ParAccel technology … is being re-branded Actian Matrix; Pervasive’s technologies are rechristened Actian DataFlow and Actian DataConnect; and Vectorwise becomes Actian Vector.

and

Actian … is now “one company, with one voice and one platform” according to its John Santaferraro

The bolded part of the latter quote is untrue — at least in the ordinary sense of the word “one” — but the rest can presumably be taken as company gospel.

All this is by way of preamble to saying that Peter reached out to me about Actian’s new Vector Hadoop Edition when he blogged about it last June, and we finally talked this week. Highlights include:

- Vectorwise, while being proudly multi-core, was previously single-server. The new Vector Hadoop Edition is the first version with node parallelism.
- Actian’s Vector Hadoop Edition uses HDFS (Hadoop Distributed File System) and YARN to manage an Actian-proprietary file format. There is currently no interoperability whereby Hadoop jobs can read these files. However …
- … Actian’s Vector Hadoop Edition relies on Hadoop for cluster management, workload management and so on.
- Peter thinks there are two paying customers, both too recent to be in production, who between them paid what I’d call a remarkable amount of money.*
- Roadmap futures* include:
  - Being able to update and indeed trickle-update data. Peter is very proud of Vectorwise’s Positional Delta Tree updating.
  - Some elasticity they’re proud of, both in terms of nodes (generally limited to the replication factor of 3) and cores (not so limited).
  - Better interoperability with Hadoop.

Actian actually bundles Vector Hadoop Edition with DataFlow — the old Pervasive DataRush — into what it calls “Actian Analytics Platform – Hadoop SQL Edition”. DataFlow/DataRush has been working over Hadoop since the latter part of 2012, based on a visit with my then clients at Pervasive that December.

*Peter gave me details about revenue, pipeline, roadmap timetables etc. that I’m redacting in case Actian wouldn’t like them shared. I should say that the timetable for some — not all — of the roadmap items was quite near-term; however, pay no attention to any phrasing in Peter’s blog post that suggests the roadmap features are already shipping.

The Actian Vector Hadoop Edition optimizer and query-planning story goes something like this:

- Vectorwise started with the open-source Ingres optimizer. After a query is optimized, it is rewritten to reflect Vectorwise’s columnar architecture. Peter notes that these rewrites rarely change operator ordering; they just add column-specific optimizations, whatever that means.
- Now there are rewrites for parallelism as well.
- These rewrites all seem to be heuristic/rule-based rather than cost-based.
- Once Vectorwise became part of the Ingres company (later renamed to Actian), Ingres engineers helped them modify the base optimizer so that it wasn’t just the “stock” Ingres one.
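
To make "rule-based rather than cost-based" concrete, here is a minimal Python sketch of post-optimization rewriting. It is a toy illustration only, not Vectorwise's actual rules: the operator names (`Scan`, `ColumnScan`, `Exchange`) and the two rules are hypothetical, chosen just to show rules firing on pattern matches, with no cost estimates, while the original operator ordering is left alone.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Op:
    """One node of a query plan tree (hypothetical, for illustration)."""
    name: str
    children: List["Op"] = field(default_factory=list)


def columnar_scan(node: Op) -> Op:
    # Hypothetical column-specific rule: swap a generic scan for a columnar one.
    if node.name == "Scan":
        return Op("ColumnScan", node.children)
    return node


def parallelize_scan(node: Op) -> Op:
    # Hypothetical parallelism rule: put an Exchange above every columnar scan.
    if node.name == "ColumnScan":
        return Op("Exchange", [node])
    return node


def rewrite(plan: Op, rules: List[Callable[[Op], Op]]) -> Op:
    """Apply each rule bottom-up, in a fixed order, with no cost model."""
    node = Op(plan.name, [rewrite(child, rules) for child in plan.children])
    for rule in rules:
        node = rule(node)
    return node


def to_text(plan: Op) -> str:
    """Render a plan tree as nested text, e.g. Join(Scan, Scan)."""
    if not plan.children:
        return plan.name
    return f"{plan.name}({', '.join(to_text(c) for c in plan.children)})"


# The optimizer's output: the join and aggregate ordering is already decided.
plan = Op("Aggregate", [Op("Join", [Op("Scan"), Op("Scan")])])
rewritten = rewrite(plan, [columnar_scan, parallelize_scan])
print(to_text(rewritten))
```

In this toy, the aggregate still sits above the join, which still sits above the two scans; the rules only decorate the plan the optimizer already chose.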

As with most modern MPP (Massively Parallel Processing) analytic RDBMS, there doesn’t seem to be any concept of a head-node to which intermediate results need to be shipped. This is good, because head nodes in early MPP analytic RDBMS were dreadful bottlenecks.

Peter and I also talked a bit about SQL-oriented HDFS file formats, such as Parquet and ORC. He doesn’t like their lack of support for columnar compression. Further, in Parquet there seems to be a requirement to read the whole file, to an extent that interferes with Vectorwise’s form of data skipping, which it calls “min-max indexing”.
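
For readers unfamiliar with min-max data skipping, the general idea can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not Vectorwise's or Parquet's implementation: the `Block` class, the block contents, and the `scan_range` helper are all hypothetical, and each block is assumed to be non-empty.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A chunk of one column's values, with min/max kept as metadata."""
    values: List[int]

    def __post_init__(self) -> None:
        # Computed once when the block is built; assumes a non-empty block.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_range(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return the values in [low, high] and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        # Skip the block when its min/max range cannot overlap the predicate.
        if block.max_value < low or block.min_value > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


blocks = [Block([1, 4, 7]), Block([10, 12, 19]), Block([20, 25, 31])]
result, blocks_read = scan_range(blocks, 11, 21)
print(result, blocks_read)
```

Here the first block is never read, because its maximum (7) is below the predicate's lower bound. The benefit depends on being able to read blocks selectively, which is why a format's minimum read granularity matters to this technique.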

Frankly, I don’t think the architectural choice “uses Hadoop for workload management and administration” provides a lot of customer benefit in this case. Given that, I don’t know that the world needs another immature MPP analytic RDBMS. I also note with concern that Actian has two different MPP analytic RDBMS products. Still, Vectorwise and indeed all the stuff that comes out of Martin Kersten and Peter’s group in Amsterdam has always been interesting technology. So the Actian Vector Hadoop Edition might be worth taking a look at before you redirect your attention to products with more convincing track records and futures.

Categories: Actian and Ingres, Clustering, Database compression, Hadoop, ParAccel, Pervasive Software, SQL/Hadoop integration, VectorWise, Workload management


Comments

4 Responses to “Actian Vector Hadoop Edition”

Peter Boncz on August 7th, 2014 4:36 pm

Hi Curt,

Thanks for the write-up! Two minor comments.

In response to the last paragraph, I want to note that Vector(wise) has now been a decade in development, and has been in production for years at more than 100 customers, and I would almost start to call it “mature”. Sure, the Hadoop Edition is new and brings new challenges, but it is not starting from zero.

Final point is that I would like to advertise that Vector Hadoop Edition in my opinion is really really fast – I think it is the fastest SQL-on-Hadoop system out there by some margin.

It beats Impala on its own benchmark of choice by a factor of 14 on the same hardware. If you are interested in reading more on the performance side, I just published an article on that on the little blog I have started with Thomas Neumann of TUM (databasearchitects.blogspot.com):

http://databasearchitects.blogspot.com/2014/08/tpc-ds-with-vector-hadoop-edition.html

enjoy!

Peter Boncz

Graham on August 8th, 2014 4:24 am

Curt,

Regarding your “concern that Actian has two different MPP analytic RDBMS products” …

… The way it was explained by Actian at their Boulder Brains Trust briefing was that ParAccel/Matrix is an additional purchase (the “Extreme Performance Edition”) if you need “low latency, very high performance analytics”.

Should we deduce that this other new thing has high latency and so-so performance that only the purchase of another product will mitigate?

Curt Monash on August 8th, 2014 7:34 am

Graham,

I wouldn’t take any of those positioning statements too seriously from a company that wants to pretend totally different products are somehow the same thing. This was a disaster when Informix pushed the “one code line” inaccuracy (that’s a euphemism) in the 1990s, and it’s unlikely to work well for Actian now.

Julien Le Dem on December 18th, 2014 2:02 pm

Regarding this part:
“Further, in Parquet there seems to be a requirement to read the whole file”
This is incorrect. One of the main goals is to read columns independently so it would be silly to require reading the entire file.
I’m happy to clarify any point that needs it.
Parquet already implements data skipping based on min and max with predicate push down. Further improvements are coming. Building additional indexes around Parquet should be straightforward.
