Exasol
Analysis of data warehouse DBMS vendor Exasol. Related subjects include:
Are analytic RDBMS and data warehouse appliances obsolete?
I used to spend most of my time — blogging and consulting alike — on data warehouse appliances and analytic DBMS. Now I’m barely involved with them. The most obvious reason is that there have been drastic changes in industry structure:
- Many of the independent vendors were swooped up by acquisition.
- None of those acquisitions was a big success.
- Microsoft did little with DATAllegro.
- Netezza struggled with R&D after being bought by IBM. An IBMer recently told me that their main analytic RDBMS engine was BLU.
- I hear about Vertica more as a technology to be replaced than as a significant ongoing market player.
- Pivotal open-sourced Greenplum. I have detected few people who care.
- Ditto for Actian’s offerings.
- Teradata claimed a few large Aster accounts, but I never hear of Aster as something to compete or partner with.
- Smaller vendors fizzled too. Hadapt and Kickfire went to Teradata as more-or-less acquihires. InfiniDB folded. Etc.
- Impala and other Hadoop-based alternatives are technology options.
- Oracle, Microsoft, IBM and to some extent SAP/Sybase are still pedaling along … but I rarely talk with companies that big. 🙂
Simply reciting all that, however, begs the question of whether one should still care about analytic RDBMS at all.
My answer, in a nutshell, is:
Analytic RDBMS — whether on premises in software, in the form of data warehouse appliances, or in the cloud — are still great for hard-core business intelligence, where “hard-core” can refer to ad-hoc query complexity, reporting/dashboard concurrency, or both. But they aren’t good for much else.
Comments on Gartner’s 2012 Magic Quadrant for Data Warehouse Database Management Systems — evaluations
To my taste, the most glaring mis-rankings in the 2012/2013 Gartner Magic Quadrant for Data Warehouse Database Management are that it is too positive on Kognitio and too negative on Infobright. Secondarily, it is too negative on HP Vertica, and too positive on ParAccel and Actian/VectorWise. So let’s consider those vendors first.
Gartner seems confused about Kognitio’s products and history alike.
- Gartner calls Kognitio an “in-memory” DBMS, which is not accurate.
- Gartner doesn’t remark on Kognitio’s worst-in-class* compression.
- Gartner gives Kognitio oddly high marks for a late, me-too Hadoop integration strategy.
- Gartner writes as if Kognitio’s next attempt at the US market will be the first one, which is not the case.
- Gartner says that Kognitio pioneered data warehouse SaaS (Software as a Service), which actually has existed since the pre-relational 1970s.
Gartner is correct, however, to note that Kognitio doesn’t sell much stuff overall.
* non-existent
In the cases of HP Vertica, Infobright, ParAccel, and Actian/VectorWise, the 2012 Gartner Magic Quadrant for Data Warehouse Database Management’s facts are fairly accurate, but I dispute Gartner’s evaluation. When it comes to Vertica: Read more
Notes on some basic database terminology
In a call Monday with a prominent company, I was told:
- Teradata, Netezza, Greenplum and Vertica aren’t relational.
- Teradata, Netezza, Greenplum and Vertica are all data warehouse appliances.
That, to put it mildly, is not accurate. So I shall try, yet again, to set the record straight.
In an industry where people often call a DBMS just a “database” — so that a database is something that manages a database! — one may wonder why I bother. Anyhow …
1. The products commonly known as Oracle, Exadata, DB2, Sybase, SQL Server, Teradata, Sybase IQ, Netezza, Vertica, Greenplum, Aster, Infobright, SAND, ParAccel, Exasol, Kognitio et al. all either are or incorporate relational database management systems, aka RDBMS or relational DBMS.
2. In principle, there can be difficulties in judging whether or not a DBMS is “relational”. In practice, those difficulties don’t arise — yet. Every significant DBMS still falls into one of two categories:
- Relational:
- Was designed to do relational stuff* from the get-go, even if it now does other things too.
- Supports a lot of SQL.
- Non-relational:
- Was designed primarily to do non-relational things.*
- Doesn’t support all that much SQL.
*I expect the distinction to get more confusing soon, at which point I’ll adopt terms more precise than “relational things” and “relational stuff”.
3. There are two chief kinds of relational DBMS: Read more
Many kinds of memory-centric data management
I’m frequently asked to generalize in some way about in-memory or memory-centric data management. I can start:
- The desire for human real-time interactive response naturally leads to keeping data in RAM.
- Many databases will be ever cheaper to put into RAM over time, thanks to Moore’s Law. (Most) traditional databases will eventually wind up in RAM.
- However, there will be exceptions, mainly on the machine-generated side. Where data creation and RAM data storage are getting cheaper at similar rates … well, the overall cost of RAM storage may not significantly decline.
Getting more specific than that is hard, however, because:
- The possibilities for in-memory data storage are as numerous and varied as those for disk.
- The individual technologies and products for in-memory storage are much less mature than those for disk.
- Solid-state options such as flash just confuse things further.
Consider, for example, some of the in-memory data management ideas kicking around. Read more
Comments on the analytic DBMS industry and Gartner’s Magic Quadrant for same
This year’s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the 2010, 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying:
- In general, I regard Gartner Magic Quadrants as a bad use of good research.
- Illustrating the uselessness of — or at least poor execution on — the overall quadrant metaphor, a large majority of the vendors covered are lined up near the line x = y, each outpacing the one below in both of the quadrant’s dimensions.
- I find fewer specifics to disagree with in this Gartner Magic Quadrant than in previous year’s versions. Two factors jump to mind as possible reasons:
- This year’s Gartner Magic Quadrant for Data Warehouse Database Management Systems is somewhat less ambitious than others; while it gives as much company detail as its predecessors, it doesn’t add as much discussion of overall trends. So there’s less to (potentially) disagree with.
- Merv Adrian is now at Gartner.
- Whatever the problems may be with Gartner’s approach, the whole thing comes out better than do Forrester’s failed imitations.
*As of February, 2012 — and surely for many months thereafter — Teradata is graciously paying for a link to the report.
Specific company comments, roughly in line with Gartner’s rough single-dimensional rank ordering, include: Read more
Exasol update
I last wrote about Exasol in 2008. After talking with the team Friday, I’m fixing that now. 🙂 The general theme was as you’d expect: Since last we talked, Exasol has added some new management, put some effort into sales and marketing, got some customers, kept enhancing the product and so on.
Top-level points included:
- Exasol’s technical philosophy is substantially the same as before, albeit not with as extreme a focus on fitting everything in RAM.
- Exasol believes its flagship DBMS EXASolution has great performance on a load-and-go basis.
- Exasol has 25 EXASolution customers, all in Germany.*
- 5 of those are “cloud” customers, at hosting providers engaged by Exasol.
- EXASolution database sizes now range from the low 100s of gigabytes up to 30 terabytes.
- Pretty much the whole company is in Nuremberg.
Draft slides on how to select an analytic DBMS
I need to finalize an already-too-long slide deck on how to select an analytic DBMS by late Thursday night. Anybody see something I’m overlooking, or just plain got wrong?
Edit: The slides have now been finalized.
Dividing the data warehousing work among MPP nodes
I talk with lots of vendors of MPP data warehouse DBMS. I’ve now heard enough different approaches to MPP architecture that I think it might be interesting to contrast some of the alternatives.
Categories: Aster Data, Calpont, Exasol, Greenplum, Parallelization, Theory and architecture, Vertica Systems | 22 Comments |
Exasol technical briefing
It took 5 ½ months after my non-technical introduction, but I finally got a briefing from Exasol’s technical folks (specifically, the very helpful Mathias Golombek and Carsten Weidmann). Here are some highlights: Read more
Categories: Analytic technologies, Benchmarks and POCs, Columnar database management, Data warehousing, Exasol, In-memory DBMS, Memory-centric data management, Pricing | 1 Comment |
Compare/constrast of Vertica, ParAccel, and Exasol
I talked with Exasol today – at 5:00 am! — and of course want to blog about it. For clarity, I’d like to start by comparing/contrasting the fundamental data structures at Vertica, ParAccel, and Exasol. And it feels like that should be a separate post. So here goes.
- Exasol, Vertica, and ParAccel all store data in columnar formats.
- Exasol, Vertica, and ParAccel all compress data heavily.
- Exasol and Vertica operate on in-memory data in compressed formats. ParAccel decompresses the data when it gets to RAM. Exasol, Vertica, and ParAccel all — perhaps to varying extents — operate on in-memory data in compressed formats.
- ParAccel and Exasol write data to what amounts to the in-memory part of their basic data structures; the data then gets persisted to disk. Vertica, however, has a separate in-memory data structure to accept data and write it to disk.
- Vertica is a disk-centric system that doesn’t rely on there being a lot of RAM.
- ParAccel can be described that way too; however, in some cases (including on the TPC-H benchmarks), ParAccel recommends loading all your data into RAM for maximum performance.
- Exasol is totally optimized for the assumption that queries will be run against data that had already been previously loaded into RAM.
Beyond the above, I plan to discuss in a separate post how Exasol does MPP shared-nothing software-only columnar data warehouse database management differently than Vertica and ParAccel do shared-nothing software-only columnar data warehouse database management. 🙂