Exasol technical briefing
It took 5 ½ months after my non-technical introduction, but I finally got a briefing from Exasol’s technical folks (specifically, the very helpful Mathias Golombek and Carsten Weidmann). Here are some highlights:
- Exasol has no concept of a “head” or “master” node running different software from the others. Instead, all nodes are peers. For example, any node’s IP address can be given to an application; that node will then parse the SQL and distribute it appropriately to the other nodes.
- Exasol is ACID-compliant, swapping blocks to disk when there’s an update. And one certainly can query data that’s on disk …
- … however, Exasol’s memory structures are totally optimized for in-memory operation. Exasol is perfectly happy to swap in different parts of the database on a scheduled basis every few hours, but sending queries straight to disk isn’t optimal. Exasol’s recommended hardware configurations are always designed so that most queries can be executed against data already in RAM. However, if for example only the last 30 days of data are in RAM and a few queries go against full-year data, that’s OK.
- Exasol has a compression story typical for a columnar DBMS vendor – heavy use of dictionary/token compression, other unspecified compression algorithms as well, data kept compressed in RAM, etc.
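The dictionary/token idea can be sketched in a few lines. This is purely illustrative — Exasol’s actual in-RAM and on-disk formats are not public, and the function names here are mine:

```python
# Minimal sketch of dictionary/token compression for one column.
# Illustrative only; not Exasol's implementation.

def dict_compress(column):
    """Replace each value with a small integer token."""
    dictionary = {}  # value -> token
    tokens = []
    for value in column:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        tokens.append(dictionary[value])
    # Inverse mapping (token -> value), used for decompression
    reverse = [v for v, _ in sorted(dictionary.items(), key=lambda kv: kv[1])]
    return tokens, reverse

def dict_decompress(tokens, reverse):
    return [reverse[t] for t in tokens]

column = ["DE", "US", "DE", "DE", "FR", "US"]
tokens, reverse = dict_compress(column)
# Low-cardinality columns shrink a lot: six strings become
# six small integers plus a three-entry dictionary.
assert dict_decompress(tokens, reverse) == column
```

The win is largest for low-cardinality columns, and the tokens can stay compressed in RAM until a query actually needs the original values.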
- Like most other MPP data warehousing vendors, Exasol partitions data among nodes via a hash key. This is the industry’s most common scheme, because it has the parallelization benefits of random/equal distribution of data, yet still lets you get a head start on some hairy hash joins for extra performance.
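Hash distribution can be sketched as follows; the node count, hash function, and table data are illustrative assumptions, not Exasol specifics. The point is that rows from two tables sharing a join-key value land on the same node, so that part of a hash join can run locally:

```python
# Sketch of hash-key partitioning across an MPP cluster.
# Cluster size, hash choice, and tables are hypothetical.
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size

def node_for(key):
    """Map a distribution-key value to a node via a stable hash."""
    return zlib.crc32(str(key).encode()) % NUM_NODES

orders = [(1001, "widget"), (1002, "gadget"), (1003, "gizmo")]     # (cust_id, item)
customers = [(1001, "Acme"), (1002, "Globex"), (1003, "Initech")]  # (cust_id, name)

# Partition both tables on the same key ...
partitions = defaultdict(lambda: {"orders": [], "customers": []})
for row in orders:
    partitions[node_for(row[0])]["orders"].append(row)
for row in customers:
    partitions[node_for(row[0])]["customers"].append(row)

# ... so matching rows are co-located and each node can join its slice alone.
for node in partitions.values():
    assert {r[0] for r in node["orders"]} == {r[0] for r in node["customers"]}
```

Because both tables hash on the join key, no rows need to move between nodes at join time — that is the “head start” on hash joins.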
- Like Vertica, Exasol replicates small tables (e.g., dimension tables) onto every node.
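The benefit of replicating small tables can be sketched like so — every node holds a full copy of the dimension table, so fact-to-dimension joins never move dimension rows across the network. All names and data here are illustrative, not drawn from Exasol:

```python
# Sketch of small-table replication in an MPP cluster.
# Node count and tables are hypothetical.

NUM_NODES = 4

dim_customers = {1001: "Acme", 1002: "Globex"}  # small dimension table

# Each node holds the whole dimension table plus its slice of the fact table.
nodes = [{"dim": dict(dim_customers), "facts": []} for _ in range(NUM_NODES)]

fact_rows = [(1001, 250.0), (1002, 99.5), (1001, 13.0)]  # (cust_id, amount)
for i, row in enumerate(fact_rows):
    nodes[i % NUM_NODES]["facts"].append(row)  # any distribution works here

# The join resolves locally on every node, with no data shuffling:
joined = [
    (cust_id, amount, node["dim"][cust_id])
    for node in nodes
    for cust_id, amount in node["facts"]
]
```

The trade-off is extra storage per node, which is cheap for tables that are genuinely small relative to the fact data.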
- Exasol’s optimizer creates and maintains join indexes automagically on the fly. Exasol disagreed when I said “Oh, like a materialized view?” But I suspect this is the kind of join index that Teradata says privately is a special case of materialized view, and says publicly is a lot like a materialized view.
- Generally, Exasol describes its optimizer as being “very MPP-aware.”
- Exasol mainly wrote its own code from scratch. Right now they seem to have a kind of distributed operating system called EXACluster running over Linux, but they appear to be replacing the Linux underpinnings with their own stuff. E.g., disk access is going into EXACluster.
- EXACluster already handles high availability/failover between nodes.
- Exasol replicates data among nodes to allow for failover. That sounds similar to Vertica’s approach. Also, if you add nodes and restart Exasol, the database will automagically be repartitioned.
- The biggest deployed Exasol system mentioned has 3 terabytes of user data. It is running on 5 nodes with 32 GB of RAM each.
- For any given amount of total RAM a user is willing to deploy, Exasol recommends more nodes with less RAM/node. I didn’t probe directly as to why.
- Exasol doesn’t have stored procedures. They assert that stored procedures would be useful mainly for ELT/ETL, and that alternatives perform well enough.
- Like many data warehouse specialists, Exasol recommends ELT (Extract/Load/Transform) over ETL (Extract/Transform/Load).
- Exasol has user-defined functions (UDFs).
- Exasol is working on BLOB support. Geospatial data is also on the radar (no pun intended), but it didn’t sound as if there was a currently active project.
We also talked about concurrency, which is always a confusing subject. Exasol said that to date there were no more than 50 concurrent “log-ins,” which they equate to there being 1000s of named users (because queries execute so quickly). They also say they’ve tested up to 400 concurrent queries internally. I didn’t probe about what they’d do to balance short-running and long-running queries, in part because Exasol gives the impression that on their systems, there is no such thing as a long-running query. But obviously this is all somewhat fuzzy.
In a related point, Exasol says that overall throughput is higher when there is at least a certain number of concurrent users. The supporting evidence offered was, of all things, TPC-H benchmarks. Apparently (I haven’t checked this myself), Exasol (and also ParAccel, which of course has a similar architecture) chose to run the benchmark with more than the minimum number of simultaneous users required. SMP systems, Exasol believes, don’t exhibit similar behavior.
Finally, a couple of less technical highlights:
- Licensing is per gigabyte of RAM. (This fits with the whole memory-centric orientation.) 100 gigabytes of RAM cost 120,000 Euros at list price. Price doesn’t scale linearly with the amount of RAM.
- The partner whose name was redacted in February is now officially disclosed. Exasol is partnering in Japan with the services side of Hitachi. Exasol says Hitachi has 15-20 people working on introducing Exasol to Japan. Target customers are not primarily Hitachi’s hardware installed base.
[…] Author: Curt Monash. Original publication date: 2008-08-16. Translation: Oleg Kuzmenko. Source: Curt Monash’s blog […]