February 18, 2008
ParAccel technical highlights
I recently caught up with ParAccel’s CTO Barry Zane and Marketing VP Kim Stanick for a long technical discussion, which they have graciously continued by email. It would be impolitic in the extreme to comment on what led up to that. Let’s just note that many things I’ve previously written about ParAccel are now inoperative, and go straight to the highlights.
- ParAccel sells a columnar, disk-centric data warehouse DBMS. Similar but not identical data structures are used in RAM cache and on disk. If there’s enough RAM, ParAccel’s system runs entirely in memory, except to the extent it obviously doesn’t (e.g., transaction persistence). In its TPC-H benchmarks and in some customer situations, ParAccel has run entirely in memory.
- ParAccel initially stores updates (whether transactional or bulk load) in cache. At transaction commit time, or when the cache fills, changed blocks are written to disk. Thus, as in most other DBMSs, a block must first be read into memory before it can be changed. (A sketch of this read-modify-write cycle appears after this list.)
- One ParAccel option is “Amigo” mode, in which the ParAccel database is continually synchronized with a SQL Server database, and queries are dynamically routed to one of the two systems. (There’s no true federation at this time.) Each resynchronization starts with a new SQL Server query, at a scheduled interval. The interval can be as low as 5 seconds or as high as 10-20 minutes; Barry thinks the overhead of the resulting updates is “noise level” if the interval is 30 seconds or higher. (A sketch of the polling pattern appears after this list.)
- Writing a row or reasonably small group of rows in a table with C columns requires C writes to disk, versus the 1 write required in a row-based system. (For a sufficiently large bulk load, of course, that wouldn’t be true. Consider the extreme example in which the whole database is loaded. Then the number of blocks written is the same no matter what architecture you have, except for the differences caused by compression, by any indexes you store on disk, and so on.)
- While single-record inserts are much slower than in row-based systems, Barry thinks the performance sacrifice is minor once rows are loaded a few thousand at a time or more; a back-of-the-envelope calculation after this list makes the arithmetic concrete. (I believe that in this and similar estimates he assumes the number of columns to be no more than a few dozen. While accurate for most applications, that might not be true for users who manipulate 1000+ column credit records.)
- ParAccel claims strong SQL Server compatibility, including running T-SQL stored procedures (but not other stored procedure languages, Postgres PL/pgSQL excepted). However, while the SQL execution itself is parallel, the rest of a stored procedure executes only on a single “leader” node.
- Oracle PL/SQL compatibility is a roadmap item.
- ParAccel supports C/C++ UDFs (User Defined Functions). Scalar UDFs execute in parallel. However, a UDF that invokes SQL runs only on the leader node – except, of course, for the SQL part itself. (A sketch of that distinction appears after this list.)
- In Amigo mode, ParAccel of course runs the same schema as the OLTP SQL Server instance it’s synchronizing with. Thus, they in no way make the Vertica assumption that all data warehouses have star or snowflake schemas. Nor do they replicate fact tables between nodes. Barry claims that ParAccel has done a great job on internode transport speeds, but the details are confidential.
- Even more confidential is the support for another claim of Barry’s. Just as columnar systems are slow when writing whole rows, they are also slow when retrieving them. But ParAccel has a deeply secret way of greatly reducing this penalty.
- Like Vertica, ParAccel supports limited materialized views, called “projections.” A major use of these is to store columns in multiple sort orders.
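To illustrate the write path described above, here is a minimal sketch of a block cache with read-before-write semantics that flushes changed blocks at commit time or when the cache fills. It is my own illustration of the general technique, not ParAccel code; the block layout, eviction policy, and all names are assumptions.

```python
# Minimal sketch of a read-modify-write block cache (illustrative only,
# not ParAccel's implementation; names and eviction policy are assumptions).

class BlockCache:
    def __init__(self, disk, capacity_blocks=4):
        self.disk = disk            # block_id -> list of column values
        self.capacity = capacity_blocks
        self.cache = {}             # blocks currently held in memory
        self.dirty = set()          # blocks changed since the last flush

    def _load(self, block_id):
        # As in most DBMSs, a block must be read into memory before it can be changed.
        if block_id not in self.cache:
            if len(self.cache) >= self.capacity:
                self.flush()            # cache is full: write changed blocks out
                self.cache.clear()      # crude eviction, purely for illustration
            self.cache[block_id] = list(self.disk.get(block_id, []))
        return self.cache[block_id]

    def append(self, block_id, values):
        self._load(block_id).extend(values)
        self.dirty.add(block_id)

    def commit(self):
        # Transaction commit also forces changed blocks to disk, for persistence.
        self.flush()

    def flush(self):
        for block_id in self.dirty:
            self.disk[block_id] = self.cache[block_id]
        self.dirty.clear()


disk = {}
cache = BlockCache(disk)
cache.append("orders.amount", [120, 75])
cache.commit()                      # the changed block is now persistent
```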
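The Amigo-mode resynchronization boils down to polling the OLTP side on a schedule and applying whatever changed. Here is a rough sketch of that pattern; the change-detection call and the apply step are placeholders of mine, not ParAccel’s actual mechanism.

```python
import time

def resync_loop(oltp, warehouse, interval_seconds=30):
    """Poll the OLTP system on a schedule and push changes to the warehouse.

    Illustrative only: changed_rows() and apply() are hypothetical hooks,
    not ParAccel's Amigo-mode API. The interval could be anywhere from
    5 seconds to 10-20 minutes.
    """
    last_sync = 0.0
    while True:
        changes = oltp.changed_rows(since=last_sync)   # new SQL Server query each pass
        if changes:
            warehouse.apply(changes)
        last_sync = time.time()
        time.sleep(interval_seconds)
```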
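To put rough numbers behind the write-amplification and batching points, here is a back-of-the-envelope calculation with made-up row, column, and block sizes. Nothing in it is ParAccel-specific; compression, indexes, and any vendor optimizations for small writes are ignored.

```python
import math

def blocks_written(rows, columns, row_bytes=400, block_bytes=32_768):
    """Crude count of block writes for loading `rows` rows.

    Assumes equal-width columns; ignores compression, indexes, and any
    vendor-specific optimizations for small writes.
    """
    col_bytes = row_bytes / columns
    row_store = math.ceil(rows * row_bytes / block_bytes)                 # 1 write for a small insert
    column_store = columns * math.ceil(rows * col_bytes / block_bytes)    # C writes for a small insert
    return row_store, column_store

for batch in (1, 10_000, 1_000_000):
    print(batch, blocks_written(batch, columns=50))
# 1         -> (1, 50)         : the C-writes-per-row penalty for single-record inserts
# 10,000    -> (123, 150)      : loading thousands of rows at a time mostly amortizes it
# 1,000,000 -> (12208, 12250)  : for a huge load the block counts are essentially the same
```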
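The stored-procedure and UDF points come down to the same distinction: per-row logic can run on every node at once, while procedural control flow runs in one place. Here is a conceptual sketch, with the cluster layout and function names invented for illustration; it is not ParAccel’s UDF API.

```python
from concurrent.futures import ThreadPoolExecutor

# Four "nodes", each holding its own slice of a column (layout invented for illustration).
NODE_SLICES = [list(range(i, 100, 4)) for i in range(4)]

def scalar_udf(x):
    # A pure function of its inputs: every node can apply it to its own slice in parallel.
    return x * x

def run_scalar_udf():
    with ThreadPoolExecutor(max_workers=len(NODE_SLICES)) as pool:
        parts = list(pool.map(lambda s: [scalar_udf(x) for x in s], NODE_SLICES))
    return [y for part in parts for y in part]

def procedural_udf(run_sql):
    # A UDF (or stored procedure body) that issues SQL acts as a little driver
    # program: the SQL it issues is executed in parallel by the cluster, but the
    # surrounding loop and arithmetic run only on the single "leader" node.
    total = 0
    for region in ("east", "west"):
        total += run_sql(f"SELECT ... WHERE region = '{region}'")
    return total

print(sum(run_scalar_udf()))            # the parallel, per-node part
print(procedural_udf(lambda sql: 42))   # leader-only driver, shown here with a stub SQL runner
```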
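Finally, the projection point is essentially the same data kept in more than one sort order, so that whichever predicate a query uses can be answered with a range scan. A toy illustration of the idea, with invented data; this is not ParAccel syntax or internals.

```python
import bisect

# Toy data: (order_date, amount) pairs.
rows = [("2008-02-01", 120), ("2008-01-15", 75), ("2008-02-10", 300), ("2008-01-20", 50)]

# Two "projections" of the same data, each kept in a different sort order.
by_date   = sorted(rows, key=lambda r: r[0])
by_amount = sorted(rows, key=lambda r: r[1])

def orders_on_or_after(date):
    # A date-range predicate uses the date-sorted copy: binary search, then scan.
    keys = [r[0] for r in by_date]
    return by_date[bisect.bisect_left(keys, date):]

def orders_of_at_least(amount):
    # An amount-range predicate uses the amount-sorted copy instead.
    keys = [r[1] for r in by_amount]
    return by_amount[bisect.bisect_left(keys, amount):]

print(orders_on_or_after("2008-02-01"))   # [("2008-02-01", 120), ("2008-02-10", 300)]
print(orders_of_at_least(100))            # [("2008-02-01", 120), ("2008-02-10", 300)]
```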
Categories: Columnar database management, Data warehousing, Emulation, transparency, portability, Microsoft and SQL*Server, ParAccel
Comments
5 Responses to “ParAccel technical highlights”
Curt,
I’m confused by the comment that, in Amigo mode, the schema is the same on the OLTP SQL Server and on ParAccel. That seems to be incredibly limiting. Although it might be true for some very simple reporting applications, no data warehouse I’ve ever seen uses the same schema as the OLTP source system. Also, what if there are many source systems (which is the typical case)?
If true, surely this would be unusable in the vast majority of real-world situations.
Stuart
Stuart,
And thus you’ve neatly explained why not EVERY ParAccel customer buys Amigo mode.
CAM
Curt,
I’ve read comments (yours and others’) about columnar databases being slow to retrieve whole rows, but I don’t hear anyone saying “how slow”. Can you shed any light on this? Are we talking tens or hundreds of milliseconds … or longer?
Doug,
If you’re retrieving N fields of a row, the base case is roughly N times the work a row-based system needs to retrieve the whole row, because you have to look in N different places.
Obviously, a big part of columnar DBMS design is figuring out ways to outperform the base case. But absent something like the TransRelational architecture (see the category for same) — or some other major deviation from a simple-minded columnar approach — it’s hard.
That’s for single rows. Once you’re retrieving lots of blocks of data, then the factor can be diminished or go away entirely, and be outweighed by columnar’s inherent advantages (you’re not retrieving the WHOLE row, and compression may work better).
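To put rough numbers on it, here is a toy block-read model (my own arithmetic, nothing ParAccel-specific):

```python
import math

VALUES_PER_BLOCK = 1000   # toy assumption: how many column values fit in one block

def block_reads(rows, columns_needed, total_columns):
    """Toy block-read model; ignores caching, prefetching, and compression."""
    row_store = math.ceil(rows * total_columns / VALUES_PER_BLOCK)      # whole rows come along
    column_store = columns_needed * math.ceil(rows / VALUES_PER_BLOCK)  # one area per column
    return row_store, column_store

print(block_reads(rows=1, columns_needed=30, total_columns=30))
# (1, 30): fetching one whole row costs ~N separate reads in the column store
print(block_reads(rows=1_000_000, columns_needed=3, total_columns=30))
# (30000, 3000): a big scan of a few columns flips the advantage the other way
```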
CAM
[…] Please do not rely on the parts of the post below that are about ParAccel. See our February 18 post about ParAccel instead. […]