September 18, 2007
The core of the Vertica story still seems to be compression
Back in March, I suggested that compression was a central and compelling aspect of Vertica’s story. Well, in their new blog, the Vertica guys now strongly reinforce that impression.
I recommend those two Database Column posts (by Sam Madden) highly. I’ve rarely seen such a clear, detailed presentation of a company’s technical argument. My own thoughts on the subject boil down to:
- In principle, all the technology (and hence all the technological advantages) they’re talking about could be turned into features of one of the indexing options of a row-oriented RDBMS. But in practice, there’s no indication that this will happen any time soon.
- Release 1 of the Vertica product will surely have many rough edges.
- Some startups are surprisingly ignorant of the issues involved in building a successful, industrial-strength DBMS. But a company that has both Mike Stonebraker and Jerry Held seriously involved has a big advantage. They may make other kinds of errors, but they won’t make many ignorant ones.
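For readers who want a concrete feel for why column orientation and compression go together, here is a minimal sketch. It is not Vertica's implementation, and the table, column names, and `run_length_encode` helper are all made up for illustration. It assumes a tiny table sorted by state and uses simple run-length encoding, which is only one of several schemes a column store might apply.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


# Hypothetical table of (state, customer_id), sorted by state.
rows = [
    ("CA", 101), ("CA", 102), ("CA", 103),
    ("NY", 104), ("NY", 105),
    ("TX", 106),
]

# Row-oriented layout: the fields of each row are stored next to each other,
# so equal state values are separated by customer ids.
row_layout = [field for row in rows for field in row]

# Column-oriented layout: all values of one column are stored together,
# so equal state values sit side by side.
state_column = [state for state, _ in rows]

row_runs = run_length_encode(row_layout)
column_runs = run_length_encode(state_column)

print(len(row_runs), "runs in the row layout")
print(len(column_runs), "runs in the state column:", column_runs)
```

In the interleaved row layout no two neighboring values are equal, so run-length encoding saves nothing, while the sorted state column collapses to one pair per distinct state. A row store's index could in principle be laid out the same way, which is the point of the first bullet above.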
Categories: Columnar database management, Data warehousing, Database compression, Michael Stonebraker, Theory and architecture, Vertica Systems
Comments
5 Responses to “The core of the Vertica story still seems to be compression”
I think that Netezza’s materialized view functionality does essentially what you suggest – duplicating certain columns from a table in an automatically maintained “view”. I’m not sure if they’re compressed, but it’s certainly an interesting parallel.
And while I could (eventually) see Oracle or MS doing something similar, I think the advantage that Vertica has is horizontal scalability. Even Oracle RAC can’t seriously compete, for BI purposes, with Netezza or Vertica, at least not for the same cost/performance. As such I don’t think that having the feature(s) you suggest would put them on par with NZ/VT performance-wise. Simply having a similar feature may be enough for marketing purposes though.
Tom,
If you’re saying that shared-nothing MPP is the way to go for high end data warehousing, I agree completely, and have said so many times. 🙂
As for Netezza’s limited materialized view capability, which of course can be mimicked by other vendors’ fuller materialized view features — I agree up to a point. Materialized views and/or specialized indices (including bitmaps) can capture some of the benefits of a columnar architecture. But at this time “some” is the operative word.
DATAllegro’s vertical partitioning seems even a bit slicker than materialized views. Netezza’s zone maps have similar benefits. But the Vertica guys do make a compelling case that a fully columnar architecture goes further than those row-based workarounds.
CAM
Heh, no I understand your opinions on MPP databases. 🙂 I interpreted your comments to mean that Oracle/MS could be competitive if they added similar features, which may not have been your intent.
I think that “materialized views” is a misnomer when used in the context of NZ – it feels (to me) much more like a column store than a materialized view. Now that you reference the old post about vertical partitioning, however, NZ’s MViews remind me even more of Kognitio’s in-memory images than anything else. So thanks for pointing that out.
I wrote a blog post of my own about that, actually, which links to this post, but for some reason trackbacks to your blog have never worked for me, so… you’d have no way to know that. 🙂
Trackbacks on my blogs are a HUGE problem, both outgoing and incoming. I’m hoping that’s a function of the weird Bulgarian WordPress theme I’m using, and will get fixed when I change the blogs’ look. (Coming soon, but just when I got back, my web designer went on a snorkeling vacation of her own! Glub, glub, glub. Smart woman!)
Even so, the WordPress dashboard captured your incoming link. So *I* knew you’d linked, even if my readers didn’t …
Netezza would be the first to agree that their usage of the term “materialized views” is somewhat nonstandard. Indeed, just about every vendor has different terminology in the area. E.g., so far as I can figure out, Teradata does have conventional materialized views, but calls them “join indexes.” And there’s some subtle difference between Oracle and IBM in the materialized view area, although I’ve forgotten right now what it is.
CAM
[…] do most of that ourselves” line of argument, some of which I’ve summarized in a comment here. But he made two other interesting points as […]