Columnar database management
Analysis of products and issues in column-oriented database management systems.
Vertica’s innovative architecture for flash, plus more about temp space than you perhaps wanted to know
Vertica is announcing:
- Technology it already has released*, but has not published any reference architectures for.
- A Barney partnership.**
In other words, Vertica has succumbed to the common delusion that it’s a good idea to put out half-baked press releases the week of TDWI conferences. But if we look past that kind of all-too-common nonsense, Vertica is highlighting an interesting technical story, about how the analytic DBMS industry can exploit solid-state memory technology.
*Upgrades to Vertica FlexStore to handle flash memory, actually released as part of Vertica 4.0
** With Fusion I/O
To set the context, let’s recall a few points I’ve noted in the past:
- Solid-state memory’s price/throughput tradeoffs obviously make it the future of database storage.
- The flash future is coming soon, in part because flash’s propensity to wear out is overstated. This is especially true in the case of modern analytic DBMS, which tend to write to blocks all at once, and most particularly the case for append-only systems such as Vertica.
- Being able to intelligently split databases among various cost tiers of storage – e.g. flash and disk – makes a whole lot of sense.
Taken together, those points tell us:
For optimal price/performance, analytic DBMS should support databases that run part on flash, part on disk.
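To make the hybrid idea concrete, here is a minimal sketch of splitting data between a small flash tier and a large disk tier. It is not how Vertica FlexStore or any other product decides placement; the `Segment` fields, the greedy reads-per-GB ranking, and all of the sizes and read counts are assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    size_gb: float
    reads_per_day: float  # hypothetical access-frequency estimate


def place_on_flash(segments, flash_budget_gb):
    """Greedy split: the most-read data per GB goes to flash, the rest to disk."""
    ranked = sorted(segments, key=lambda s: s.reads_per_day / s.size_gb, reverse=True)
    flash, disk, used = [], [], 0.0
    for seg in ranked:
        if used + seg.size_gb <= flash_budget_gb:
            flash.append(seg.name)
            used += seg.size_gb
        else:
            disk.append(seg.name)
    return flash, disk


# Made-up workload: a small, hot working set and a large, cold history.
segments = [
    Segment("recent_facts", 200, 50_000),
    Segment("dimension_tables", 20, 30_000),
    Segment("temp_space", 50, 40_000),
    Segment("historical_facts", 2_000, 5_000),
]
flash, disk = place_on_flash(segments, flash_budget_gb=300)
print("flash:", flash)
print("disk:", disk)
```

With these invented numbers, 270 GB of flash holds everything except the 2 TB of history, which is the general shape of the price/performance argument: a small amount of flash can cover most of the reads.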
While all this is a future for some other analytic DBMS vendors, Vertica is shipping it today.* What’s more, three aspects of Vertica’s architecture make it particularly well-suited for hybrid flash/disk storage, in each case for a similar reason – you can get most of the performance benefit of all-flash for a relatively low actual investment in flash chips: Read more
Categories: Columnar database management, Data warehousing, Database compression, Solid-state memory, Vertica Systems | 10 Comments |
What kinds of data warehouse load latency are practical?
I took advantage of my recent conversations with Netezza and IBM to discuss what kinds of data warehouse load latency were practical. In both cases I got the impression:
- Subsecond load latency is substantially impossible. Doing that amounts to OLTP.
- 5 seconds or so is doable with aggressive investment and tuning.
- Several-minute load latency is pretty easy.
- 10-15 minute latency or longer is now very routine.
There’s generally a throughput/latency tradeoff, so if you want very low latency with good throughput, you may have to throw a lot of hardware at the problem.
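A toy model shows why that tradeoff exists. Assume each load batch pays a fixed overhead (commit, metadata, and so on) before rows stream in at some raw rate; the function names, the 2-second overhead, and the 100,000 rows/second rate are all hypothetical, not figures from Netezza or IBM.

```python
import math


def max_sustainable_rate(batch_interval_s, fixed_overhead_s, raw_rows_per_s):
    """Rows/second one loader sustains when it commits one batch per interval."""
    usable_s = batch_interval_s - fixed_overhead_s
    if usable_s <= 0:
        return 0.0  # the fixed overhead alone eats the whole interval
    return usable_s * raw_rows_per_s / batch_interval_s


def loaders_needed(target_rows_per_s, batch_interval_s, fixed_overhead_s, raw_rows_per_s):
    """How many parallel loaders a target rate needs at a given batch interval."""
    rate = max_sustainable_rate(batch_interval_s, fixed_overhead_s, raw_rows_per_s)
    if rate == 0:
        raise ValueError("batch interval is too short for the fixed overhead")
    return math.ceil(target_rows_per_s / rate)


# Made-up numbers: 2 s fixed overhead per batch, 100,000 rows/s raw load speed.
for interval_s in (5, 60, 600):
    rate = max_sustainable_rate(interval_s, 2, 100_000)
    n = loaders_needed(240_000, interval_s, 2, 100_000)
    print(f"{interval_s:>4} s batches: {rate:>9.0f} rows/s per loader, {n} loaders for 240k rows/s")
```

In this model the shorter the batch interval, the larger the share of each interval lost to fixed overhead, so hitting the same throughput at lower latency takes more parallel loaders.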
I’d expect to hear similar things from any other vendor with reasonably mature analytic DBMS technology. Low-latency load is a problem for columnar systems, but both Vertica and ParAccel designed in workarounds from the get-go. Aster Data probably didn’t meet these criteria until Version 4.0, its old “frontline” positioning notwithstanding, but I think it does now.
Related link
- Just what is your need for speed anyway?
Categories: Analytic technologies, Aster Data, Columnar database management, Data warehousing, IBM and DB2, Netezza, ParAccel, Vertica Systems | 4 Comments |
Ingres VectorWise technical highlights
After working through problems w/ travel, cell phones, and so on, Peter Boncz of VectorWise finally caught up with me for a regrettably brief call. Peter gave me the strong impression that what I’d written in the past about VectorWise had been and remained accurate, so I focused on filling in the gaps. Highlights included: Read more
Categories: Actian and Ingres, Analytic technologies, Benchmarks and POCs, Columnar database management, Data warehousing, Database compression, Open source, VectorWise | 2 Comments |
More on Sybase IQ, including Version 15.2
Back in March, Sybase was kind enough to give me permission to post a slide deck about Sybase IQ. Well, I’m finally getting around to doing so. Highlights include but are not limited to:
- Slide 2 has some market success figures and so on. (>3100 copies at >1800 users, >200 sales last year)
- Slides 6-11 give more detail on Sybase’s indexing and data access methods than I put into my recent technical basics of Sybase IQ post.
- Slide 16 reminds us that in-database data mining is quite competitive with what SAS has actually delivered with its DBMS partners, even if it doesn’t have the nice architectural approach of Aster or Netezza. (I.e., Sybase IQ’s more-than-SQL advanced analytics story relies on C++ UDFs — User Defined Functions — running in-process with the DBMS.) In particular, there’s a data mining/predictive analytics library — modeling and scoring both — licensed from a small third party.
- A number of the other later slides also have quite a bit of technical crunch. (More on some of those points below too.)
Sybase IQ may have a bit of a funky architecture (e.g., no MPP), but the age of the product and the substantial revenue it generates have allowed Sybase to put in a bunch of product features that newer vendors haven’t gotten around to yet.
More recently, Sybase volunteered permission for me to preannounce Sybase IQ Version 15.2 by a few days (it’s scheduled to come out this week). Read more
Technical basics of Sybase IQ
The Sybase IQ folks had been rather slow about briefing me, at least with respect to crunch. They finally fixed that in February. Since then, I’ve been slow about posting based on those briefings. But what with Sybase being acquired by SAP, Sybase having an analyst meeting this week, and other reasons – well, this seems like a good time to post about Sybase IQ. 🙂
For starters, Sybase IQ is not just a bitmapped system, but it’s also not all that closely akin to C-Store or Vertica. In particular,
- Sybase IQ stores data in columns – like, for example, Vertica.
- Sybase IQ relies on indexes to retrieve data – unlike, for example, Vertica, in which the column pretty much is the index.
- However, columns themselves can be used as indexes in the usual Vertica-like way.
- Most of Sybase IQ’s indexes are bitmaps, or a lot like bitmaps, à la the original IQ product.
- Some of Sybase IQ’s indexes are not at all like bitmaps, but more like B-trees.
- In general, Sybase recommends that you put multiple indexes on each column because — what the heck – each one of them is pretty small. (In particular, the bitmap-like indexes are highly compressible.) Together, indexes tend to take up <10% of Sybase IQ storage space.
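For readers who haven't worked with bitmap indexes, the general technique can be sketched in a few lines. This is a generic textbook-style illustration, not Sybase IQ's actual index format; the function names and the sample columns are made up, and Python integers stand in for the bitmaps.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i has that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return dict(index)


def matching_rows(bitmap):
    """Expand a bitmap back into a sorted list of row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Two hypothetical columns of the same six-row table, each stored separately.
region = ["east", "west", "east", "north", "west", "east"]
status = ["open", "open", "closed", "open", "closed", "open"]

region_idx = build_bitmap_index(region)
status_idx = build_bitmap_index(status)

# WHERE region = 'east' AND status = 'open' becomes a bitwise AND of two bitmaps.
hits = matching_rows(region_idx["east"] & status_idx["open"])
print(hits)  # [0, 5]
```

Each index holds one bit per row per distinct value, and low-cardinality columns produce long runs of identical bits, which is why bitmap-style indexes tend to compress well.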
Categories: Columnar database management, Data warehousing, Database compression, Sybase, Theory and architecture | 3 Comments |
Further quick SAP/Sybase reactions
Raj Nathan of Sybase has been calling around to chat quickly about the SAP/Sybase deal and related matters. Talking with Raj didn’t change any of my initial reactions to SAP’s acquisition of Sybase. I also didn’t bother Raj with too many hard questions, as he was clearly in call-and-reassure mode, reaching out to customers and influencers alike.
That said, Read more
Quick reactions to SAP acquiring Sybase
SAP is acquiring Sybase. On the conference call SAP said Sybase would be run as a separate division of SAP (no surprise). Most of the focus was on Sybase’s mobile technology, which is forecast at >$400 million in 2010 revenues (which would be 30%ish of the total). My quick reactions include: Read more
Vertica update
Last month, Vertica’s CEO Ralph Breslauer quit,* and Vertica made it sound like there would be a new CEO late in April. And indeed, as of April 29, there was. He’s a guy I’ve never heard of before named Chris Lynch, apparently quite the sales machine builder. The most substance I’ve found is a pair of Mass High Tech articles — the latter exceedingly typo-ridden — to the general effect that:
- Vertica plans to build a massive, world-conquering sales force.
- If Vertica dips back into negative cash flow to do that and has to raise more venture capital, so be it.
- “Triple-digit” revenue growth is expected for this year.
Infobright blog update
I often offer that, if a company puts up a sufficiently good blog post, I’ll link to it. Well, I just noticed that Infobright CEO Mark Burton (somewhere along the way he seems to have dropped the “interim”) put up an excellent post last month.
Highlights on the market share/sector side include: Read more
Categories: Columnar database management, Data mart outsourcing, Data warehousing, Infobright, Log analysis, Market share and customer counts, Open source, Web analytics | 1 Comment |
Vertica 4.0
Vertica briefed me last month on its forthcoming Vertica 4.0 release. I think it’s fair to say that Vertica 4.0 is mainly a cleanup/catchup release, washing away some of the tradeoffs Vertica had previously made in support of its innovative DBMS architecture.
For starters, there’s a lot of new analytic functionality. This isn’t Aster/Netezza-style ambitious. Rather, there’s a lot more SQL-99 functionality, plus some time series extensions of the sort that financial services firms – an important market for Vertica – need and love. Vertica did suggest a couple of these time series extensions are innovative, but I haven’t yet gotten detail about those.
Perhaps even more important, Vertica is cleaning up a lot of its previous SQL optimization and execution weirdnesses. In no particular order, I was told: Read more