Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data. Related subjects include:
EMC/Greenplum notes
I dropped by the former Greenplum for my quarterly consulting visit (scheduled for the first week of Q4 for a couple of reasons, one of them XLDB4). Much of what we discussed was purely advisory and/or confidential — duh! — but there were real, nonconfidential takeaways in two areas.
First, feelings about the EMC acquisition are still very positive.
- Hiring has been rapid, on track to roughly quadruple Greenplum’s size over a 1 1/2 year period. These don’t seem to be EMC imports, but rather outside hires, although EMC folks are surely helping in the recruiting.
- The former Greenplum is clearly going to pursue more product possibilities than it would have on its own. This augurs well for Greenplum customers.
- Griping about big-company bureaucracy is minimal.
- I didn’t hear one word about any unwelcome product/business strategy constraints. On the other hand …
- … the next Greenplum product announcement you’ll hear about will be one designed to be appealing to the EMC customer base — i.e., to enterprises that EMC is generally successful in selling to.
Categories: Data warehousing, EMC, Greenplum, MapReduce, Parallelization, Predictive modeling and advanced analytics | 4 Comments |
eBay followup — Greenplum out, Teradata > 10 petabytes, Hadoop has some value, and more
I chatted with Oliver Ratzesberger of eBay around a Stanford picnic table yesterday (the XLDB 4 conference is being held at Jacek Becla’s home base of SLAC, which used to stand for “Stanford Linear Accelerator Center”). Todd Walter of Teradata also sat in on the latter part of the conversation. Things I learned included: Read more
Categories: Data warehousing, Derived data, eBay, Greenplum, Hadoop, HBase, Log analysis, Petabyte-scale data management, Teradata | 30 Comments |
Some thoughts on the announcement that IBM is buying Netezza
As you’ve probably read, IBM and Netezza announced a deal today for IBM to buy Netezza. I didn’t sit in on the conference call, but I’ve seen the reporting. Naturally, I have some quick thoughts, which I’ve broken up into several sections below:
- Clearing some underbrush.
- Speculation about what IBM/Netezza will do.
- Speculation about alternative acquirers for Netezza.
- Speculation about what IBM/Netezza competitors will do.
Aster Data nCluster Version 4.6
The main thing in Aster Data nCluster Version 4.6 is Aster’s version of hybrid row-column store technology. Technical highlights include:
- Aster Data is simply taking the number of storage options in nCluster up from 1 to 2 – you now can store a table either in the Aster Data nCluster row store or column store.
- In fact, you can store parts of a table in the Aster Data nCluster row store and other parts in the Aster Data nCluster column store. I‘m a bit foggy on the details of that – Aster makes discussions of partitioning more complicated than they need to be — but it definitely sounds pretty flexible. Edit: See comment thread below.
- Anything you can do with the Aster Data nCluster row store you can also do with the Aster Data nCluster column store. In particular, that includes all of Aster Data’s analytic functionality.
- The same is true vice-versa. There is no columnar-oriented kind of compression in Aster Data nCluster at this time.
So Aster Data has now joined Greenplum/EMC among row-based analytic DBMS vendors with hybrid row-column stores. Oracle will join them some day, and the same probably applies to other row-based vendors as well. Similarly, Aster Data will probably join Oracle some day in having columnar compression. And so this all fits the model:
- Aster Data has an impressively competitive analytic relational DBMS, considering the youth and size of the company.
- Aster Data is a leader in extending its analytic relational DBMS by integrating in other analytic processing capabilities.
Categories: Analytic technologies, Aster Data, Columnar database management, Data warehousing, Database compression | 4 Comments |
DB2 workload management
DB2 has added a lot of workload management features in recent releases. So when we talked Tuesday afternoon, Tim Vincent and I didn’t bother going through every one. Even so, we covered some interesting subjects in the area of DB2 workload management, including: Read more
Categories: Data warehousing, IBM and DB2, Netezza, Workload management | 3 Comments |
More on temp space, compression, and “random” I/O
My PhD was in a probability-related area of mathematics (game theory), so I tend to squirm when something is described as “random” that clearly is not. That said, a comment by Shilpa Lawande on our recent flash/temp space discussion suggests the following way of framing a key point:
- You really, really want to have multiple data streams coming out of temp space, as close to simultaneously as possible.
- The storage performance characteristics of such a workload are more reminiscent of “random” than “sequential” I/O.
If everybody else is cool with it too, I can live with that. 🙂
Meanwhile, I talked again with Tim Vincent of IBM this afternoon. Tim endorsed the temp space/Flash fit, but with a different emphasis, which upon review I find I don’t really understand. The idea is:
- Analytic DBMS processing generally stresses reads over writes.
- Temp space is an exception — read and write use of temp space is pretty balanced. (You spool data out once, you read it back in once, and that’s the end of that; next time it will be overwritten.)
My problem with that is: Flash typically has lower write than read IOPS (I/O per second), so being (relatively) write-intensive would, to a first approximation, seem if anything to disfavor a workload for flash.
On the plus side, I was reminded of something I should have noted when I wrote about DB2 compression before:
Much like Vertica, DB2 operates on compressed data all the way through, including in temp space.
Categories: Data warehousing, Database compression, IBM and DB2, Vertica Systems | 6 Comments |
Vertica’s innovative architecture for flash, plus more about temp space than you perhaps wanted to know
Vertica is announcing:
- Technology it already has released*, but has not published any reference architectures for.
- A Barney partnership.**
In other words, Vertica has succumbed to the common delusion that it’s a good idea to put out half-baked press releases the week of TDWI conferences. But if we look past that kind of all-too-common nonsense, Vertica is highlighting an interesting technical story, about how the analytic DBMS industry can exploit solid-state memory technology.
*Upgrades to Vertica FlexStore to handle flash memory, actually released as part of Vertica 4.0
** With Fusion I/O
To set the context, let’s recall a few points I’ve noted in the past:
- Solid-state memory’s price/throughput tradeoffs obviously make it the future of database storage.
- The flash future is coming soon, in part because flash’s propensity to wear out is overstated. This is especially true in the case of modern analytic DBMS, which tend to write to blocks all at once, and most particularly the case for append-only systems such as Vertica.
- Being able to intelligently split databases among various cost tiers of storage – e.g. flash and disk – makes a whole lot of sense.
Taken together, those points tell us:
For optimal price/performance, analytic DBMS should support databases that run part on flash, part on disk.
While all this is a future for some other analytic DBMS vendors, Vertica is shipping it today.* What’s more, three aspects of Vertica’s architecture make it particularly well-suited for hybrid flash/disk storage, in each case for a similar reason – you can get most of the performance benefit of all-flash for a relatively low actual investment in flash chips: Read more
Categories: Columnar database management, Data warehousing, Database compression, Solid-state memory, Vertica Systems | 10 Comments |
Teradata’s future product strategy
I think Teradata’s future product strategy is coming into focus. I’ll start by outlining some particular aspects, and then show how I think it all ties together.
Read more
Categories: Business intelligence, Data warehouse appliances, Data warehousing, Kickfire, MicroStrategy, Solid-state memory, Storage, Teradata | 5 Comments |
Big Data is Watching You!
There’s a boom in large-scale analytics. The subjects of this analysis may be categorized as:
- People
- Financial trades
- Electronic networks
- Everything else
The most varied, interesting, and valuable of those four categories is the first one.
Links and observations
I’m back from a trip to the SF Bay area, with a lot of writing ahead of me. I’ll dive in with some quick comments here, then write at greater length about some of these points when I can. From my trip: Read more
Categories: Analytic technologies, Aster Data, Calpont, Cassandra, Couchbase, Data warehouse appliances, Data warehousing, EMC, Exadata, Facebook, Greenplum, HP and Neoview, Kickfire, NoSQL, OLTP, ParAccel, Sybase, XtremeData | 1 Comment |