Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data.
Terminology: Analytic platforms
A few weeks ago, I described the elements of an “analytic computing system” or “analytic platform,” while reserving judgment as to which of the two terms would or should win out. I am now capitulating to the term analytic platform, under the influence of, among others, Sharmila Mulligan (and Aster Data in general), Vertica and a variety of fellow analysts (Merv Adrian, Neil Raden, Seth Grimes, Jim Kobielus, and Colin White). While Google evidence would suggest it’s way too early to make this call, I think it’s time to say “analytic platform” will win.
What’s more, I now think the phrase “analytic platform” should win. While I think the term “platform” is overused to the point of silliness, at least the phrase “analytic platform” is short. Thus, it could be modified in various descriptive or not-so-descriptive ways: “Advanced analytic platform,” “graph analytics platform,” “customer analytics platform,” “social media analytics platform,” “CRM analytics platform,” “text analytics platform,” or whatever. By way of contrast, try doing that with “analytic computing system,” and see if you can keep a straight face.
To take this in the direction of an actual definition, I’ll say that the three essential elements of an analytic platform are: …
Categories: Analytic technologies, Data warehousing
Now we know why Vertica has been so weirdly evasive
Communicating with Vertica has been tricky recently. But HP has now announced that it is buying Vertica, which pretty much forces me to comment about Vertica. 🙂 So I’ll indulge in a little bit of explanation as to what I know about Vertica, whether for publication or under NDA. My analysis of the HP/Vertica combination, and expectations for same, will go into another post.
Categories: Analytic technologies, Data warehousing, HP and Neoview, Market share and customer counts, Michael Stonebraker, Vertica Systems
Upcoming webinar on investigative analytics
I recently coined the phrase investigative analytics to conflate
- Statistics, data mining, machine learning, and/or predictive analytics.
- The more research-oriented aspects of business intelligence tools:
  - Ad-hoc query.
  - Drilldown.
  - Most things done by BI-using “business analysts.”
  - Most things within BI called “data exploration.”
- Analogous technologies as applied to non-tabular data types such as text or graph.
This will be the basis for my part of a webcast on March 10 at 11 am Pacific/2 pm Eastern time. The other main part of the webcast will be a demo by the webcast’s joint sponsors, Aster Data and Tableau Software.
Some of Aster’s verbiage in describing and titling the webinar is so hyperbolic that I do not want to give the impression of endorsing it. But I am very hopeful that the webinar itself will be interesting and informative, and will point people at least somewhat in the direction of the benefits Aster is claiming.
Categories: Analytic technologies, Aster Data, Business intelligence, Data warehousing, Presentations, Tableau Software
Comments on the 2011 Forrester Wave for Enterprise Data Warehouse Platforms
The Forrester Wave: Enterprise Data Warehouse Platforms, Q1 2011 is now out,* hot on the heels of the Gartner Magic Quadrant. Unfortunately, this particular Forrester Wave is riddled with inaccuracy.
Categories: Analytic technologies, Columnar database management, Data warehousing, EMC, Exadata, Greenplum, Netezza, Oracle, Pricing, SAP AG, Sybase, Teradata, Vertica Systems
Columnar compression vs. column storage
I’m getting the increasing impression that certain industry observers, such as Gartner, are really confused about columnar technology. (I further suspect that certain vendors are encouraging this confusion, as vendors commonly do.) So here are some basic points.
A simple way to think about the difference between columnar storage and columnar (or any other kind of) compression is this:
- Columnar storage is a reference to how data is grouped together on disk (or in solid-state memory).
- (Columnar) compression is a reference to whether the actual data is on disk, or whether you save space by storing some smaller substitute for the actual data.
Specifically, if data in a relational table is grouped together according to what row it’s in, then the database manager is called “row-based” or a “row store.” If it’s grouped together according to what column it’s in, then the database management system is called “columnar” or a “column store.” Increasingly, row-based and columnar storage are being hybridized.
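To make the row/column distinction concrete, here is a minimal Python sketch. It assumes a hypothetical three-column orders table; the table, column names, and helper functions are illustrative inventions rather than any vendor’s storage format, and real systems lay out pages or blocks on disk, not Python lists.

```python
# A toy three-column table held as a list of row tuples.
# Column names and values are hypothetical, for illustration only.
COLUMNS = ("order_id", "region", "amount")
ROWS = [
    (1, "East", 100),
    (2, "East", 250),
    (3, "West", 75),
    (4, "East", 300),
]


def row_layout(rows):
    """Row store: serialize all of row 1, then all of row 2, and so on."""
    return [value for row in rows for value in row]


def column_layout(rows, columns):
    """Column store: group all order_ids together, then all regions, then all amounts."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def sum_column_row_store(flat, width, col_index):
    """In the row layout, one column's values are interleaved with every other
    field, so we stride across the whole flat sequence to collect them."""
    return sum(flat[col_index::width])


def sum_column_column_store(layout, name):
    """In the column layout, the needed column is a single contiguous list."""
    return sum(layout[name])


if __name__ == "__main__":
    flat = row_layout(ROWS)
    cols = column_layout(ROWS, COLUMNS)
    print(flat)
    print(cols)
    print(sum_column_row_store(flat, len(COLUMNS), COLUMNS.index("amount")))
    print(sum_column_column_store(cols, "amount"))
```

Both sums come out the same; the difference is what has to be read to get there. A query that touches only a few columns can skip most of the data in a column store, while a row store keeps each record’s fields together, which suits fetching or updating whole rows.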
There are two main kinds of compression — compression of bit strings and more intelligent compression of actual data values. Compression of actual data values can reasonably be called “columnar,” in that different columns of data can be compressed in different ways, often depending only on the data in that column.*
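As a rough illustration of choosing value-level compression column by column, the sketch below implements two common textbook encodings, dictionary encoding and run-length encoding, plus a generic `zlib` call standing in for bit-string compression. The function names and the `choose_encoding` heuristic are hypothetical; actual columnar DBMSs use their own, more elaborate schemes.

```python
import zlib
from itertools import groupby


def dictionary_encode(values):
    """Replace each distinct value with a small integer code.
    Returns (list of distinct values in first-seen order, list of codes)."""
    dictionary = {}
    codes = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        codes.append(dictionary[v])
    return list(dictionary), codes


def run_length_encode(values):
    """Collapse runs of identical adjacent values into (value, count) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]


def choose_encoding(values):
    """Hypothetical heuristic that looks only at this one column's data:
    use RLE when runs are long (e.g. a sorted column), else dictionary encoding."""
    runs = run_length_encode(values)
    if len(runs) <= len(values) // 2:
        return "rle", runs
    return "dictionary", dictionary_encode(values)


def bit_string_compress(raw: bytes) -> bytes:
    """Value-agnostic compression of an opaque byte string (zlib here)."""
    return zlib.compress(raw)


if __name__ == "__main__":
    print(choose_encoding(["East", "East", "East", "West", "West", "East"]))
    print(choose_encoding(["East", "West", "East", "West"]))
```

The point of the per-column choice is that a sorted or low-cardinality column can shrink dramatically under an encoding that would do little for a column of unique IDs, whereas bit-string compression treats every block the same way without understanding the values in it.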
Categories: Columnar database management, Data warehousing, Database compression, Exadata, Vertica Systems
Comments on the Gartner 2010/2011 Data Warehouse Database Management Systems Magic Quadrant
Edit: Comments on the February, 2012 Gartner Magic Quadrant for Data Warehouse Database Management Systems — and on the companies reviewed in it — are now up.
The Gartner 2010 Data Warehouse Database Management Systems Magic Quadrant is out. I shall now comment, just as I did to varying degrees on the 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants.
Note: Links to Gartner Magic Quadrants tend to be unstable. Please alert me if any problems arise; I’ll edit accordingly.
In my comments on the 2008 Gartner Data Warehouse Database Management Systems Magic Quadrant, I observed that Gartner’s “completeness of vision” scores were generally pretty reasonable, but their “ability to execute” rankings were somewhat bizarre; the same remains true this year. For example, Gartner ranks Ingres higher by that metric than Vertica, Aster Data, ParAccel, or Infobright. Yet each of those companies is growing nicely and delivering products that meet serious cutting-edge analytic DBMS needs, neither of which has been true of Ingres since about 1987.
ParAccel PADB technical notes
I posted last October about PADB (ParAccel Analytic DataBase), but held back on various topics since PADB 3.0 was still under NDA. By the time PADB 3.0 was released, I was on blogging hiatus. Let’s do a bit of ParAccel catch-up now.
One big part of PADB 3.0 was an analytics extensibility framework. If we match PADB against my recent analytic computing system checklist, …
Categories: Analytic technologies, Data warehousing, EMC, MapReduce, ParAccel, Parallelization, Storage
Do we still need EDWs?
Colin White reopened the question of whether enterprise data warehouses (EDW) are still needed, lining up and knocking down a number of traditional pro-EDW arguments, in more detail than I ever have. So this feels like a good time to revisit my answer to the question of the EDW’s role, whose money quote was:
At conventional enterprises … Manage some of your data to enterprise data warehouse standards, but not all of it. Specifically, your highest-value data should be in something that looks like a classic enterprise data warehouse, and your lower-value data shouldn’t.
For sufficiently small enterprises, the “something that looks like a classic enterprise data warehouse” might just be your One Central Database, combining OLTP (OnLine Transaction Processing) and analytics. Otherwise, the chances are high that you’re going to want to copy your data crown jewels to an EDW, even if they’re also being used as analytic inputs directly from the OLTP systems that first capture them.
As I’ve recently reviewed, there are huge amounts of specialized technology for SQL queries and other analytics. Classical EDW vendors may not be the best or lowest-cost providers of such technology. And even when the EDW is technically competitive, the bureaucratic processes around it can impede rapid adoption of important analytic tools. So Colin is directionally right, in that most large enterprises should be taking the EDW concept less seriously than they currently do. But core EDW technology and business attitudes shouldn’t be entirely discarded either.
Categories: Analytic technologies, Data warehousing
Notes, links, and comments January 20, 2011
I haven’t done a pure notes/links/comments post for a while. Let’s fix that now. (A bunch of saved-up links, however, did find their way into my recent privacy threats overview.)
First and foremost, the fourth annual New England Database Summit (nee “Day”) is next week, specifically Friday, January 28. As per my posts in previous years, I think well of the event, which has a friendly, gathering-of-the-clan flavor. Registration is free, but the organizers would prefer that you register online by the end of this week, if you would be so kind.
The two things potentially wrong with the New England Database Summit are parking and the rush hour drive home afterwards. I would listen with interest to any suggestions about dinner plans.
One thing I hope to figure out at the Summit or before is what the hell is going on on Vertica’s blog or, for that matter, at Vertica. The recent Mike Stonebraker post that spawned a lot of discussion and commentary has disappeared. Meanwhile, Vertica has had three consecutive heads of marketing leave the company since June, and I don’t know who to talk to there any more.
Categories: About this blog, Analytic technologies, Data warehousing, GIS and geospatial, Investment research and trading, MongoDB, OLTP, Open source, PostgreSQL, Vertica Systems
Sound bites on HP/Microsoft and Neoview
HP and Microsoft put out a press release. Three new appliances are being announced, and we’re being reminded of at least one past announcement. I wasn’t briefed, and wouldn’t want to comment on, say, price/performance or feature particulars. That said:
- HP Neoview seems pretty dead.
- I haven’t heard a single favorable reference to HP Neoview since I remarked in March, 2010 that “HP Neoview is reeling.”
- A reporter asked me “What went wrong?” Well, almost any new analytic DBMS/appliance product will compete mainly on two things in its early days — price/performance (or absolute performance), and just how (im)mature it initially is. (Aster Data may be the only prominent exception to that rule.) Presumably, HP Neoview did badly by those metrics.
- HP Neoview was widely conjectured to be a pet project of ousted former HP CEO Mark Hurd.
- Nobody tells me about competing against Microsoft SQL Server 2008 Parallel Data Warehouse either (i.e. Madison/DATallegro). Thus, in particular, I haven’t heard any reason to believe there’s anything good about the technology, especially now that the ever-upbeat Stuart Frost has left Microsoft. I’m conjecturing that Parallel Data Warehouse is focused heavily on the existing Microsoft installed base.
- Speaking of Aster — even under NDA, they won’t tell me or give me any useful hints as to who their undisclosed strategic investor is. Well, HP has a long history of investing in sometimes-competing DBMS vendors (back to Oracle and Informix), and a good reason to keep quiet (reluctance to admit the end of Neoview). Hmm …
- The consolidation appliance in the HP/Microsoft announcement is a clear response to Oracle’s Exadata strategy, or (which is probably more accurate) to the same market opportunity Oracle identified.
- I couldn’t quite figure out whether the cheap data warehouse appliance included Microsoft PowerPivot support, but that would make sense if it did.