Data warehousing

Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data. Related subjects include:

March 9, 2012

Hardware and components — lessons from Teradata

I love talking with Carson Schmidt, chief of Teradata’s hardware engineering (among other things), even if I don’t always understand the details of what he’s talking about. It had been way too long since our last chat, so I requested another one. We were joined by Keith Muller, who I presume is pictured here. Takeaways included:

Read more

February 26, 2012

SAP HANA today

SAP HANA has gotten much attention, mainly for its potential. I finally got briefed on HANA a few weeks ago. While we didn’t have time for all that much detail, it still might be interesting to talk about where SAP HANA stands today.

The HANA section of SAP’s website is a confusing and sometimes inaccurate mess. But an IBM whitepaper on SAP HANA gives some helpful background.

SAP HANA is positioned as an “appliance”. So far as I can tell, that really means it’s a software product for which there are a variety of emphatically-recommended hardware configurations — Intel-only, from what right now are eight usual-suspect hardware partners. Anyhow, the core of SAP HANA is an in-memory DBMS. Particulars include:

SAP says that the row-store part is based both on P*Time, an acquisition from Korea some time ago, and also on SAP’s own MaxDB. The IBM white paper mentions only the MaxDB aspect. (Edit: Actually, see the comment thread below.) Based on a variety of clues, I conjecture that this was an aspect of SAP HANA development that did not go entirely smoothly.

Other SAP HANA components include:  Read more

February 8, 2012

Comments on SAS

A reporter interviewed me via IM about how CIOs should view SAS Institute and its products. Naturally, I have edited my comments (lightly) into a blog post. They turned out to be clustered into three groups, as follows:

February 8, 2012

Comments on the analytic DBMS industry and Gartner’s Magic Quadrant for same

This year’s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the 2010, 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying:

*As of February, 2012 — and surely for many months thereafter — Teradata is graciously paying for a link to the report.

Specific company comments, roughly in line with Gartner’s rough single-dimensional rank ordering, include: Read more

February 6, 2012

WibiData, derived data, and analytic schema flexibility

My clients at Odiago, vendors of WibiData, have changed their company name simply to WibiData. Even better, they blogged with more detail as to how WibiData works, in what is essentially a follow-on to my original WibiData post last October. Among other virtues, WibiData turns out to be a poster child for my views on derived data and the corresponding schema evolution.

Interesting quotes include:

WibiData is designed to store … transactional data side-by-side with profile and other derived data attributes.

… the ability to add new ad-hoc columns to a table enables more flexible analysis: output data that is the result of one analytic pipeline is stored adjacent to its input data, meaning that you can easily use this as input to second- or third-order derived data as well.

schemas can vary over time; you can easily add a field to a record, or delete a field. … But even though you start collecting that new data, your existing analysis pipelines can treat records like they always did; programs that don’t yet know about the new cookie are still compatible with both the old records already collected, and the new records with the additional field. New programs fill in default values for old data recorded before a field was added, applying the new schema at read time.

schemas for every column are stored in a data dictionary that matches column names with their schemas, as well as human-readable descriptions of the data.

Interesting aspects of the post that don’t lend themselves as well to being excerpted include:

February 6, 2012

Comments on the 2012 Forrester Wave: Enterprise Hadoop Solutions

Forrester has released its Q1 2012 Forrester Wave: Enterprise Hadoop Solutions. (Googling turns up a direct link, but in case that doesn’t prove stable, here also is a registration-required link from IBM’s Conor O’Mahony.) My comments include:

January 25, 2012

Departmental analytics — best practices

I believe IT departments should support and encourage departmental analytics efforts, where “support” and “encourage” are not synonyms for “control”, “dominate”, “overwhelm”, or even “tame”. A big part of that is:
Let, and indeed help, departments have the data they want, when they want it, served with blazing performance.

Three things that absolutely should NOT be obstacles to these ends are:

Read more

January 24, 2012

Microsoft SQL Server 2012 and enterprise database choices in general

Microsoft is launching SQL Server 2012 on March 7. An IM chat with a reporter resulted, and went something like this.

Reporter: [Care to comment]?
CAM: SQL Server is an adequate product if you don’t mind being locked into the Microsoft stack. For example, the ColumnStore feature is very partial, given that it can’t be updated; but Oracle doesn’t have columnar storage at all.

Reporter: Is the lock-in overall worse than IBM DB2, Oracle?
CAM: Microsoft locks you into an operating system, so yes.

Reporter: Is this release something larger Oracle or IBM shops could consider as a lower-cost alternative a co-habitation scenario, in the event they’re mulling whether to buy more Oracle or IBM licenses?
CAM: If they have a strong Microsoft-stack investment already, sure. Otherwise, why?

Reporter: [How about] just cost?
CAM: DB2 works just as well to keep Oracle honest as SQL Server does, and without a major operating system commitment. For analytic databases you want an analytic DBMS or appliance anyway.

Best is to have one major vendor of OTLP/general-purpose DBMS, a web DBMS, a DBMS for disposable projects (that may be the same as one of the first two), plus however many different analytic data stores you need to get the job done.

By “web DBMS” I mean MySQL, NewSQL, or NoSQL. Actually, you might need more than one product in that area.

January 23, 2012

Departmental analytics — general observations

Department-level adoption of analytic technology isn’t the exception; it’s the norm. Reasons include:

That said, arguments for centralizing analytic technology include:

What’s more, there are IT best practices to support department-level analytics. Some of the key ones boil down to:

My conclusion is that central IT should encourage (and aid) departmental analytics. Let’s look at some details.

Read more

January 10, 2012

Splunk update

Splunk is announcing the Splunk 4.3 point release. Before discussing it, let’s recall a few things about Splunk, starting with:

As in any release, a lot of Splunk 4.3 is about “Oh, you didn’t have that before?” features and Bottleneck Whack-A-Mole performance speed-up. One performance enhancement is Bloom filters, which are a very hot topic these days. More important is a switch from Flash to HTML5, so as to accommodate mobile devices with less server-side rendering. Splunk reports that its users — especially the non-IT ones — really want to get Splunk information on the tablet devices. While this somewhat contradicts what I wrote a few days ago pooh-poohing mobile BI, let me hasten to point out:

That’s pretty much the ideal scenario for mobile BI: Timeliness matters and prettiness doesn’t.

Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.