Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data.
Do you need an analytic RDBMS?
I can think of seven major reasons not to use an analytic RDBMS. One is good, but the other six seem pretty questionable at this time, niche circumstances excepted.
The good reason to not have an analytic RDBMS is that most organizations can run perfectly well on some combination of:
- SaaS (Software as a Service).
- A low-volume static website.
- A network focused on office software.
- A single cheap server, likely running a single instance of a general-purpose RDBMS.
Those enterprises, however, are generally not who I write for or about.
The six bad reasons to not have an analytic RDBMS all take the form “Can’t some other technology do the job better?”, namely:
- A data warehouse that’s just another instance of your OLTP (OnLine Transaction Processing) RDBMS. If your analytic problem is big enough to matter, a specialized analytic RDBMS will likely be more cost-effective and generally easier to deal with.
- MOLAP (Multi-Dimensional OnLine Analytic Processing). That ship has sailed … and foundered … and been towed to drydock.
- In-memory BI. QlikView, SAP HANA, Oracle Exalytics, and Platfora are just four examples of many. But few enterprises will want to confine their analytics to such data as fits affordably in RAM.
- Non-tabular* approaches to investigative analytics. There are many examples in the Hadoop world — including the recent wave of SQL add-ons to Hadoop — and some in the graph area as well. But those choices will rarely suffice for the whole job, as most enterprises will want better analytic SQL performance for (big) parts of their workloads.
- Tighter integration of analytics and OLTP (OnLine Transaction Processing). Workday worklets illustrate that business intelligence/OLTP integration is a really good idea. And it’s an idea that Oracle and SAP can be expected to push heavily, when they finally get their product acts together. But again, that’s hardly all the analytics you’re going to want to do.
- Tighter integration of analytics and other short-request processing. An example would be maintaining a casual game’s leaderboard via a NoSQL write-optimized database. Yet again, that’s hardly all the analytics a typical enterprise will want to do.
Categories: Business intelligence, Data warehousing, Games and virtual worlds, Hadoop, In-memory DBMS, MOLAP | 12 Comments |
More on Cloudera Impala
What I wrote before about Cloudera Impala was quite incomplete. After a followup call, I now feel I have a better handle on the whole thing.
First, some basics:
- Impala is open source code, developed to date entirely by Cloudera people, which adds analytic DBMS capabilities to Hadoop as an alternative to Hive.
- Impala is in public beta, and is targeted for general availability Q1 2013 or so.
- Cloudera plans to get paid for Impala by providing support, and by offering Impala management through its proprietary Cloudera Manager.
- Impala has been under development for about 2 years. A team of 7 or so developers has been mainly in place for over a year. Furthermore, …
- … notwithstanding that it’s best viewed as a Hive alternative, Impala actually reuses a lot of Hive.
The general technical idea of Impala is:
- It’s an additional daemon that runs on each of your Hadoop nodes.
- Thus, Impala is not subject to Hadoop MapReduce’s latency in starting up Java processes or in storing intermediate result sets to disk.
- Impala operates as a distributed parallel analytic DBMS.*
- Impala works with a variety of Hadoop storage options, each with its own implications for latency or performance.
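To make the “alternative to Hive” point concrete, here’s a minimal sketch of running an analytic query through Impala from Python. It assumes Cloudera’s impyla DB-API client and a hypothetical web_logs table; the host, port, and schema are my illustrative choices, not anything from Cloudera’s announcement.

    # Minimal sketch: analytic SQL through Impala rather than Hive (illustrative).
    from impala.dbapi import connect

    conn = connect(host="impala-node.example.com", port=21050)  # hypothetical host
    cur = conn.cursor()
    # The impalad daemon on each Hadoop node plans and executes this in parallel,
    # without launching MapReduce jobs.
    cur.execute("""
        SELECT url, COUNT(*) AS hits
        FROM web_logs
        WHERE event_date >= '2012-10-01'
        GROUP BY url
        ORDER BY hits DESC
        LIMIT 10
    """)
    for url, hits in cur.fetchall():
        print(url, hits)
    cur.close()
    conn.close()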
Categories: Cloudera, Data models and architecture, Data warehousing, Hadoop, HBase, MapReduce, Open source, Predictive modeling and advanced analytics, SQL/Hadoop integration | 12 Comments |
Introduction to Platfora
When I wrote last week that I have at least 5 clients claiming they’re uniquely positioned to support BI over Hadoop (most of whom partner with a 6th client, Tableau), the non-partnering exception I had in mind was Platfora, Ben Werther’s oh-so-stealthy startup that is finally de-stealthing today. Platfora combines:
- An interesting approach to analytic data management.
- Business intelligence tools integrated with that.
The whole thing sounds like a perhaps more general and certainly non-SaaS version of what Metamarkets has been offering for a while.
The Platfora technical story starts: Read more
Categories: Business intelligence, Columnar database management, Data models and architecture, Data warehousing, Database compression, Hadoop, Memory-centric data management, Platfora | 6 Comments |
Notes on analytic hardware
I took the opportunity of Teradata’s Aster/Hadoop appliance announcement to catch up with Teradata hardware chief Carson Schmidt. I love talking with Carson, about both general design philosophy and his views on specific hardware component technologies.
From a hardware-requirements standpoint, Carson seems to view Aster and Hadoop as more similar to each other than either is to, say, a Teradata Active Data Warehouse. In particular, for Aster and Hadoop:
- I/O is more sequential.
- The CPU:I/O ratio is higher.
- Uptime is a little less crucial.
The most obvious implications are differences in which parts are chosen, and in what ratios. Also, in the new Aster/Hadoop appliance, Carson is content to skate by with RAID 5 rather than RAID 1.
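As a back-of-envelope illustration of why the RAID choice changes the parts math, here’s the usable-capacity arithmetic for a hypothetical 12-drive, 2 TB-per-drive shelf; the numbers are mine, not Teradata’s.

    # Usable capacity under RAID 1 vs. RAID 5 for a hypothetical drive shelf.
    drives, size_tb = 12, 2.0

    raid1_usable = drives / 2 * size_tb      # mirroring: half the raw capacity
    raid5_usable = (drives - 1) * size_tb    # one drive's worth of parity

    print(f"RAID 1 usable: {raid1_usable:.0f} TB")  # 12 TB
    print(f"RAID 5 usable: {raid5_usable:.0f} TB")  # 22 TB

The trade-off is that RAID 5 tolerates only one drive failure per group and rebuilds more slowly, which is acceptable precisely when uptime is a little less crucial.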
I think Carson’s views about flash memory can be reasonably summarized as: Read more
Categories: Aster Data, Data warehouse appliances, Data warehousing, Hadoop, Solid-state memory, Storage, Teradata | 2 Comments |
Hadapt Version 2
My clients at Hadapt are coming out with a Version 2 to be available in Q1 2013, and perhaps slipstreaming some of the features before then. At that point, it will be reasonable to regard Hadapt as offering:
- A very tight integration between an RDBMS-based analytic platform and Hadoop …
- … that is decidedly immature as an analytic RDBMS …
- … but which strongly improves the SQL capabilities of Hadoop (vs., say, the alternative of using Hive).
Solr is in the mix as well.
Hadapt+Hadoop is positioned much more as “better than Hadoop” than as “a better scale-out RDBMS”, and rightly so, given its limitations when viewed strictly from an analytic RDBMS standpoint. I.e., Hadapt is meant for enterprises that want to do several of:
- Dump multi-structured data into Hadoop.
- Refine or just move some of it into an RDBMS.
- Bring in data from other RDBMS.
- Process all of the above via Hadoop MapReduce.
- Process all of the above via SQL.
- Use full-text indexes on the data.
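To illustrate the workload mix in the list above, here’s a toy, self-contained stand-in using SQLite in memory. It is emphatically not Hadapt’s API; it just shows the combination of dumping semi-structured records, querying them relationally, and filtering on text, which Hadapt aims to provide at Hadoop scale (with Solr handling the real full-text indexing).

    # Toy stand-in (SQLite in memory) for the workload mix above -- not Hadapt's API.
    import json, sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (customer_id INTEGER, payload TEXT)")

    raw = ['{"customer_id": 1, "msg": "checkout failed: timeout"}',
           '{"customer_id": 2, "msg": "checkout ok"}']
    for line in raw:  # "dump" multi-structured data, lightly refined into a table
        rec = json.loads(line)
        conn.execute("INSERT INTO events VALUES (?, ?)", (rec["customer_id"], rec["msg"]))

    # Relational SQL over the refined records ...
    print(conn.execute(
        "SELECT customer_id, COUNT(*) FROM events GROUP BY customer_id").fetchall())
    # ... plus a crude text filter (Hadapt's real answer here is Solr full-text indexes).
    print(conn.execute(
        "SELECT customer_id FROM events WHERE payload LIKE '%timeout%'").fetchall())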
Hadapt has 6 or so production customers, a dozen or so more coming online soon, 35 or so employees (mainly in Cambridge or Poland), reasonable amounts of venture capital, and the involvement of a variety of industry luminaries. Hadapt’s biggest installation seems to have 10s of terabytes of relational data and 100s of TBs of multi-structured; Hadapt is very confident in its ability to scale an order of magnitude beyond that with the Version 2 product, and reasonably confident it could go even further.
At the highest level, Hadapt works like this: Read more
Notes on the Oracle OpenWorld Sunday keynote
I’m not at Oracle OpenWorld, but as usual that won’t keep me from commenting. My bottom line on the first night’s announcements is:
- At many large enterprises, Oracle has a lock on much of their IT efforts. (But not necessarily in the internet or investigative analytics areas.) Tonight’s announcements serve to strengthen that.
- Tonight’s announcements do little to help Oracle in other market segments.
In particular:
1. At the highest level, my view of Oracle’s strategy is the same as it’s been for several years:
Clayton Christensen’s The Innovator’s Solution teaches us that Oracle should focus on selling a thick stack of technology to its highest-end customers, and that’s exactly what Oracle does focus on.
2. Tonight’s news is closely in line with what Oracle’s Juan Loaiza told me three years ago, especially:
- Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being “bumped off” if they don’t get it right.
- Juan believes the “bulk” of Oracle’s business will move over to Exadata-like technology over the next 5-10 years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.
3. Oracle is confusing people with its comments on multi-tenancy. I suspect:
- What Oracle is talking about when it says “multi-tenancy” is more like consolidation than true multi-tenancy.
- Probably there are a couple of true multi-tenancy features as well.
4. SaaS (Software as a Service) vendors don’t want to use Oracle, because they don’t want to pay for it.* This limits the potential impact of Oracle’s true multi-tenancy features. Even so: Read more
Hoping for true columnar storage in Oracle12c
I was asked to clarify one of my July comments on Oracle12c,
I wonder whether Oracle will finally introduce a true columnar storage option, a year behind Teradata. That would be the obvious enhancement on the data warehousing side, if they can pull it off. If they can’t, it’s a damning commentary on the core Oracle codebase.
by somebody smart who, however, seemed to have half-forgotten my post comparing (hybrid) columnar compression to (hybrid) columnar storage.
In simplest terms:
- Columnar storage and columnar compression are two different things. The main connections are:
- Columnar storage can make columnar compression more effective.
- In different ways, both technologies reduce I/O.
- EMC Greenplum, Teradata Aster, and Teradata Classic are all originally row-based systems that have gone hybrid columnar.
- Vertica is an originally column-based system that has gone hybrid columnar.
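A tiny worked example of the “columnar storage can make columnar compression more effective” point: storing the same rows column-by-column puts equal values next to each other, so even naive run-length encoding produces fewer, longer runs. The data and the encoding scheme are mine, purely for illustration.

    # Row layout vs. column layout under naive run-length encoding (illustrative).
    from itertools import groupby

    rows = [("2012-10-01", "US", 1), ("2012-10-01", "US", 1),
            ("2012-10-01", "DE", 2), ("2012-10-02", "US", 1)]

    def rle(values):
        # Run-length encode a sequence as [(value, run_length), ...].
        return [(v, len(list(g))) for v, g in groupby(values)]

    row_layout = [x for row in rows for x in row]                 # columns interleaved
    col_layout = [[row[i] for row in rows] for i in range(3)]     # each column contiguous

    print(len(rle(row_layout)))                      # 12 runs: nothing adjacent repeats
    print(sum(len(rle(col)) for col in col_layout))  # 8 runs: repeats are now adjacent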
Categories: Aster Data, Columnar database management, Data warehousing, Database compression, Greenplum, Oracle, Teradata, Vertica Systems | 4 Comments |
When should analytics be in-memory?
I was asked today for rules or guidance regarding “analytical problems, situations, or techniques better suited for in-database versus in-memory processing”. There are actually two kinds of distinction to be drawn:
- Some workloads, in principle, should run on data to which there’s very fast and unfettered access — so fast and unfettered that you’d love the whole data set to be in RAM. For others, there is little such need.
- Some products, in practice, are coupled to specific in-memory data stores or to specific DBMS, even though other similar products don’t make the same storage assumptions.
Let’s focus on the first part of that — what work, in principle, should be done in memory? Read more
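A crude first check on the “fast and unfettered” criterion is pure arithmetic: does the working set, with realistic overhead, fit in the cluster’s RAM at all? The figures below are placeholders, not recommendations.

    # Back-of-envelope: does the working set fit in RAM? (Placeholder numbers.)
    working_set_tb = 2.0       # data you want fast, unfettered access to
    overhead_factor = 3.0      # indexes, in-memory data structures, OS, headroom
    nodes = 10
    ram_per_node_gb = 256

    needed_gb = working_set_tb * 1024 * overhead_factor
    available_gb = nodes * ram_per_node_gb
    verdict = "in-memory is plausible" if needed_gb <= available_gb else "plan to spill to disk"
    print(f"need ~{needed_gb:.0f} GB, have {available_gb} GB: {verdict}")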
Categories: Business intelligence, Data warehousing, Memory-centric data management, Parallelization, Predictive modeling and advanced analytics | 2 Comments |
Notes on Hadoop adoption
I successfully resisted telephone consulting while on vacation, but I did do some by email. One was on the oft-recurring subject of Hadoop adoption. I think it’s OK to adapt some of that into a post.
Notes on past and current Hadoop adoption include:
- Enterprise Hadoop adoption is for experimental uses or departmental production (as opposed to serious enterprise-level production). Indeed, it’s rather tough to disambiguate those two. If an enterprise uses Hadoop to search for new insights and gets a few, is that an experiment that went well, or is it production?
- One of the core internet-business use cases for Hadoop is a many-step ETL, ELT, and data refinement pipeline, with Hadoop executing some or many of the steps. But I don’t think that’s in production at many enterprises yet, except in the usual forward-leaning sectors of financial services and (we’re all guessing) national intelligence. (A minimal sketch of one such refinement step follows this list.)
- In terms of industry adoption:
- Financial services on the investment/trading side are all over Hadoop, just as they’re all over any technology. Ditto national intelligence, one thinks.
- Consumer financial services, especially credit card, are giving Hadoop a try too, for marketing and/or anti-fraud.
- I’m sure there’s some telecom usage, but I’m hearing of less than I thought I would. Perhaps this is because telcos have spent so long optimizing their data into short, structured records.
- Whatever consumer financial services firms do, retailers do too, albeit with smaller budgets.
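Returning to the ETL/refinement pipeline point above: an individual Hadoop-executed step is often nothing fancier than a mapper that turns raw log lines into clean, delimited records. Here’s a minimal Hadoop Streaming-style mapper; the input format is invented for illustration.

    #!/usr/bin/env python
    # Minimal Hadoop Streaming-style mapper for one refinement step: parse raw
    # log lines (invented format: "timestamp user_id message") into TSV records.
    # Test locally with: cat raw.log | python refine_mapper.py
    import sys

    for line in sys.stdin:
        parts = line.rstrip("\n").split(" ", 2)
        if len(parts) != 3:
            continue                 # drop malformed lines rather than failing the job
        timestamp, user_id, message = parts
        print("\t".join([timestamp, user_id, message.lower()]))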
Thoughts on how Hadoop adoption will look going forward include: Read more
Categories: Cloud computing, Data warehouse appliances, Data warehousing, EAI, EII, ETL, ELT, ETLT, Hadoop, Investment research and trading, Telecommunications | 3 Comments |
Integrated internet system design
What are the central challenges in internet system design? We probably all have similar lists, comprising issues such as scale, scale-out, throughput, availability, security, programming ease, UI, or general cost-effectiveness. Screw those up, and you don’t have an internet business.
Much new technology addresses those challenges, with considerable success. But the success is usually one silo at a time — a short-request application here, an analytic database there. When it comes to integration, unsolved problems abound.
The top integration and integration-like challenges for me, from a practical standpoint, are:
- Integrating silos — a decades-old problem still with us in a big way.
- Dynamic schemas with joins.
- Low-latency business intelligence.
- Human real-time personalization.
Other concerns that get mentioned include:
- Geographical distribution driven by privacy laws, which for some users is a major compliance requirement.
- Logical data warehouse, a term that doesn’t actually mean anything real.
- In-memory data grids, which some day may no longer always be hand-coupled to the application and data stacks they accelerate.
Let’s skip those latter issues for now, focusing instead on the first four.