Columnar database management

Analysis of products and issues in column-oriented database management systems.

July 17, 2012

Why I recommend avoiding Kognitio

Since my recent post about Kognitio, things have gotten worse. The company is insistently pushing the marketing message that Kognitio has always been an in-memory product, and at one point went so far as to publicly pretend that I had agreed.

I do not agree. Yes, it’s fair to say — as I did in 2008 — that Kognitio is very RAM-centric, but that’s not at all the same thing. In particular:

The truth is that Kognitio offers a disk-based DBMS that has long been worked on by a small team. I believe that the team really has put considerable effort into how Kognitio uses RAM. But there’s no basis to give Kognitio credit for being “really” in-memory vs. a variety of other analytic RDBMS alternatives. And a row-based product that doesn’t currently offer compression is at a large disadvantage versus, say, columnar products that already do.*

*Columnar systems don’t clobber row-based ones in-memory as extremely as they do in some disk-based use cases. But even in-memory it’s good not to have to move around data that isn’t relevant to your query.
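
To make the footnote concrete, here's a minimal, vendor-neutral Python sketch of why a columnar layout helps even in RAM. The data and field layout are invented for illustration; this is not meant to depict Kognitio's or any other product's internals:

    # Toy illustration (not any vendor's code): a query that aggregates
    # one field touches far less data when values are stored column-wise.

    rows = [(i, f"user{i}", i * 1.5) for i in range(100_000)]  # row store

    # Row-at-a-time: every 3-field tuple is touched to read one field.
    total_row_store = sum(r[2] for r in rows)

    # Column store: the same data, one contiguous sequence per column.
    ids, names, amounts = zip(*rows)

    # Column-at-a-time: only the 'amounts' column is scanned; 'ids' and
    # 'names' never enter the loop. A contiguous, homogeneous column also
    # compresses and vectorizes better than mixed-type tuples.
    total_col_store = sum(amounts)

    assert total_row_store == total_col_store

Real columnar engines add compression and vectorized execution on top of this, but the core point — that a query only touches the columns it needs — holds even when everything is in memory.
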

Until Kognitio gets at least somewhat more honest in its marketing, I recommend avoiding Kognitio like the plague. It’s simply not a big enough company to buy from unless you have some level of trust in the management team.

June 16, 2012

Metamarkets’ back-end technology

This is part of a three-post series:

The canonical Metamarkets batch ingest pipeline is a bit complicated.

By “get data ready to be put into Druid” I mean:

That metadata is what goes into the MySQL database, which also retains data about shards that have been invalidated. (That part is needed because of Druid’s MVCC scheme: superseded shards are marked invalid rather than overwritten in place.)
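
A minimal sketch of that metadata bookkeeping, with Python's sqlite3 standing in for the MySQL database and with invented table and column names — Druid's actual schema may well differ:

    # Hypothetical sketch of segment-metadata bookkeeping with
    # MVCC-style invalidation. Schema and names are invented.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE segments (
        segment_id TEXT PRIMARY KEY,
        time_range TEXT,     -- interval of data the segment covers
        version    INTEGER,  -- MVCC version; higher supersedes lower
        valid      INTEGER   -- 0 once a newer version invalidates it
    )""")

    # Initial ingest produces a version-1 segment for an hour of data.
    db.execute("INSERT INTO segments VALUES ('seg_a', '2012-06-16T00', 1, 1)")

    # A re-ingest (e.g. late-arriving data) writes a version-2 replacement.
    # Under MVCC the old shard is marked invalid rather than overwritten,
    # so queries in flight against version 1 still see consistent data.
    db.execute("INSERT INTO segments VALUES ('seg_b', '2012-06-16T00', 2, 1)")
    db.execute("""UPDATE segments SET valid = 0
                  WHERE time_range = '2012-06-16T00' AND version < 2""")

    # Query-serving nodes would consult only the currently-valid segments.
    print(db.execute("SELECT segment_id FROM segments WHERE valid = 1").fetchall())
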

By “build the data segments” I mean:

When things are being done that way, Druid may be regarded as comprising three kinds of servers:

Read more

June 16, 2012

Metamarkets Druid overview

This is part of a three-post series:

My clients at Metamarkets are planning to open source part of their technology, called Druid, which is described in the Druid section of Metamarkets’ blog. The timing is a bit unclear; I know the target date under NDA, but it’s not set in stone. But if you care, you can probably contact the company to get involved earlier than the official unveiling.

I imagine that open-source Druid will be pretty bare-bones in its early days. Code was first checked in early in 2011, and Druid seems to have averaged around 1 full-time developer since then. What’s more, it’s not obvious that all the features I’m citing here will be open-sourced; indeed, some of the ones I’m describing probably won’t be.

In essence, Druid is a distributed analytic DBMS. Druid’s design choices are best understood when you recall that it was invented to support Metamarkets’ large-scale, RAM-speed, internet marketing/personalization SaaS (Software as a Service) offering. In particular:

Interestingly, the single-table/multi-valued choice is echoed at WibiData, which deals with similar data sets. However, WibiData’s use cases are different from Metamarkets’, and in most respects the WibiData architecture is quite different from that of Metamarkets/Druid.
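
As a rough illustration of the single-table/multi-valued idea — with invented field names, not Druid's or WibiData's actual data model — consider:

    # Toy sketch: one wide, denormalized event table, where a dimension
    # may hold a set of values instead of forcing a join to a child table.
    events = [
        {"ts": "2012-06-16T00:01", "advertiser": "acme",
         "tags": ["sports", "mobile"], "impressions": 12},
        {"ts": "2012-06-16T00:02", "advertiser": "acme",
         "tags": ["mobile"], "impressions": 7},
    ]

    # Filtering on a multi-valued dimension: an event matches if any of
    # its tag values matches, with no join needed.
    mobile_impressions = sum(
        e["impressions"] for e in events if "mobile" in e["tags"]
    )
    print(mobile_impressions)  # 19
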

Read more

April 7, 2012

Many kinds of memory-centric data management

I’m frequently asked to generalize in some way about in-memory or memory-centric data management. I can start:

Getting more specific than that is hard, however, because:

Consider, for example, some of the in-memory data management ideas kicking around.

Read more

February 8, 2012

Comments on the analytic DBMS industry and Gartner’s Magic Quadrant for same

This year’s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did to varying extents on the 2010, 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants. To frame the discussion, let me start by saying:

*As of February, 2012 — and surely for many months thereafter — Teradata is graciously paying for a link to the report.

Specific company comments, roughly in line with Gartner’s single-dimensional rank ordering, include:

Read more

November 21, 2011

Some big-vendor execution questions, and why they matter

When I drafted a list of key analytics-sector issues in honor of look-ahead season, the first item was “execution of various big vendors’ ambitious initiatives”. By “execute” I mean mainly:

Vendors mentioned here are Oracle, SAP, HP, and IBM. Anybody smaller got left out due to the length of this post. Among the bigger omissions were:

Read more

November 12, 2011

Clarifying SAND’s customer metrics, positioning and technical story

Talking with my clients at SAND can be confusing. That said:

A few months ago, I wrote:

SAND Technology reported >600 total customers, including >100 direct.

Upon talking with the company, I need to revise that figure downward, from >600 to 15.

Read more

November 12, 2011

Exasol update

I last wrote about Exasol in 2008. After talking with the team Friday, I’m fixing that now. 🙂 The general theme was as you’d expect: since we last talked, Exasol has added some new management, put some effort into sales and marketing, gained some customers, kept enhancing the product, and so on.

Top-level points included:

Read more

October 18, 2011

Oracle is buying Endeca

Oracle is buying Endeca. The official talking points for the deal aren’t a perfect match for Endeca’s actual technology, but so be it.

In an earlier post about Endeca, I wrote:

… the Endeca paradigm is really to help you make your way through a structured database, where different portions of the database have different structures. Thus, at various points in your journey, it automagically provides you a list of choices as to where you could go next.

That kind of thing could help Oracle with apps like the wireless telco product catalog deal MongoDB got.

Going back to the Endeca-post quote well, Endeca itself said:

Inside the MDEX Engine there is no overarching schema; each data record carries its own metadata. This enables the rapid combination of a wide range of structured and unstructured content into Latitude’s unified data model. Once inside, the MDEX Engine derives common dimensions and metrics from the available metadata, instantly exposing each for high-performance refinement and analysis in the Discovery Framework. Have a new data source? Simply add it and the MDEX Engine will create new relationships where possible. Changes in source data schema? No problem, adjustments on the fly are easy.

And I pointed out that the MDEX engine was a columnar DBMS.
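
Here's a toy Python sketch of what “no overarching schema; each data record carries its own metadata” can look like in practice. The records and field names are invented, and this is in no way the MDEX engine's actual implementation:

    # Toy sketch: every record carries its own fields, and the engine
    # derives the "common dimensions" by inspecting what is present.
    records = [
        {"sku": "123", "color": "red", "price": 9.99},
        {"sku": "456", "color": "blue", "wattage": 60},       # different shape
        {"doc": "manual.pdf", "text": "installation guide"},  # loosely structured
    ]

    # Derive dimensions: the union of fields, plus which records have each.
    dimensions = {}
    for i, rec in enumerate(records):
        for field in rec:
            dimensions.setdefault(field, []).append(i)

    # "Have a new data source? Simply add it": a new record with new fields
    # just extends the derived dimensions; no overarching schema to migrate.
    records.append({"sku": "789", "material": "steel"})
    for field in records[-1]:
        dimensions.setdefault(field, []).append(len(records) - 1)

    print(sorted(dimensions))   # all discovered dimensions
    print(dimensions["color"])  # records refinable by color: [0, 1]
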

Meanwhile, Oracle’s own columnar DBMS efforts have been disappointing. Endeca could be an intended answer to that. However, while Oracle’s track record with standalone DBMS acquisitions is admirable (DEC RDB, MySQL, etc.), Oracle’s track record of integrating DBMS acquisitions into the Oracle product itself is not so good. (Express? Essbase? The text product line? None of that has gone particularly well.)

So while I would expect Endeca’s flagship e-commerce shopping engine products to flourish under Oracle’s ownership, I would be cautious about the integration of Endeca’s core technology into the Oracle product line.

September 22, 2011

Hybrid-columnar soundbites

Busy couple of days talking with reporters. A few notes on hybrid-columnar analytic DBMS, all backed up by yesterday’s post on Teradata columnar:

Edit: The Wall Street Journal got this wrong, writing that Teradata was the first-ever hybrid columnar system. Specifically, they wrote:

While columnar technology has been around for years, Teradata says its product is unique because it allows users to include both columns and rows in the same database.

Googling on “Teradata To Unveil New Analytics Product To Speed Business Adoption” might get you around the paywall to see the offending piece.
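
For readers unfamiliar with the term, here's a toy sketch of the hybrid row/columnar idea — one table whose columns are split between row-wise and column-wise storage. It's an invented illustration, not Teradata's (or anybody else's) actual storage format:

    # Toy sketch of hybrid row/columnar storage for one logical table.

    # Row-wise partition: fields usually retrieved together stay packed,
    # which suits wide point lookups.
    row_part = [("cust1", "Alice", "NY"), ("cust2", "Bob", "CA")]

    # Columnar partition: scan-heavy measures stored one column at a time,
    # so an aggregate touches only the column it needs.
    col_part = {"spend": [120.0, 80.5], "visits": [3, 7]}

    # A point lookup hits the row partition once...
    print(row_part[1])             # ('cust2', 'Bob', 'CA')
    # ...while an analytic scan reads just one contiguous column.
    print(sum(col_part["spend"]))  # 200.5
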
