Data warehousing

Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data. Related subjects include:

June 16, 2012

Metamarkets’ back-end technology

This is part of a three-post series:

The canonical Metamarkets batch ingest pipeline is a bit complicated.

By “get data ready to be put into Druid” I mean:

That metadata is what goes into the MySQL database, which also retains data about shards that have been invalidated. (That part is needed because of the MVCC.)

By “build the data segments” I mean:

When things are being done that way, Druid may be regarded as comprising three kinds of servers: Read more

June 16, 2012

Metamarkets Druid overview

This is part of a three-post series:

My clients at Metamarkets are planning to open source part of their technology, called Druid, which is described in the Druid section of Metamarkets’ blog. The timing of when this will happen is a bit unclear; I know the target date under NDA, but it’s not set in stone. But if you care, you can probably contact the company to get involved earlier than the official unveiling.

I imagine that open-source Druid will be pretty bare-bones in its early days. Code was first checked in early in 2011, and Druid seems to have averaged around 1 full-time developer since then. What’s more, it’s not obvious that all the features I’m citing here will be open-sourced; indeed, some of the ones I’m describing probably won’t be.

In essence, Druid is a distributed analytic DBMS. Druid’s design choices are best understood when you recall that it was invented to support Metamarkets’ large-scale, RAM-speed, internet marketing/personalization SaaS (Software as a Service) offering. In particular:

Interestingly, the single-table/multi-valued choice is echoed at WibiData, which deals with similar data sets. However, WibiData’s use cases are different from Metamarkets’, and in most respects the WibiData architecture is quite different from that of Metamarkets/Druid.

Read more

June 16, 2012

Introduction to Metamarkets and Druid

I previously dropped a few hints about my clients at Metamarkets, mentioning that they:

But while they’re a joy to talk with, writing about Metamarkets has been frustrating, with many hours and pages of wasted effort. Even so, I’m trying again, in a three-post series:

Much like Workday, Inc., Metamarkets is a SaaS (Software as a Service) company, with numerous tiers of servers and an affinity for doing things in RAM. That’s where most of the similarities end, however, as Metamarkets is a much smaller company than Workday, doing very different things.

Metamarkets’ business is SaaS (Software as a Service) business intelligence, on large data sets, with low latency in both senses (fresh data can be queried on, and the queries happen at RAM speed). As you might imagine, Metamarkets is used by digital marketers and other kinds of internet companies, whose data typically wants to be in the cloud anyway. Approximate metrics for Metamarkets (and it may well have exceeded these by now) include 10 customers, 100,000 queries/day, 80 billion 100-byte events/month (before summarization), 20 employees, 1 popular CEO, and a metric ton of venture capital.
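To put those approximate figures in perspective, here is a back-of-envelope sketch in Python. The event, byte, and query counts come from the paragraph above; the 30-day month and the decimal definition of a terabyte are assumptions made only for this illustration, and the results are averages that say nothing about peak load.

```python
# Back-of-envelope arithmetic on the approximate figures quoted above.
EVENTS_PER_MONTH = 80_000_000_000   # 80 billion events/month (from the post)
BYTES_PER_EVENT = 100               # 100-byte events, before summarization (from the post)
QUERIES_PER_DAY = 100_000           # from the post

DAYS_PER_MONTH = 30                 # assumption: a 30-day month
SECONDS_PER_DAY = 24 * 60 * 60

# Raw (pre-summarization) volume, in decimal terabytes (assumption: 1 TB = 10**12 bytes).
raw_bytes_per_month = EVENTS_PER_MONTH * BYTES_PER_EVENT
raw_tb_per_month = raw_bytes_per_month / 10**12

# Average rates, assuming load is spread evenly over time.
events_per_second = EVENTS_PER_MONTH / (DAYS_PER_MONTH * SECONDS_PER_DAY)
queries_per_second = QUERIES_PER_DAY / SECONDS_PER_DAY

print(f"raw ingest: {raw_tb_per_month:.0f} TB/month")
print(f"average event rate: {events_per_second:,.0f} events/second")
print(f"average query rate: {queries_per_second:.2f} queries/second")
```

Under those assumptions, the quoted figures work out to about 8 TB of raw events per month, an average of roughly 30,900 events per second, and a bit over 1 query per second.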

To understand how Metamarkets’ technology works, it probably helps to start by realizing: Read more

May 22, 2012

Kognitio’s story today

I had dinner tonight with the Kognitio folks. So far as I can tell:

Kognitio believes that this story is appealing, especially to smaller venture-capital-backed companies, and backs that up with some frieNDA pipeline figures.

Between that success claim and SAP’s HANA figures, it seems that the idea of using an in-memory DBMS to accelerate analytics has legs. This makes sense, as the BI vendors — Qlik Tech excepted — don’t seem to be accomplishing much with their proprietary in-memory alternatives. But I’m not sure that Kognitio would be my first choice to fill that role. Rather, if I wanted to buy an unsuccessful analytic RDBMS to use as an in-memory accelerator, I might consider ParAccel, which is columnar, has an associated compression story, has always had a hybrid memory-centric flavor much as Kognitio has, and is well ahead of Kognitio in the analytic platform derby. That said, I’ll confess to not having talked with or heard much about ParAccel for a while, so I don’t know if they’ve been able to maintain technical momentum any more than Kognitio has.

May 17, 2012

Thoughts on “data science”

Teradata is paying me to join a panel on “data science” in downtown Boston, Tuesday May 22, at 3:00 pm. A planning phone call led me to jot down a few notes on the subject, which I’m herewith adapting into a blog post.

For starters, I have some concerns about the concepts of data science and data scientist. Too often, the term “data scientist” is used to suggest that one person needs to have strong skills both in analytics and in data management. But in reality, splitting those roles makes perfect sense. Further:

The leader in raising these issues is probably Neil Raden.

But there’s one respect in which I think the term “data science” is highly appropriate. In conventional science, gathering data is just as much of an accomplishment as analyzing it. Indeed, most Nobel Prizes are given for experimental results. Similarly, if you’re doing data science, you should be thinking hard about how to corral ever more useful data. Techniques include but are not limited to:

May 3, 2012

Big Data hype?

A reporter wrote in to ask whether investor interest in “Big Data” was justified or hype. (More precisely, that’s how I reinterpreted his questions. 🙂 ) His examples were Splunk’s IPO, Teradata’s stock price increase, and Birst’s financing. In a nutshell:

1. A great example of hype is that anybody is calling Birst a “Big Data” or “Big Data analytics” company. If anything, Birst is a “little data” analytics company that claims, as a differentiating feature, that it can handle ordinary-sized data sets as well. Read more

April 7, 2012

Many kinds of memory-centric data management

I’m frequently asked to generalize in some way about in-memory or memory-centric data management. I can start:

Getting more specific than that is hard, however, because:

Consider, for example, some of the in-memory data management ideas kicking around. Read more

April 5, 2012

Human real-time

I first became an analyst in 1981. And so I was around for the early days of the movement from batch to interactive computing, as exemplified by:

Of course, wherever there is interactive computing, there is a desire for interaction so fast that users don’t notice any wait time. Dan Fylstra, when he was pitching me the early windowing system VisiOn, characterized this as response so fast that the user didn’t tap his fingers waiting.* And so, with the move to any kind of interactive computing at all came a desire that the interaction be quick-response/low-latency. Read more

April 4, 2012

IBM DB2 10

Shortly before Tuesday’s launch of DB2 10, IBM’s Conor O’Mahony checked in for a relatively non-technical briefing.* More precisely, this is about DB2 for “distributed” systems, aka LUW (Linux/Unix/Windows); some of the features have already been in the mainframe version of DB2 for a while. IBM is graciously permitting me to post the associated DB2 10 announcement slide deck.

*I hope any errors in interpretation are minor.

Major aspects of DB2 10 include new or improved capabilities in the areas of:

Of course, there are various other enhancements too, including to security (fine-grained access control), Oracle compatibility, and DB2 pureScale. Everything except the pureScale part is also reflected in IBM InfoSphere Warehouse, which is a near-superset of DB2.*

*Also, the data ingest part isn’t in base DB2.

Read more

March 16, 2012

Juggling analytic databases

I’d like to survey a few related ideas:

Here goes. Read more
