Key questions when selecting an analytic RDBMS
I recently complained that the Gartner Magic Quadrant for Data Warehouse DBMS conflates many use cases into one set of rankings. So perhaps now would be a good time to offer some thoughts on how to tell use cases apart. Assuming you know that you really want to manage your analytic database with a relational DBMS, the first questions you ask yourself could be:
- How big is your database? How big is your budget?
- How do you feel about appliances?
- How do you feel about the cloud?
- What are the size and shape of your workload?
- How fresh does the data need to be?
Let’s drill down.
How big is your database? How big is your budget?
Taken together, these questions tell you which choices are even feasible. Does your database fit into RAM, at a price you can afford? Does it fit onto a single, perhaps large, server? If both answers are “No”, then you need a real scale-out system, querying disk or flash (which itself could be hard to afford). Otherwise, you have more options.
Note that database compression has a big influence on what fits where.
How do you feel about appliances?
Depending on considerations such as database size, the choice of Oracle, Teradata, IBM Netezza, or Microsoft SQL Server may mandate or at least strongly suggest an appliance form factor. For most other analytic DBMS, an appliance is more optional. Are appliances good for you? Bad? Indifferent? Trade-offs include:
- Appliances often involve paying a premium for hardware purchase and/or support.
- Appliances often are easy(ier) to install and manage.
- Appliances are easier to upgrade in some ways (everything’s integrated), but harder in others (less ability to upgrade bottlenecked parts).
- Appliances often don’t play well in the cloud.
How do you feel about the cloud?
Analytic DBMS run better on good hardware and predictable bandwidth (hence all those appliances). These can be hard to find in the cloud. So, not coincidentally, can be analytic DBMS references, although most vendors can muster a few.
If you feel you need to run your analytic RDBMS in the cloud now, check references carefully. If you only are concerned about the cloud as some indefinite future, then you might want to rule out a few appliance-only vendors, but otherwise you probably shouldn’t worry. Cloud hardware and networking are getting better, and RDBMS software vendors are gaining experience in cloud deployments.
What are the size and shape of your workload?
Different analytic databases can have very different kinds of workloads. Tasks include:
- Complex, long-running queries.
- Repetitive reports of varying degrees of complexity.
- Simple queries.
- Large, scheduled loads.
- Continuous or near-continuous/micro-batch loads.
The big issue is — how many of each kind of task need to performed concurrently, and in what combinations? If you’re refreshing 10,000 dashboards, several hundred of which might be getting drill-down queries at once, while trying to do a few scan-heavy queries in the background and some 15-way joins, most analytic DBMS might disappoint you. (Indeed, I’d ask whether you might want to split up that work among two or more systems.) Different DBMS — and different hardware/storage/networking configurations — shine in different scenarios.
How fresh does the data need to be?
Any serious analytic DBMS can be loaded daily or hourly, edge cases perhaps excepted. In most cases 15 minute intervals work as well, or even 5, but check whether those load latencies would interfere with any performance optimizations. But if you want sub-second data freshness, or even several-second — well, that has to be a top-tier architectural issue.
If your analytics are simple enough, it’s appealing to do the immediate-response ones straight from your transactional database. If not, you may need some kind of streaming-replication setup. Usually, I wind up recommending replication approaches that don’t yet have a lot of maturity or references. Tread carefully here.
Related links
- Eight kinds of analytic database (July 2011, 2-part post)
- How to select a data warehouse DBMS (February 2009, slide deck)
Comments
3 Responses to “Key questions when selecting an analytic RDBMS”
Leave a Reply
I’ve done this selection many times, and what seems to work is to drill into what the goals are (operational BI for 1,000 concurrent users will look different than data mining.) Once you have a sense of that, the next questions are around goal states – is it for more reporting (most BI turns into static reports or *OLAP in a stable business – even with new capabilities), more for deep analysis, or a mix.
Based on these, come up with a capabilities requirements matrix and do the choosing. Obviously , this will be way off, so get rid of goofy req.s and then make sure you have 10X headroom.
Most of the capabilities for these things are getting commoditized, so the vendor makes some difference:
– if they/their product are less stable
– if they don’t integrate well with other tools used
– if they can do things that are unique
– cost (mostly license, sometimes operational effort, rarely HW)
It is shocking how few times performance bake offs are useful. They are done, generally to paper stamp a management position, rather than based on technical merits. It is very rare that the test cases have any relevance to target state. Typical tests are a few bulk loads and a few queries – and most results relate to time/logistic factors rather than intrinsic capability.
What it typically comes down to is what works within the customer stack. Approved software? Backups? Tend to drive the bulk of new projects.
As you point out, Curt, the most critical issue is often the freshness of data. Sometimes this is an architectural issue (e.g., ODS+DW.) and sometimes it is part of the analytics DB; in most cases smaller vendors lose if this point is critical.
[…] Key questions when selecting an analytic RDBMS: Assuming you know that you really want to manage your analytic database with a relational DBMS, the first questions you ask yourself could be: […]
[…] Analytic database distinctions (February, 2013) […]