February 6, 2009
Final (for now) slides on how to select a data warehouse DBMS
I’ve now posted a final version of the slide deck* I first posted Wednesday. And I do mean final; TDWI likes its slide decks locked down weeks in advance, because they go to the printer to be memorialized on dead trees. I added or fleshed out notes on quite a few slides vs. the prior draft. Actual changes to the slides themselves, however, were pretty sparse, and mainly were based on comments to the prior post. Thanks for all the help!
*That’s a new URL. The old deck is still up too, for those morbidly curious as to what I did or didn’t change.
Comments
14 Responses to “Final (for now) slides on how to select a data warehouse DBMS”
[…] The slides have now been finalized. […]
Hi, Curt. Did you change the name of this PowerPoint presentation? Is it how to select an analytic DBMS or a data warehouse DBMS? Are they interchangeable? - Hellena
The title is what it is on the first page of the presentation. 🙂
But I tend to use the terms “analytic DBMS” and “data warehouse DBMS” pretty interchangeably, especially when focusing — as I usually do — on terabyte+ scales.
Hi Curt,
It’s great that you included MonetDB among the example columnar DBMSs. I missed LucidDB though; that’s the other open source columnar contender. If you’re interested, there are some benchmarks (and references to others) on my website; I’m currently working on a TPC-H SF 10/100 comparison between MonetDB, LucidDB and Infobright.
I can’t get everything in there.
MonetDB comes up a bit more than LucidDB in my life, although the sample size is so small I wouldn’t attach much importance to that.
Hi
Your presentation is really brilliant, and especially useful for me, since I’m sometimes on the vendor side and sometimes on the customer side.
I would add two important tests:
1) Recovery behavior: test how the DB recovers from network microcuts or other unwanted service interruption scenarios. In standard OLTP databases, the recovery process can last hours or days until the transaction log is fully rolled back. I don’t know how long memory-centric or column-based DBs take to recover after restarting.
2) Interoperability with other DBs: new technologies are usually not deployed on their own within the current customer architecture, since different steps of a migration project can force a temporary scenario of distributed queries across different technologies. But connectivity tests are not enough to ensure that products can interoperate: real queries must be tested to make sure connectivity is stable enough. I’ve been very surprised to find that client providers sometimes failed to resolve queries of a certain complexity.
Thanks
Leandro,
Thanks for commenting, especially with the nice compliments!
#1 sounds like the “baseball bat test” I talk about.
#2 I’m inclined to disagree with. I think federated queries are evil, and would rather see data recopied in most scenarios.
[…] in less than an hour. So the latest version of my slide deck should prove truly final, unlike my prior […]
[…] posted several stages of my thinking in connection with a February presentation on how to buy an analytic DBMS. The whole process seemed like a success, with good input early on, and at least one new client […]
On your slide “General areas of differentiation”, you missed one very important one – “Out of the box performance”. The simplicity side of appliances is sorely lacking in some architectures, and this is what drives agility: the ability to support totally unpredictable queries with no indexes required. Netezza is the only vendor to date that doesn’t require indexes and requires minimal configuration.
Greenplum is a columnar DBMS; it also supports row-based tables.
Are they saying that now when selling? It’s not really true.
http://www.dbms2.com/2009/10/14/greenplum-hybrid-columnar/ has more.
Hi Curt, great presentation. I am actually trying to figure out which one is better for telecommunications data. Currently I have Infobright installed, but now I hear that Vertica is even better. It’s confusing.
Tariq,
It would depend on your use case. Also, a number of these products are very good.
Vertica’s two clearest advantages over Infobright are that it scales out further and that it has more analytic capabilities, especially via embedded analytics. If neither of those matters greatly to you, and you’re getting good performance from Infobright, why would you switch?