Columnar database management
Analysis of products and issues in column-oriented database management systems. Related subjects include:
MySQL storage engine round-up, with Oracle-related thoughts
Here’s what I know about MySQL storage engines, more or less.
- MySQL with MyISAM is fast. But it’s not transactional. Except for limited purposes, MySQL with MyISAM is a pretty crummy DBMS. Nothing can change that.
- MySQL with InnoDB is transactional. But it’s not particularly fast. MySQL with InnoDB is a pretty mediocre DBMS. Oracle could fix that, at least partially, over time.
- I don’t know much about Falcon, Maria, and so on. With Oracle winding up owning both MySQL and InnoDB, the motivation for those engines (except as Oracle-free forks) might fade.
- Infobright is the most established of the rest. At the moment I’m not recommending it for most industrial-strength uses unless the user is particularly cash-constrained. But I wouldn’t be surprised if that changed soon. A cheap, fast, simple columnar analytic DBMS has a place in the world.
- Kickfire is next in line, offering a hardware-based growth path for users who’ve maxed out on what unaided MySQL can do. It remains to be seen for how many users the desire to keep things simple and stay with MySQL outweighs the desire to avoid custom hardware. Having Oracle salespeople all over those accounts surely wouldn’t help. Kickfire also has a second market, namely OEM vendors who are mainly interested in the superfast chip. That would probably be pretty unaffected by Oracle.
- Tokutek offers a technical proposition that’s hard to match head-on without going the CEP route. Users who care are likely to be MySQL shops. Tokutek’s main challenge is to prove that it sufficiently outdoes competing technical strategies for sufficiently many users. Oracle ownership of MySQL seems pretty irrelevant to Tokutek’s success or failure.
- Calpont offers a kind of lightweight Exadata alternative. With Calpont’s packaging and positioning perennially unclear, it’s difficult to predict the effect of a particular change — i.e., Oracle buying MySQL — in Calpont’s market environment.
- I haven’t heard from transactionally-oriented ScaleDB since I wrote about them a year ago. Apparently, they’re rolling out beta product this week, and their venerable techie guru sadly passed away earlier this month.
Categories: Calpont, Columnar database management, Data warehousing, Exadata, Infobright, Kickfire, MySQL, Open source, Oracle, Tokutek and TokuDB | 14 Comments |
Calpont update — you read it here first!
Calpont has gone through a lot of strategy iterations since its founding. The super-short version is that Calpont originally planned an appliance built around a SQL chip, much like Kickfire. But after various changes in management and venture backing, Calpont turned itself into a software-only analytic DBMS vendor relying on a MySQL front end. Calpont is now at the stage of announcing an Early Adopter program at the MySQL conference on Wednesday, although details of Calpont’s product release timing, pricing, feature set, etc. are all To Be Determined.
Minor highlights of the Calpont technical story include: Read more
Categories: Calpont, Columnar database management, Data warehousing, MySQL, Open source, Parallelization, Theory and architecture | Leave a Comment |
Infobright update
For the past couple of quarters, Infobright has been MySQL’s partner of choice for larger data warehousing applications. Infobright’s stated business metrics, and I quote, include:
> 50 Customers in 7 Countries
> 25 Partners on 4 continents
A vibrant open source community
+1 million visitors
Approaching 10,000 downloads
2,000 active community participants
These may be compared with analogous metrics Infobright offered in February.
Infobright has also made or promised a variety of technological enhancements. Ones that are either shipping now or promised soon include: Read more
Categories: Columnar database management, Data warehousing, Infobright, MySQL, Open source | 6 Comments |
The questionable benefits of terabyte-scale data warehouse virtualization
Vertica is virtualizing via VMware, and has suggested a few operational benefits to doing so that might or might not offset VMware’s computational overhead. But on the whole,it seems virtualization’s major benefits don’t apply to the large-database MPP data warehousing. Read more
Categories: Columnar database management, Data warehousing, Database compression, Theory and architecture, Vertica Systems | 2 Comments |
Draft slides on how to select an analytic DBMS
I need to finalize an already-too-long slide deck on how to select an analytic DBMS by late Thursday night. Anybody see something I’m overlooking, or just plain got wrong?
Edit: The slides have now been finalized.
More grist for the column vs. row mill
Daniel Abadi and Sam Madden are at it again, following up on their blog posts of six months arguing for the general superiority of column stores over row stores (for analytic query processing). The gist is to recite a number of bases for superiority, beyond the two standard ones of less I/O and better compression, and seems to be based largely on Section 5 of a SIGMOD paper they wrote with Neil Hachem.
A big part of their argument is that if you carry the processing of columnar and/or compressed data all the way through in memory, you get lots of advantages, especially because everything’s smaller and hence fits better into Level 2 cache. There also is some kind of join algorithm enhancement, which seems to be based on noticing when the result wound up falling into a range according to some dimension, and perhaps using dictionary encoding in a way that will help induce such an outcome.
The main enemy here is row-store vendors who say, in effect, “Oh, it’s easy to shoehorn almost all the benefits of a column-store into a row-based system.” They also take a swipe — for being insufficiently purely columnar — at unnamed columnar Vertica competitors, described in terms that seemingly apply directly to ParAccel.
Categories: Columnar database management, Data warehousing, Database compression, ParAccel, Vertica Systems | 2 Comments |
Introduction to SAND Technology
SAND Technology has a confused history. For example:
- SAND has been around in some form or other since 1982, starting out as a Hitachi reseller in Canada.
- In 1992 SAND acquired a columnar DBMS product called Nucleus, which originally was integrated with hardware (in the form of a card). Notwithstanding what development chief Richard Grodin views as various advantages vs. Sybase IQ, SAND has only had limited success in that market.
- Thus, SAND introduced a second, similarly-named product, which could also be viewed as a columnar DBMS. (As best I can tell, both are called SAND/DNA.) But it’s actually focused on archiving, aka the clunkily named “near-line storage.” And it’s evidently not the same code line; e.g., the newer product isn’t bit-mapped, while the older one is.
- The near-line product was originally focused on the SAP market. Now it’s moving beyond.
- Canada-based SAND had offices in Germany and the UK before it did in the US. This leads to an oddity – SAND is less focused on the SAP aftermarket in Germany than it still is in the US.
SAND is publicly traded, so its numbers are on display. It turns out to be doing $7 million in annual revenue, and losing money.
OK. I just wanted to get all that out of the way. My main thoughts about the DBMS archiving market are in a separate post.
Categories: Archiving and information preservation, Columnar database management, Data warehousing, SAND Technology | 6 Comments |
Data warehouse load speeds in the spotlight
Syncsort and Vertica combined to devise and run a benchmark in which a data warehouse got loaded at 5 ½ terabytes per hour, which is several times faster than the figures used in any other vendors’ similar press releases in the past. Takeaways include:
- Syncsort isn’t just a mainframe sort utility company, but also does data integration. Who knew?
- Vertica’s design to overcome the traditional slow load speed of columnar DBMS works.
The latter is unsurprising. Back in February, I wrote at length about how Vertica makes rapid columnar updates. I don’t have a lot of subsequent new detail, but it made sense then and now. Read more
Introduction to Kickfire
I’ve spent a few hours visiting or otherwise talking with my new clients at Kickfire recently, so I think I have a better feel for their story. A few details are still missing, however, either because I didn’t get around to asking about them, or because an unexplained accident corrupted my notes (and I wasn’t even using Office 2007). Highlights include: Read more
Categories: Columnar database management, Data warehouse appliances, Data warehousing, Kickfire, MySQL, Theory and architecture | Leave a Comment |
Infobright’s open source move has a lot of potential
Infobright announced today that it’s going full-bore into open source – specifically in the MySQL ecosystem — with the licensing approach, pricing, distribution strategy, and VC money from Sun that such a move naturally entails. I think this is a great idea, for a number of reasons: Read more
Categories: Columnar database management, Data warehousing, Infobright, MySQL, Open source | 4 Comments |