Columnar database management

Analysis of products and issues in column-oriented database management systems.

September 22, 2011

Teradata Columnar and Teradata 14 compression

Teradata is pre-announcing Teradata 14, for delivery by the end of this year, where by “Teradata 14” I mean the latest version of the DBMS that drives the classic Teradata product line. Teradata 14’s flagship feature is Teradata Columnar, a hybrid-columnar offering that follows in the footsteps of Greenplum (now part of EMC) and Aster Data (now part of Teradata).

The basic idea of Teradata Columnar is:

- Each table can be stored in Teradata in row format, column format, or a mix.
- You can do almost anything with a Teradata columnar table that you can do with a row-based one.
- If you choose column storage, you also get some new compression choices.
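To make the row/column/mix distinction above concrete, here is a minimal Python sketch. It is not Teradata's storage format or its compression code; the sample table, the function names, and the choice of run-length encoding are all assumptions made for illustration. The excerpt does not say which compression choices Teradata 14 adds, so run-length encoding stands in simply as a well-known technique that benefits from storing a column's values together.

```python
from itertools import groupby

# Hypothetical sample table; column names and values are made up for illustration.
column_names = ("sale_date", "country", "quantity")
rows = [
    ("2011-09-01", "US", 10),
    ("2011-09-01", "US", 12),
    ("2011-09-01", "CA", 7),
    ("2011-09-02", "CA", 7),
    ("2011-09-02", "CA", 9),
]


def to_columns(rows, names):
    """Pivot row-format tuples into one list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def hybrid_layout(rows, names, columnar):
    """Store the chosen columns column-wise and keep the rest as narrower rows."""
    col_idx = [i for i, name in enumerate(names) if name in columnar]
    row_idx = [i for i, name in enumerate(names) if name not in columnar]
    column_part = {names[i]: [row[i] for row in rows] for i in col_idx}
    row_part = [tuple(row[i] for i in row_idx) for row in rows]
    return column_part, row_part


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(rows, column_names)
encoded_dates = run_length_encode(columns["sale_date"])
column_part, row_part = hybrid_layout(rows, column_names, {"sale_date"})
```

In this toy data, the five `sale_date` values collapse to two (value, count) pairs once they sit next to each other, which is the general reason column storage tends to open up extra compression options.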

Categories: Archiving and information preservation, Columnar database management, Data warehousing, Database compression, Oracle, Rainstor, Teradata

7 Comments

September 7, 2011

Vertica projections — an overview

Partially at my suggestion, Vertica has blogged a three-part series explaining the “projections” that are central to a Vertica database. This is important, because in Vertica projections play the roles that in many analytic DBMS might be filled by base tables, indexes, AND materialized views. Highlights include:

- A Vertica projection can contain:
  - All the columns in a table.
  - Some of the columns in a table.
  - A prejoin among tables.
- Vertica projections are updated and maintained just as base tables are. (I.e., there’s no kind of batch lag.)
- You can import the same logical schema you use elsewhere. Vertica puts no constraints on your logical schema. Note: Vertica has been claiming good support for all logical schemas since Vertica 4.0 came out in early 2010.
- Vertica (the product) will automatically generate a physical schema for you (i.e., a set of projections) that Vertica (the company) thinks will do a great job for you. Note: That also dates back to Vertica 4.0.
- Vertica claims that queries are very fast even when you haven’t created projections explicitly for them. Note: While the extent to which this is true may be a matter of dispute, competitors clearly overreach when they make assertions like “every major Vertica query needs a projection prebuilt for it.”
- On the other hand, it is advisable to build projections (automatically or manually) that optimize performance of certain parts of your query load.
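As a rough illustration of the points above, the Python sketch below models a projection as nothing more than a named set of columns, and answers a query from the narrowest projection that covers the columns it needs. This is a toy model, not Vertica's optimizer or its API; the `Projection` class, the `pick_projection` function, and the table and column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Projection:
    """Toy stand-in for a projection: a name plus the columns it stores."""
    name: str
    columns: frozenset


def pick_projection(projections, needed_columns):
    """Return the narrowest projection containing every needed column, or None."""
    needed = frozenset(needed_columns)
    candidates = [p for p in projections if needed <= p.columns]
    if not candidates:
        return None
    return min(candidates, key=lambda p: len(p.columns))


# Hypothetical physical schema: one projection with all the columns in the
# table, plus a narrower one built for a frequent kind of query.
projections = [
    Projection("sales_all", frozenset({"sale_date", "country", "customer_id", "quantity"})),
    Projection("sales_by_country", frozenset({"country", "quantity"})),
]

# A query touching only country and quantity can use the narrow projection.
chosen = pick_projection(projections, {"country", "quantity"})

# A query nobody built a projection for still runs against the all-columns one.
fallback = pick_projection(projections, {"customer_id", "quantity"})
```

The second lookup is the case the competitors' claim glosses over: with no purpose-built projection, the query is still answerable from the all-columns projection, just without whatever benefit a tailored one would bring.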

The blog posts contain a lot more than that, of course, both rah-rah and technical detail, including reminders of other Vertica advantages (compression, no logging, etc.). If you’re interested in analytic DBMS, they’re worth a look.

Categories: Columnar database management, Data warehousing, Vertica Systems

5 Comments

August 26, 2011

Virtual data marts in Sybase IQ

I made a few remarks about Sybase IQ 15.3 when it became generally available in July. Now that I’ve had a current briefing, I’ll make a few more.

The key enhancement in Sybase IQ 15.3 is distributed query — what others might call parallel query — aka PlexQ. A Sybase IQ query can now be distributed among many nodes, all talking to the same SAN (Storage-Area Network). Any Sybase IQ node can take the responsibility of being the “leader” for that particular query.

In itself, this isn’t that impressive; all the same things could have been said about pre-Exadata Oracle.* But PlexQ goes somewhat further than just removing a bottleneck from Sybase IQ. Notably, Sybase has rolled out a virtual data mart capability. Highlights of the Sybase IQ virtual data mart story include: Read more

Categories: Columnar database management, Data warehousing, Oracle, Parallelization, Sybase, Theory and architecture, Workload management

1 Comment

July 7, 2011

Sybase IQ soundbites

Sybase made a total hash of the timing of this week’s press release. I got annoyed after they promised to inform me of the new embargo time, then broke the promise. Other people got annoyed earlier than that.

So be it. Below is the draft of a post I was holding, with brackets added around one word that is no longer accurate.

I don’t write enough about Sybase IQ. That said, I offered a couple of quotes to a reporter [yesterday] in connection with the general availability of Sybase IQ 15.3. Lightly edited, they go:

- “Shared-everything MPP” isn’t a total contradiction in terms. It’s great for adding in concurrent users. And there’s little doubt that Sybase IQ can support robust access to databases tens of terabytes in size.
- As I first noted a couple of years ago, virtual data marts are a good idea. Too few vendors are making it easy to spin them out. They let departments start doing analytics very quickly, yet allow IT to keep partial control.

Beyond that, I should note:

- Sybase IQ is the classic choice for what I call traditional data marts.
- Sybase IQ is a leader in temporal functionality, which is not coincidental to its presence in the financial services market.

Categories: Columnar database management, Data warehousing, Parallelization, Sybase, Theory and architecture

Eight kinds of analytic database (Part 2)

In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I’ll cover four more kinds of analytic database — even newer, for the most part, with a use case/product short list match that is even less clear. Read more

Categories: Analytic technologies, Archiving and information preservation, Business intelligence, Buying processes, Cloud computing, Columnar database management, Data mart outsourcing, Data types, Data warehouse appliances, Data warehousing, Database compression, Database diversity, EAI, EII, ETL, ELT, ETLT, Greenplum, Hadoop, Investment research and trading, Log analysis, MapReduce, MOLAP, MySQL, Netezza, NoSQL, Open source, Petabyte-scale data management, Predictive modeling and advanced analytics, Rainstor, SAND Technology, Scientific research, SenSage, Software as a Service (SaaS), Streaming and complex event processing (CEP), Telecommunications, Vertica Systems, Web analytics

6 Comments

July 5, 2011

Eight kinds of analytic database (Part 1)

Analytic data management technology has blossomed, leading to many questions along the lines of “So which products should I use for which category of problem?” The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for “big data” is little help.

Let’s try eight categories instead. While no categorization is ever perfect, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need — and in most cases you’ll need several — is a great early step in your analytic technology planning. Read more

Categories: Analytic technologies, Aster Data, Benchmarks and POCs, Business intelligence, Buying processes, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Database diversity, Exadata, Greenplum, IBM and DB2, Infobright, Investment research and trading, Log analysis, Microsoft and SQL*Server, MOLAP, Netezza, OLTP, Oracle, ParAccel, Parallelization, Petabyte-scale data management, Predictive modeling and advanced analytics, Pricing, QlikTech and QlikView, SAND Technology, Scientific research, Sybase, Teradata, Vertica Systems, Web analytics, Workload management

7 Comments

June 26, 2011

What to think about BEFORE you make a technology decision

When you are considering technology selection or strategy, there are a lot of factors that can each have bearing on the final decision — a whole lot. Below is a very partial list.

In almost any IT decision, there are a number of environmental constraints that need to be acknowledged. Organizations may have standard vendors, favored vendors, or simply vendors who give them particularly deep discounts. Legacy systems are in place, application and system alike, and may or may not be open to replacement. Enterprises may have on-premise or off-premise preferences; SaaS (Software as a Service) vendors probably have multitenancy concerns. Your organization can determine which aspects of your system you’d ideally like to see be tightly integrated with each other, and which you’d prefer to keep only loosely coupled. You may have biases for or against open-source software. You may be pro- or anti-appliance. Some applications have a substantial need for elastic scaling. And some kinds of issues cut across multiple areas, such as budget, timeframe, security, or trained personnel.

Multitenancy is particularly interesting, because it has numerous implications. Read more

Categories: Analytic technologies, Business intelligence, Buying processes, Cloud computing, Columnar database management, Data warehouse appliances, Data warehousing, EAI, EII, ETL, ELT, ETLT, Predictive modeling and advanced analytics, Software as a Service (SaaS)

3 Comments

June 20, 2011

The Vertica story (with soundbites!)

I’ve blogged separately that:

- Vertica has a bunch of customers, including seven with 1 or more petabytes of data each.
- Vertica has progressed down the analytic platform path, with Monday’s release of Vertica 5.0.

And of course you know:

- Vertica (the product) is columnar, MPP, and fast.*
- Vertica (the company) was recently acquired by HP.**

Categories: Benchmarks and POCs, Columnar database management, ParAccel, Parallelization, Vertica Systems

4 Comments

June 20, 2011

Columnar DBMS vendor customer metrics

Last April, I asked some columnar DBMS vendors to share customer metrics. They answered, but it took until now to iron out a couple of details. Overall, the answers are pretty impressive. Read more