Vertica Systems

Analysis of columnar data warehouse DBMS vendor Vertica Systems.

June 20, 2011

The Vertica story (with soundbites!)

I’ve blogged separately that:

- Vertica has a bunch of customers, including seven with 1 or more petabytes of data each.
- Vertica has progressed down the analytic platform path, with Monday’s release of Vertica 5.0.

And of course you know:

- Vertica (the product) is columnar, MPP, and fast.
- Vertica (the company) was recently acquired by HP.

Categories: Benchmarks and POCs, Columnar database management, ParAccel, Parallelization, Vertica Systems

4 Comments

June 20, 2011

Vertica as an analytic platform

Vertica 5.0 is coming out today, delivering the down payment on Vertica’s analytic platform strategy. In Vertica lingo, there’s now a Vertica SDK (Software Development Kit), featuring Vertica UDT(F)s (User-Defined Transform Functions). Vertica UDT syntax basics start: Read more

Categories: Analytic technologies, Data warehousing, GIS and geospatial, Predictive modeling and advanced analytics, RDF and graphs, Vertica Systems, Workload management

7 Comments

June 20, 2011

Temporal data, time series, and imprecise predicates

I’ve been confused about temporal data management for a while, because there are several different things going on.

- Date arithmetic. This of course has been around for a very long ... er, for a very long time.
- Time-series-aware compression. This has been around for quite a while too.
- “Time travel”/snapshotting: preserving the state of the database at previous points in time. This is a matter of exposing (and not throwing away) the information you capture via MVCC (Multi-Version Concurrency Control) and/or append-only updates (as opposed to update-in-place). Those update strategies are increasingly popular for pretty much anything except update-intensive OLTP (OnLine Transaction Processing) DBMS, so time-travel/snapshotting is an achievable feature for most vendors.
- Bitemporal data access. This occurs when a fact has both a transaction timestamp and a separate validity duration. A Wikipedia article seems to cover the subject pretty well, and I touched on Teradata’s bitemporal plans back in 2009.
- Time series SQL extensions. Vertica explained its version of these to me a few days ago. I imagine Sybase IQ and other serious financial-trading market players have similar features.
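
To make the time-travel point concrete, here is a minimal Python sketch of an append-only store that answers "as of" reads simply by not throwing old versions away. It is a toy under stated assumptions, not any vendor's implementation: the `AppendOnlyStore` class, its integer commit clock, and the `as_of` parameter are all hypothetical names invented for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class AppendOnlyStore:
    """Toy key-value store (hypothetical): writes append versions, never overwrite."""
    # key -> list of (commit_ts, value), kept in commit order
    _versions: dict = field(default_factory=dict)
    _clock: int = 0  # assumed logical clock; a real system would use transaction IDs or timestamps

    def put(self, key, value):
        """Append a new version of `key` and return its commit timestamp."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def get(self, key, as_of=None):
        """Return the value visible at `as_of` (default: latest), or None if none existed yet."""
        history = self._versions.get(key, [])
        if as_of is None:
            as_of = self._clock
        # Locate the last version committed at or before `as_of`.
        idx = bisect_right([ts for ts, _ in history], as_of)
        return history[idx - 1][1] if idx else None


store = AppendOnlyStore()
t1 = store.put("price", 100)
t2 = store.put("price", 105)
latest = store.get("price")            # reads the newest version
earlier = store.get("price", as_of=t1)  # "time travel" to the first commit
```

The design point is that the read path is the only thing that changes: once updates are appended rather than applied in place, a snapshot read is just a search over history that was being kept anyway.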

In essence, the point of time series/event series SQL functionality is to do SQL against incomplete, imprecise, or derived data.* Read more
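
One common way data is "incomplete" in this sense is that observations arrive at irregular times while the query wants a value for every regular time slice. The Python sketch below shows that idea with last-observation-carried-forward gap filling. It is an illustration under assumptions, not Vertica's syntax or implementation: the `fill_slices` function, the integer timestamps, and the sample quotes are all made up.

```python
def fill_slices(observations, start, end, step):
    """Carry the last observed value forward onto a regular time grid.

    observations: list of (timestamp, value) pairs sorted by timestamp.
    Returns one (slice_start, value_or_None) pair per slice in [start, end).
    """
    filled = []
    last_value = None
    i = 0
    for slice_start in range(start, end, step):
        # Consume every observation at or before this slice boundary.
        while i < len(observations) and observations[i][0] <= slice_start:
            last_value = observations[i][1]
            i += 1
        filled.append((slice_start, last_value))
    return filled


# Irregular, made-up quotes: (time, price)
ticks = [(0, 10.0), (3, 10.5), (7, 10.2)]
grid = fill_slices(ticks, start=0, end=10, step=2)
```

After this step, every slice has a (derived) value, so ordinary joins and aggregates can run against the grid even though most of its rows were never actually observed.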

Categories: Analytic technologies, Data types, Investment research and trading, Log analysis, Sybase, Telecommunications, Theory and architecture, Vertica Systems

2 Comments

June 20, 2011

Columnar DBMS vendor customer metrics

Last April, I asked some columnar DBMS vendors to share customer metrics. They answered, but it took until now to iron out a couple of details. Overall, the answers are pretty impressive. Read more

Categories: Columnar database management, Data warehousing, Games and virtual worlds, Infobright, Investment research and trading, Log analysis, Market share and customer counts, Open source, ParAccel, Petabyte-scale data management, SAND Technology, Sybase, Telecommunications, Vertica Systems, Web analytics

5 Comments

April 14, 2011

Attensity update

I talked with Michelle de Haaff and Ian Hersey of Attensity back in February. We covered a lot of ground, so let’s start with a very high-level view.

- Two years ago, Attensity merged with two other companies in somewhat related businesses, thus expanding 4X or so in size.
- Due to the merger, Attensity now has two core lines of business:
  - Text analytics.
  - Driving actions, such as call center or social media response, based on text analytics.
- The combined Attensity is part American, part German.
- Attensity’s German part compels it to do some public financial reporting. Attensity will do $50-60 million in 2011 revenue.
- Attensity crunches text in 17 languages. English is preeminent. #2 is (you guessed it!) German.
- A big part of Attensity’s business (or at least of its value proposition) is analyzing the text in social media. Attensity boasts coverage of 75 million social media sources, such as blogs, forums, or review sites.

The four most interesting technical points were probably:

- Attensity has changed how it does exhaustive extraction. I’m having some trouble writing that part up, so for now I’ll just refer you to Attensity’s own description of the new way of doing things.
- Attensity has development work underway meant to address some of the problems in text analytics/other analytics integration. I don’t feel I got enough detail to want to talk about that yet.
- Attensity runs its own data centers, with approximately 60 Hadoop/HBase nodes and 30 nodes of Apache Solr (open source text search). More on that below.
- Attensity now OEMs Vertica. More on that below too.

Some more specific notes include: Read more

Categories: Analytic technologies, Cloud computing, Hadoop, HBase, Predictive modeling and advanced analytics, Software as a Service (SaaS), Sybase, Vertica Systems

7 Comments

February 28, 2011

Updating our vendor client disclosures

Edit: This disclosure has been superseded by a March, 2012 version.

From time to time, I disclose our vendor client lists. Another iteration is below. To be clear:

- This is a list of Monash Advantage members.
- All our vendor clients are Monash Advantage members, unless …
- … we work with them primarily in their capacity as technology users. (A large fraction of our user clients happen to be SaaS vendors.)
- We do not usually disclose our user clients.
- We do not usually disclose our venture capital clients, nor those who invest in publicly-traded securities.
- Included in the list below are two expired Monash Advantage members who haven’t said they will renew, as mentioned in my recent post on analyst bias. (You can probably imagine a couple of reasons for that obfuscation.)

With that said, our vendor client disclosures at this time are:

Aster Data
Cloudera
CodeFutures/dbShards
Couchbase
EMC/Greenplum
Endeca
IBM/Netezza
Infobright
Intel
MarkLogic
ParAccel
QlikTech
salesforce.com/database.com
SAND Technology
SAP/Sybase
Schooner Information Technology
Skytide
Splunk
Teradata
Vertica

Categories: About this blog, Aster Data, Cloudera, Couchbase, dbShards and CodeFutures, EMC, Greenplum, IBM and DB2, Infobright, Intel, MarkLogic, Netezza, ParAccel, QlikTech and QlikView, SAND Technology, SAP AG, Schooner Information Technology, Splunk, Sybase, Tableau Software, Teradata, Vertica Systems

1 Comment

February 14, 2011

Now we know why Vertica has been so weirdly evasive

Communicating with Vertica has been tricky recently. But HP has now announced that it is buying Vertica, which pretty much forces me to comment about Vertica. 🙂 So I’ll indulge in a little bit of explanation as to what I know about Vertica, whether for publication or under NDA. My analysis of the HP/Vertica combination, and expectations for same, will go into another post. Read more

Categories: Analytic technologies, Data warehousing, HP and Neoview, Market share and customer counts, Michael Stonebraker, Vertica Systems

10 Comments

February 11, 2011

Comments on the 2011 Forrester Wave for Enterprise Data Warehouse Platforms

The Forrester Wave: Enterprise Data Warehouse Platforms, Q1 2011 is now out,* hot on the heels of the Gartner Magic Quadrant. Unfortunately, this particular Forrester Wave is riddled with inaccuracy. Read more

Categories: Analytic technologies, Columnar database management, Data warehousing, EMC, Exadata, Greenplum, Netezza, Oracle, Pricing, SAP AG, Sybase, Teradata, Vertica Systems

8 Comments

February 6, 2011

Columnar compression vs. column storage

I’m getting the increasing impression that certain industry observers, such as Gartner, are really confused about columnar technology. (I further suspect that certain vendors are encouraging this confusion, as vendors commonly do.) So here are some basic points.

A simple way to think about the difference between columnar storage and columnar (or any other kind of) compression is this:

- Columnar storage is a reference to how data is grouped together on disk (or in solid-state memory).
- (Columnar) compression is a reference to whether the actual data is on disk, or whether you save space by storing some smaller substitute for the actual data.

Specifically, if data in a relational table is grouped together according to what row it’s in, then the database management system is called “row-based” or a “row store.” If it’s grouped together according to what column it’s in, then the database management system is called “columnar” or a “column store.” Increasingly, row-based and columnar storage are being hybridized.
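
As a rough illustration of the storage distinction, the Python sketch below lays out the same three records both ways. It only models grouping in memory, not real disk pages, and the sample table, `to_row_store`, and `to_column_store` are hypothetical names made up for the example.

```python
# Made-up sample table
rows = [
    {"id": 1, "state": "MA", "amount": 20},
    {"id": 2, "state": "MA", "amount": 35},
    {"id": 3, "state": "NH", "amount": 15},
]
COLUMNS = ["id", "state", "amount"]


def to_row_store(records, columns):
    """Row store: all fields of one row sit next to each other."""
    return [tuple(r[c] for c in columns) for r in records]


def to_column_store(records, columns):
    """Column store: all values of one column sit next to each other."""
    return {c: [r[c] for r in records] for c in columns}


row_store = to_row_store(rows, COLUMNS)
column_store = to_column_store(rows, COLUMNS)

# Summing one column walks every row tuple in the row layout...
amount_pos = COLUMNS.index("amount")
total_from_rows = sum(row[amount_pos] for row in row_store)
# ...but only one contiguous list in the columnar layout.
total_from_columns = sum(column_store["amount"])
```

Both layouts hold exactly the same data and give the same answer; what differs is which values are neighbors, and therefore how much unrelated data a single-column query has to pass over.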

There are two main kinds of compression: compression of bit strings, and more intelligent compression of actual data values. Compression of actual data values can reasonably be called “columnar,” in that different columns of data can be compressed in different ways, often depending only on the data in that column. Read more
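
To show what "different columns compressed in different ways" can look like, here is a small Python sketch of two generic value-level schemes: run-length encoding for a column with long runs of repeats, and dictionary encoding for a column with few distinct values. These are textbook techniques chosen for illustration, not a description of what any particular vendor does, and the sample columns are made up.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encoding: store each run as (value, run_length)."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [v for v, n in runs for _ in range(n)]


def dict_encode(values):
    """Dictionary encoding: store small integer codes plus a lookup table."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def dict_decode(dictionary, codes):
    """Map integer codes back to the original values."""
    return [dictionary[c] for c in codes]


# A sorted column with long runs suits run-length encoding.
state = ["MA", "MA", "MA", "NH", "NH"]
state_runs = rle_encode(state)

# An unsorted, low-cardinality column suits dictionary encoding.
city = ["Boston", "Acton", "Boston", "Concord", "Acton"]
city_dictionary, city_codes = dict_encode(city)
```

In both cases what gets stored is a smaller substitute for the actual values, and the choice of scheme depends only on the data in that one column, independent of how the table is grouped in storage.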

Categories: Columnar database management, Data warehousing, Database compression, Exadata, Vertica Systems

21 Comments

February 5, 2011

Comments on the Gartner 2010/2011 Data Warehouse Database Management Systems Magic Quadrant

Edit: Comments on the February, 2012 Gartner Magic Quadrant for Data Warehouse Database Management Systems — and on the companies reviewed in it — are now up.

The Gartner 2010 Data Warehouse Database Management Systems Magic Quadrant is out. I shall now comment, just as I did to varying degrees on the 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants.

Note: Links to Gartner Magic Quadrants tend to be unstable. Please alert me if any problems arise; I’ll edit accordingly.

In my comments on the 2008 Gartner Data Warehouse Database Management Systems Magic Quadrant, I observed that Gartner’s “completeness of vision” scores were generally pretty reasonable, but their “ability to execute” rankings were somewhat bizarre; the same remains true this year. For example, Gartner ranks Ingres higher by that metric than Vertica, Aster Data, ParAccel, or Infobright. Yet each of those companies is growing nicely and delivering products that meet serious cutting-edge analytic DBMS needs, neither of which has been true of Ingres since about 1987. Read more

Categories: 1010data, Actian and Ingres, Analytic technologies, Aster Data, Benchmarks and POCs, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, EMC, Exadata, Greenplum, illuminate Solutions, Infobright, Microsoft and SQL*Server, Netezza, Open source, ParAccel, Pricing, SAND Technology, Storage, Sybase, Teradata, Vertica Systems, Workload management

23 Comments
