What kinds of data warehouse load latency are practical?
I took advantage of my recent conversations with Netezza and IBM to discuss what kinds of data warehouse load latency were practical. In both cases I got the impression:
- Subsecond load latency is substantially impossible. Doing that amounts to OLTP.
- 5 seconds or so is doable with aggressive investment and tuning.
- Several minute load latency is pretty easy.
- 10-15 minute latency or longer is now very routine.
There’s generally a throughput/latency tradeoff, so if you want very low latency with good throughput, you may have to throw a lot of hardware at the problem.
I’d expect to hear similar things from any other vendor with reasonably mature analytic DBMS technology. Low-latency load is a problem for columnar systems, but both Vertica and ParAccel designed in workarounds from the get-go. Aster Data probably didn’t meet these criteria until Version 4.0, its old “frontline” positioning notwithstanding, but I think it does now.
Related link
- Just what is your need for speed anyway?
Comments
4 Responses to “What kinds of data warehouse load latency are practical?”
Although there is a throughput/latency tradeoff, most databases are far from the tradeoff curve. There are two ways to look at this: in theory and in practice.

In theory: B-trees achieve O(log N / log B) disk-head movements per insertion and O(log N / log B) per point query. There are known data structures that achieve O((log N) / sqrt(B)) insertion performance while matching B-tree point-query and range-query performance. Note that sqrt(B) >> log B for most interesting block sizes, making these data structures much better at insertions than B-trees. For example, O(log N / log B) is more than one, but O(log N / sqrt(B)) is something like 1/1000. There is a tradeoff curve, but B-trees are not on it.
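To make the asymptotic contrast in the comment above concrete, here is a small numerical sketch. The values of N and B are illustrative assumptions (a billion keys and a 1 MB block), not figures from the comment, and constant factors are ignored:

```python
import math

# Illustrative arithmetic for the asymptotics above (constants ignored).
# Assumed, hypothetical values: N = 10**9 keys, block size B = 2**20.
N = 10**9
B = 2**20

btree_insert = math.log(N) / math.log(B)  # O(log N / log B): B-tree moves per insertion
wo_insert = math.log(N) / math.sqrt(B)    # O(log N / sqrt(B)): write-optimized structure

print(f"B-tree:          ~{btree_insert:.2f} disk-head movements per insertion")
print(f"Write-optimized: ~{wo_insert:.4f} disk-head movements per insertion")
print(f"Advantage:       ~{btree_insert / wo_insert:.0f}x")
```

With these numbers the B-tree bound is above one movement per insertion while the write-optimized bound is a small fraction of one, matching the "more than one" versus "much less than one" contrast drawn in the comment.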
In practice: Tokutek is selling a storage engine for MySQL that maintains indexes at tens of thousands of insertions per second, even when the indexes are much larger than main memory, and even for worst-case data sets (such as when keys look essentially random). Since Fractal Tree index performance dominates B-tree performance, I expect Fractal Tree indexes to displace essentially all uses of B-tree indexes over time.

This means there is no real reason to divide the world into OLTP and OLAP. One can have OLAP-like throughput with OLTP-like latency.
A key piece of the technology needed to deliver high-throughput, low-latency loading is offloading and parallelizing the load. Serialized loading through a single chokepoint, along with contention between loading and queries, are two of the factors that hinder load throughput and latency. I believe your reference to Aster Data has in mind that parallel, offloaded loading leveraging any number of commodity server nodes is part of Aster’s offering.
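The parallel-loading idea described above can be sketched in a few lines. This is a toy illustration, not Aster's actual implementation: `load_partition` is a hypothetical stand-in for a per-node bulk loader, and rows are hash-partitioned by key so that no single chokepoint serializes the load:

```python
from concurrent.futures import ThreadPoolExecutor

def load_partition(rows):
    # Hypothetical stand-in for a per-node bulk load; returns rows loaded.
    return len(rows)

def parallel_load(rows, n_workers=4):
    """Hash-partition rows by key, then load all partitions concurrently."""
    partitions = [[] for _ in range(n_workers)]
    for row in rows:
        partitions[hash(row[0]) % n_workers].append(row)  # row[0] is the key
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        loaded = pool.map(load_partition, partitions)
    return sum(loaded)

print(parallel_load([(i, "payload") for i in range(1000)]))  # → 1000
```

In a real MPP loader the workers would be separate server nodes rather than threads, but the structure is the same: partition first, then load each partition independently so throughput scales with the number of workers.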
@Jon,
Actually, I was referring to the fact that single-row inserts were really slow in Aster Data until Version 4. 😉
AFAIK, Vertica takes care of this by delivering results from its in-memory write cache. I think they offer real-time updates.
Another similar technology is Lucene. As of version 2.9 it incorporates near-real-time updates, again by applying queries to the IndexWriter’s cache. It is not an analytic DBMS, but it serves similar purposes. Attivio even builds SQL access, including aggregate functions, on top of Lucene. This makes it possible to provide DW-like functionality on top of semi-structured data.