January 4, 2009

Expressor pre-announces a data loading benchmark leapfrog

Expressor Software plans to blow the Vertica/Syncsort “benchmark” out of the water, to wit:

What I know already is that our numbers will between 7 and 8 min to load one TB of data and will set another world record for the tpc-h benchmark.

The whole blog post has a delightful air of skepticism, e.g.:

Sometimes the mention of a join and lookup are documented but why? If the files are load ready what is there to join or lookup?

… If the files are load ready and the bulk load interface is used, what exactly is done with the DI product?

My guess… nothing.

…  But what I can’t figure out is what is so complex about this test in the first place?
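For scale, the Expressor claim can be put in the same units as the 5½ terabytes per hour reported for the Vertica/Syncsort benchmark. What follows is only a back-of-the-envelope sketch in Python. The function name is hypothetical, and it assumes "one TB" means the same thing in both claims, which nothing quoted here confirms.

```python
def tb_per_hour(minutes_per_tb: float) -> float:
    """Convert 'minutes to load one terabyte' into terabytes per hour."""
    return 60.0 / minutes_per_tb


# Figures quoted in the posts; everything else is derived from them.
expressor_slow = tb_per_hour(8)   # 8 minutes per TB
expressor_fast = tb_per_hour(7)   # 7 minutes per TB
vertica_syncsort = 5.5            # TB per hour, as reported

print(f"Expressor claim: {expressor_slow:.2f} to {expressor_fast:.2f} TB/hour")
print(
    "Ratio to Vertica/Syncsort: "
    f"{expressor_slow / vertica_syncsort:.2f}x to "
    f"{expressor_fast / vertica_syncsort:.2f}x"
)
```

So the pre-announced range works out to roughly 7.5 to 8.6 terabytes per hour, or about 1.4 to 1.6 times the Vertica/Syncsort figure, if the two tests are comparable at all.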

January 3, 2009

More from Vertica on data warehouse load speeds

Last month, when Vertica released its “benchmark” of data warehouse load speeds, I didn’t realize it had previously released some actual customer-experience load rates as well.  In a July 2008 white paper that seems thankfully free of any registration requirements, Vertica cited four examples:

Read more

January 3, 2009

ParAccel’s market momentum

After my recent blog post, ParAccel is once again angry that I haven’t given it proper credit for its accomplishments. So let me try to redress the failing.

Uh, that’s about all I can think of. What else am I forgetting? Surely that can’t be ParAccel’s entire litany of market success!

December 29, 2008

ParAccel actually uses relatively little PostgreSQL code

I often find it hard to write about ParAccel’s technology, for a variety of reasons:

ParAccel is quick, however, to send email if I post anything about them they think is incorrect.

All that said, I did get careless when I neglected to double-check something I already knew. Read more

December 29, 2008

Ordinary OLTP DBMS vs. memory-centric processing

A correspondent from China wrote in to ask about products that matched the following application scenario: Read more

December 20, 2008

More grist for the column vs. row mill

Daniel Abadi and Sam Madden are at it again, following up on their blog posts of six months ago arguing for the general superiority of column stores over row stores (for analytic query processing).  The gist is to recite a number of bases for superiority, beyond the two standard ones of less I/O and better compression, and seems to be based largely on Section 5 of a SIGMOD paper they wrote with Nabil Hachem.

A big part of their argument is that if you carry the processing of columnar and/or compressed data all the way through in memory, you get lots of advantages, especially because everything’s smaller and hence fits better into Level 2 cache. There also is some kind of join algorithm enhancement, which seems to be based on noticing when the result wound up falling into a range according to some dimension, and perhaps using dictionary encoding in a way that will help induce such an outcome.

The main enemy here is row-store vendors who say, in effect, “Oh, it’s easy to shoehorn almost all the benefits of a column-store into a row-based system.”  They also take a swipe — for being insufficiently purely columnar — at unnamed columnar Vertica competitors, described in terms that seemingly apply directly to ParAccel.

December 16, 2008

Database archiving and information preservation

Two similar companies reached out to me recently – SAND Technology and Clearpace. Their current market focus is somewhat different: Clearpace talks mainly of archiving, and sells first and foremost into the compliance market, while SAND has the most traction providing “near-line” storage for SAP databases.* But both stories boil down to pretty much the same thing: Cheap, trustworthy data storage with good-enough query capabilities. E.g., I think both companies would agree the following is a not-too-misleading first-approximation characterization of their respective products:

Read more

December 16, 2008

Introduction to Clearpace

Clearpace is a UK-based startup in a similar market to what SAND Technology has gotten into – DBMS archiving, with a strong focus on compression and general cost-effectiveness. Clearpace launched its product NParchive a couple of quarters ago, and says it now has 25 people and $1 million or so in revenue. Clearpace NParchive technical highlights include:

Read more

December 16, 2008

Introduction to SAND Technology

SAND Technology has a confused history. For example:

SAND is publicly traded, so its numbers are on display. It turns out to be doing $7 million in annual revenue, and losing money.

OK. I just wanted to get all that out of the way. My main thoughts about the DBMS archiving market are in a separate post.

December 15, 2008

How to buy an analytic DBMS (overview)

I went to London for a couple of days last week, at the behest of Kognitio. Since I was in the neighborhood anyway, I visited their offices for a briefing. But the main driver for the trip was a seminar Thursday at which I was the featured speaker. As promised, the slides have been uploaded here.

The material covered on the first 13 slides should be very familiar to readers of this blog. I touched on database diversity and the disk-speed barrier, after which I zoomed through a quick survey of the data warehouse DBMS market. But then I turned to material I’ve been working on more recently – practical advice directly on the subject of how to buy an analytic DBMS.

I started by proposing a seven-part segmentation self-assessment:

Read more

December 14, 2008

The “baseball bat” test for analytic DBMS and data warehouse appliances

More and more, I’m hearing about reliability, resilience, and uptime as criteria for choosing among data warehouse appliances and analytic DBMS. Possible reasons include:

The truth probably lies in a combination of all these factors.

Making the most fuss on the subject is probably Aster Data, who like to talk at length both about mission-critical data warehouse applications and Aster’s approach to making them robust. But I’m also hearing from multiple vendors that proofs-of-concept now regularly include stress tests against failure, in what can be – and indeed has been – called the “baseball bat” test. Prospects are encouraged to go on a rampage, pulling out boards, disk drives, switches, power cables, and almost anything else their devious minds can come up with to cause computer carnage.

Read more

December 14, 2008

Kognitio and WX-2 update

I went to Bracknell Wednesday to spend time with the Kognitio team. I think I came away with a better understanding of what the technology is all about, and why certain choices have been made.

Like almost every other contender in the market,* Kognitio WX-2 queries disk-based data in the usual way. Even so, WX-2’s design is very RAM-centric. Data gets on and off disk in mind-numbingly simple ways – table scans only, round-robin partitioning only (as opposed to the more common hash), and no compression. However, once the data is in RAM, WX-2 gets to work, happily redistributing as seems optimal, with little concern about which node retrieved the data in the first place. (I must confess that I don’t yet understand why this strategy doesn’t create ridiculous network bottlenecks.) How serious is Kognitio about RAM? Well, they believe they’re in the process of selling a system that will include 40 terabytes of the stuff. Apparently, the total hardware cost will be in the $4 million range.

*Exasol is the big exception. They basically use disk as a source from which to instantiate in-memory databases.
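To make the round-robin vs. hash distinction concrete, here is a minimal sketch in Python. It is a generic illustration, not Kognitio's implementation. The function names, the sample rows, and the use of CRC32 as the hash are all assumptions made for the example.

```python
import zlib


def round_robin_partition(rows, num_nodes):
    """Deal rows out to nodes in strict rotation, ignoring their contents."""
    nodes = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        nodes[i % num_nodes].append(row)
    return nodes


def hash_partition(rows, num_nodes, key):
    """Place each row on the node chosen by hashing one column's value."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        # CRC32 is a stand-in for whatever hash function a real system uses.
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        nodes[h % num_nodes].append(row)
    return nodes


# Hypothetical data, deliberately skewed: customer "A" owns most of the rows.
rows = [{"customer": "A", "amount": i} for i in range(6)]
rows += [{"customer": "B", "amount": 6}, {"customer": "C", "amount": 7}]

rr_nodes = round_robin_partition(rows, 4)
hash_nodes = hash_partition(rows, 4, "customer")

print("round-robin sizes:", [len(n) for n in rr_nodes])
print("hash sizes:       ", [len(n) for n in hash_nodes])
```

Round-robin keeps every node equally loaded no matter how skewed the data is, but rows with the same key end up scattered, so joins and aggregations need redistribution afterward. Hashing co-locates matching keys at the cost of possible skew. That trade-off is at least consistent with a design that redistributes freely once data is in RAM.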

Other technical highlights of the Kognitio WX-2 story include:

Read more

December 2, 2008

Data warehouse load speeds in the spotlight

Syncsort and Vertica combined to devise and run a benchmark in which a data warehouse got loaded at 5½ terabytes per hour, which is several times faster than the figures used in any other vendor’s similar press releases in the past. Takeaways include:

The latter is unsurprising. Back in February, I wrote at length about how Vertica makes rapid columnar updates. I don’t have a lot of subsequent new detail, but it made sense then and it still does.

Read more

November 26, 2008

Another dubious “end of computer history” argument

In a typically snarky Register article, Chris Mellor raises a caution about the use of future many-cored chips in IT. In essence, he says that today’s apps run in a relatively small number of threads each, and modifying them to run in many threads is too difficult. Hence, most of the IT use for many-cored chips will be via hypervisors that assign apps to cores as makes sense.

Mellor has a point, but he’s overstating it.

Read more

November 22, 2008

The Teradata Accelerate program

An article in Intelligent Enterprise clued me in that Teradata has announced the Teradata Accelerate program. A little poking around revealed a press release in which — lo and behold — I am quoted,* to wit:

“The Teradata Accelerate program is a great idea. There’s no safer choice than Teradata technology plus Teradata consulting, bundled in a fixed-cost offering,” said Curt Monash, president of Monash Research. “The Teradata Purpose Built Platform Family members are optimized for a broad range of business intelligence and analytic uses.”

Read more

November 21, 2008

High-end MySQL use

To a large extent, MySQL lives in two different alternate universes from most other DBMS. One is for low-end, simple database applications. For example, of all the DBMS I write about, MySQL is the one I actually use in my own business — because MySQL sits underneath WordPress, and WordPress is what runs my blogs. My largest database (the one for DBMS2) contains 12 megabytes of data in 11 tables, none of which has yet reached 5000 rows in size. Read more

November 19, 2008

Interpreting the results of data warehouse proofs-of-concept (POCs)

When enterprises buy new brands of analytic DBMS, they almost always run proofs-of-concept (POCs) in the form of private benchmarks. The results are generally confidential, but that doesn’t keep a few stats from occasionally leaking out. As I noted recently, those leaks are problematic on multiple levels. For one thing, even if the results are to be taken as accurate and basically not-misleading, the way vendors describe them leaves a lot to be desired.

Here’s a concrete example to illustrate the point. One of my vendor clients sent over the stats from a recent POC, in which its data warehousing product was compared against a name-brand incumbent. 16 reports were run. The new product beat the old 16 out of 16 times. The lowest margin was a 1.8X speed-up, while the best was a whopping 335.5X.

My client helpfully took the “simple average” (i.e., the mean) of the 16 factors, and described this as an average 62X drubbing. But is that really fair?

Read more
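One way to see why the mean of speed-up factors can mislead is to compare it with other summaries of the same numbers. The sketch below is in Python. Only the 1.8X minimum and 335.5X maximum come from the POC described above; the other 14 factors are invented for illustration, so the output does not reproduce the 62X figure.

```python
from statistics import geometric_mean, mean, median

# Speed-up factors for 16 reports. The first and last values are the reported
# minimum and maximum; every value in between is HYPOTHETICAL.
speedups = [
    1.8, 2.5, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0,
    12.0, 15.0, 20.0, 30.0, 50.0, 90.0, 150.0, 335.5,
]

arithmetic = mean(speedups)          # dominated by the few huge ratios
geometric = geometric_mean(speedups) # the usual average for ratios
middle = median(speedups)            # what a "typical" report saw

print(f"arithmetic mean: {arithmetic:.1f}X")
print(f"geometric mean:  {geometric:.1f}X")
print(f"median:          {middle:.1f}X")
```

With a distribution this lopsided, a handful of extreme queries pulls the arithmetic mean far above what most reports experienced. Neither the geometric mean nor the median says how much total run time was saved, which would require the individual query times rather than just the ratios.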

November 19, 2008

MySQL Query Analyzer

Given how the product’s rollout has been handled, it seems necessary to comment on MySQL’s recently released MySQL Query Analyzer without actually having much information on the subject. Mark Callaghan offers a good take — he’s generally very favorable, but notes that MySQL has some limitations that Query Analyzer has trouble getting around.

November 18, 2008

Silly website tricks

Vertica’s marketing is usually good-to-outstanding, but they made a funny misstep this time. If you go to the Vertica home page, you’ll see seasonal art suggesting that their product is a turkey and/or that it’s terrified it’s about to get the ax.

Live by the pun, die by the pun.

November 16, 2008

Graphjam: I can haz BI

Charts and graphs, from the folks who brought you a whole lot of cute kitten photos:

November 16, 2008

When people don’t want accurate predictions made about them

In a recent article on governmental anti-terrorism data mining efforts — and the privacy risks associated with same — The Economist wrote (emphasis mine):

Abdul Bakier, a former official in Jordan’s General Intelligence Department, says that tips to foil data-mining systems are discussed at length on some extremist online forums. Tricks such as calling phone-sex hotlines can help make a profile less suspicious. “The new generation of al-Qaeda is practising all that,” he says.

Well, duh. Terrorists and fraudsters don’t want to be detected. Algorithms that rely on positive evidence of bad intent may work anyway. But if you rely on evidence that shows people are not bad actors, that’s likely to work about as well as Bayesian spam detectors.* Read more

November 15, 2008

High-performance analytics

For the past few months, I’ve collected a lot of data points to the effect that high-performance analytics – i.e., beyond straightforward query — is becoming increasingly important. And I’ve written about some of them at length. For example:

Ack. I can’t decide whether “analytics” should be a singular or plural noun. Thoughts?

Another area that’s come up which I haven’t blogged about so much is data mining in the database. Data mining accounts for a large part of data warehouse use. The traditional way to do data mining is to extract data from the database and dump it into SAS. But there are problems with this scenario, including:

Read more

November 15, 2008

Beyond query

I sometimes describe database management systems as “big SQL interpreters,” because that’s the core of what they do. But it’s not all they do, which is why I describe them as “electronic file clerks” too. File clerks don’t just store and fetch data; they also put a lot of work into neatening, culling, and generally managing the health of their information hoards.

Already 15 years ago, online backup was as big a competitive differentiator in the database wars as any particular SQL execution feature. Security became important in some market segments. Reliability and availability have been important from the get-go. And manageability has been crucial ever since Microsoft lapped Oracle in that regard, back when SQL Server had little else to recommend it except price.*

*Before Oracle10g, the SQL Server vs. Oracle manageability gap was big.

Now data warehousing is demanding the same kinds of infrastructure richness.*

Read more

November 15, 2008

The query from hell, and other stories

I write about a lot of products whose core job boils down to Make queries run fast. Without exception, their vendors tout stories of remarkable performance gains over conventional/incumbent DBMS (reported improvement is usually at least 50-fold, and commonly 100-500+). They further claim at least 2-3X better performance than their close competitors. In making these claims, vendors usually stress that their results come from live customer benchmarks. In few if any of the cases, I judge, are they lying outright. So what’s going on? Read more

November 12, 2008

MySQL is being used in an IBM Lotus appliance

Apparently, IBM is rolling out an appliance for small businesses. MySQL is under the covers. The appliance won’t have a keyboard or monitor, so there won’t be a lot of database administration going on.

Before Solid and solidDB were acquired by IBM, one of the things Solid was proudest of was some embedded apps in which solidDB ran for years in boxes without keyboards or monitors.

I still think it’s a pity that IBM isn’t using solidDB as broadly as the technology deserves. Even so, this is a nice endorsement of MySQL for reliable zero-DBA mid-range use.
