Fault-tolerant queries
MapReduce/Hadoop fans sometimes raise the question of query fault-tolerance. That is — if a node fails, does the query need to be restarted, or can it keep going? For example, Daniel Abadi et al. trumpet query fault-tolerance as one of the virtues of HadoopDB. Some of the scientists at XLDB spoke of query fault-tolerance as being a good reason to leave 100s or 1000s of terabytes of data in Hadoop-managed file systems.
When we discussed this subject a few months ago in a couple of comment threads, it seemed to be the case that:
- Hadoop generally has query fault-tolerance. Intermediate result sets are materialized, and data isn’t tied to nodes anyway. So if a node goes down, its work can be sent to another node.
- Hive actually did not have query fault-tolerance at that time, but it was on the roadmap. (Edit: Actually, it did within a single MapReduce job. But one Hive job can comprise several rounds of MapReduce.)
- Most DBMS vendors do not have query fault-tolerance. If a query fails, it gets restarted from scratch.
- Aster Data’s nCluster, however, does appear to have some kind of query fault-tolerance.
This raises an obvious (pair of) question(s) — why and/or when would anybody ever care about query fault-tolerance? To start answering that, let’s propose a simple metric for query execution time on a grid or cluster: node-hours = (duration of query) * (number of nodes it executes on). Then, to a first approximation, it would seem:
- If the expected number of node-hours for your queries is of the same order of magnitude as or higher than the MTTF (Mean Time To Failure) of a node, query fault-tolerance matters a lot, because without it you'll get a whole lot of retries.
- If the expected number of node-hours for your queries is an order of magnitude less than the node MTTF, query fault-tolerance has a significant impact on performance, but you can do without it.
- If the expected number of node-hours is even lower than that, query fault-tolerance is pretty irrelevant.
- Using cheap/low-end/commodity hardware — as Hadoop fans like to do — increases the (node-hours)/MTTF ratio in two ways — it both increases the numerator and decreases the denominator.
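To make the arithmetic above concrete, here is a small sketch. The cluster size, query durations, and MTTF figure are all hypothetical, chosen only to illustrate how the ratio behaves:

```python
# Rough illustration of the node-hours vs. MTTF comparison described above.
# All figures are hypothetical, chosen only to show the arithmetic.

def node_hours(query_duration_hours, num_nodes):
    """Work at risk: duration of the query times the nodes it runs on."""
    return query_duration_hours * num_nodes

def expected_failures_per_query(nh, node_mttf_hours):
    """To a first approximation, expected node failures during one query."""
    return nh / node_mttf_hours

# Hypothetical cluster: 1,000 nodes, node MTTF of ~3 years (~26,000 hours).
mttf = 26_000

for duration in (0.1, 1, 10):  # query duration in hours
    nh = node_hours(duration, 1_000)
    print(f"{duration:>5} h query -> {nh:>8} node-hours, "
          f"~{expected_failures_per_query(nh, mttf):.3f} expected failures")
```

On these made-up numbers, a ten-hour query across 1,000 nodes racks up 10,000 node-hours, within an order of magnitude of the MTTF, so a meaningful fraction of runs would hit a failure and, without query fault-tolerance, restart from scratch.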
Frankly, I’m not too clear on the use cases for which query fault-tolerance really matters. Still, as noted above, it is indeed coming up more and more. I’m not aware that it would be terribly hard for DBMS vendors to, as an option, let DBAs specify execution plans that force intermediate query materialization and further to use that materialization as the basis for query fault-tolerance. Perhaps doing so should be on some technical roadmaps.
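For what it's worth, the materialization-based restart idea might look something like the following sketch. The stage and checkpoint interfaces here are invented for illustration; no vendor's actual API is implied:

```python
# Illustrative sketch of restarting a multi-stage query from its last
# materialized intermediate result instead of from scratch.
# The stage/checkpoint interfaces are hypothetical, not any vendor's API.

def run_query(stages, materialized):
    """Run stages in order; `materialized` maps stage index -> saved result."""
    result = None
    for i, stage in enumerate(stages):
        if i in materialized:            # completed before a prior failure
            result = materialized[i]
            continue
        result = stage(result)           # may raise on node failure
        materialized[i] = result         # checkpoint the intermediate result
    return result

def run_with_retries(stages, max_retries=3):
    materialized = {}                    # survives across retries
    for _ in range(max_retries):
        try:
            return run_query(stages, materialized)
        except RuntimeError:             # stand-in for a node failure
            continue                     # retry resumes past completed stages
    raise RuntimeError("query failed after retries")
```

The point of the sketch: with intermediate results materialized, a failure in the final stage only reruns that stage, whereas without the checkpoint dictionary every retry would start from the beginning.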
Comments
How is it that you classify Hive as not having fault tolerance? Hive’s execution layer is MapReduce jobs on Hadoop, and thus has the same fault tolerance properties as Hadoop jobs in general. Failed tasks will be re-executed as necessary up to a user-configurable threshold.
-Todd
Todd,
See Joydeep’s explanation in http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/
Short answer: One Hive job can comprise many MapReduce jobs.
I’m also editing my post above for clarity.
Good analysis; in a way, that ratio serves as the upper end of the current scalability of distributed RDBMSs. That upper end is pretty high. Nowadays it seems like multi-hour queries are becoming more and more rare for most enterprise warehouses.
However, I do quarrel with this statement: “Using cheap/low-end/commodity hardware — as Hadoop fans like to do — increases the (node-hours)/MTTF ratio”
It's not the commodity hardware that is the issue (Teradata is mostly commodity hardware), and it's not even MapReduce. I think it's the Hadoop approach to pipelining and block-based partitioning, which chooses to go down a brute-force route rather than a clever query-optimization route.
I think the concept that machines do not have identities whatsoever is pretty powerful in general, beyond the restartable-queries thing. It's made possible a lot of the EC2 work Cloudera has done.
It also could allow some interesting virtualization-like approaches to queries; for instance, it would be theoretically possible in the Hadoop architecture to copy a query in mid-execution to another cluster, or to pause queries indefinitely.
Actually, a big theme in MapReduce is “true commodity” hardware vs. “enterprise-class non-proprietary hardware”. That’s in the Google data center story even before MapReduce was popularized. It’s central to Facebook’s fondness for Hadoop. Etc., etc.
And one of Aster Data's messages is “Unlike those other companies in the same category as us, we let you get by with true commodity hardware.”
I guess you can make an argument for storage on those lines. I'm not sure there is really much differentiation in servers or networking anymore.
I was told by Aster that they don't have query fault-tolerance; rather, if a query fails in flight, it will automatically restart from the beginning. This is with their current 3.x version. I'm not sure what the 4.x version will have.
With regard to commodity hardware: don't 95% of the vendors out there run on commodity hardware, albeit high-end commodity hardware?
When I hear “commodity hardware” I think of that box sitting in my garage… I think the phrase is overused…
Query Fault Tolerance is a big deal with big databases. In cases where the database is running on top of some RAID-X storage, a single drive failure (the most common failure) could go unnoticed. But, in the case of systems that have one-drive-per-node, a single drive failure could cause a hiccup that must be handled in the database.
At issue is the fact that while drive densities have gone up (higher capacity per drive), the failure rate per GB per year has not gone down in sync.
MapReduce provides the ability to restart a node and have the MapReduce program continue to run; the same is not the case with pipelined MapReduce programs, or with programs that have multiple stages, only one of which is MapReduce.