Fault-tolerant queries
MapReduce/Hadoop fans sometimes raise the question of query fault-tolerance. That is — if a node fails, does the query need to be restarted, or can it keep going? For example, Daniel Abadi et al. trumpet query fault-tolerance as one of the virtues of HadoopDB. Some of the scientists at XLDB spoke of query fault-tolerance as being a good reason to leave 100s or 1000s of terabytes of data in Hadoop-managed file systems.
When we discussed this subject a few months ago in a couple of comment threads, it seemed to be the case that:
- Hadoop generally has query fault-tolerance. Intermediate result sets are materialized, and data isn’t tied to nodes anyway. So if a node goes down, its work can be sent to another node.
- Hive actually did not have query fault-tolerance at that time, but it was on the roadmap. (Edit: Actually, it did within a single MapReduce job. But one Hive job can comprise several rounds of MapReduce.)
- Most DBMS vendors do not have query fault-tolerance. If a query fails, it gets restarted from scratch.
- Aster Data’s nCluster, however, does appear to have some kind of query fault-tolerance.
This raises an obvious (pair of) question(s) — why and/or when would anybody ever care about query fault-tolerance? To start answering that, let’s propose a simple metric for query execution time on a grid or cluster: node-hours = (duration of query) * (number of nodes it executes on). Then, to a first approximation, it would seem:
- If the expected number of node-hours for your queries is of the same order of magnitude as or higher than the MTTF (Mean Time To Failure) of a node, query fault-tolerance matters a lot, because without it you'll get a whole lot of retries.
- If the expected number of node-hours for your queries is an order of magnitude less than the node MTTF, query fault-tolerance has a significant impact on performance, but you can do without it.
- If the expected number of node-hours is even lower than that, query fault-tolerance is pretty irrelevant.
- Using cheap/low-end/commodity hardware — as Hadoop fans like to do — increases the (node-hours)/MTTF ratio in two ways — it both increases the numerator and decreases the denominator.
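To make the arithmetic above concrete, here is a small sketch. The cluster size, query durations, and MTTF figure are all hypothetical, chosen only to illustrate how the ratio behaves:

```python
# Rough illustration of the node-hours vs. MTTF comparison described above.
# All figures are hypothetical, chosen only to show the arithmetic.

def node_hours(query_duration_hours, num_nodes):
    """Work at risk: duration of the query times the nodes it runs on."""
    return query_duration_hours * num_nodes

def expected_failures_per_query(nh, node_mttf_hours):
    """To a first approximation, expected node failures during one query."""
    return nh / node_mttf_hours

# Hypothetical cluster: 1,000 nodes, node MTTF of ~3 years (~26,000 hours).
mttf = 26_000

for duration in (0.1, 1, 10):  # query duration in hours
    nh = node_hours(duration, 1_000)
    print(f"{duration:>5} h query -> {nh:>8} node-hours, "
          f"~{expected_failures_per_query(nh, mttf):.3f} expected failures")
```

On these made-up numbers, a ten-hour query across 1,000 nodes racks up 10,000 node-hours, within an order of magnitude of the MTTF, so a meaningful fraction of runs would hit a failure and, without query fault-tolerance, restart from scratch.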
Frankly, I’m not too clear on the use cases for which query fault-tolerance really matters. Still, as noted above, it is indeed coming up more and more. I’m not aware that it would be terribly hard for DBMS vendors to, as an option, let DBAs specify execution plans that force intermediate query materialization and further to use that materialization as the basis for query fault-tolerance. Perhaps doing so should be on some technical roadmaps.
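For what it's worth, the materialization-based restart idea might look something like the following sketch. The stage and checkpoint interfaces here are invented for illustration; no vendor's actual API is implied:

```python
# Illustrative sketch of restarting a multi-stage query from its last
# materialized intermediate result instead of from scratch.
# The stage/checkpoint interfaces are hypothetical, not any vendor's API.

def run_query(stages, materialized):
    """Run stages in order; `materialized` maps stage index -> saved result."""
    result = None
    for i, stage in enumerate(stages):
        if i in materialized:            # completed before a prior failure
            result = materialized[i]
            continue
        result = stage(result)           # may raise on node failure
        materialized[i] = result         # checkpoint the intermediate result
    return result

def run_with_retries(stages, max_retries=3):
    materialized = {}                    # survives across retries
    for _ in range(max_retries):
        try:
            return run_query(stages, materialized)
        except RuntimeError:             # stand-in for a node failure
            continue                     # retry resumes past completed stages
    raise RuntimeError("query failed after retries")
```

The point of the sketch: with intermediate results materialized, a failure in the final stage only reruns that stage, whereas without the checkpoint dictionary every retry would start from the beginning.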
Comments
How is it that you classify Hive as not having fault tolerance? Hive’s execution layer is MapReduce jobs on Hadoop, and thus has the same fault tolerance properties as Hadoop jobs in general. Failed tasks will be re-executed as necessary up to a user-configurable threshold.
-Todd
Todd,
See Joydeep’s explanation in http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/
Short answer: One Hive job can comprise many MapReduce jobs.
I’m also editing my post above for clarity.
Good analysis; in a way, that ratio serves as the upper end of the current scalability of distributed RDBMSs. That upper end is pretty high. Nowadays it seems like multi-hour queries are becoming more and more rare for most enterprise warehouses.
However, I do quarrel with this statement: “Using cheap/low-end/commodity hardware — as Hadoop fans like to do — increases the (node-hours)/MTTF ratio”
It's not the commodity hardware that is the issue (Teradata is mostly commodity hardware), and it's not even MapReduce. I think it's the Hadoop approach to pipelining and block-based partitioning, which chooses to go down a brute-force route rather than a clever query-optimization route.
I think the concept that machines do not have identities whatsoever is pretty powerful in general, beyond the restartable-queries thing. It's made possible a lot of the EC2 work Cloudera has done.
It also could allow some interesting virtualization-like approaches to queries; for instance, it would be theoretically possible in the Hadoop architecture to copy a query in mid-execution to another cluster, or to pause queries indefinitely.
Actually, a big theme in MapReduce is “true commodity” hardware vs. “enterprise-class non-proprietary hardware”. That’s in the Google data center story even before MapReduce was popularized. It’s central to Facebook’s fondness for Hadoop. Etc., etc.
And one of Aster Data's messages is “Unlike those other companies in the same category as us, we let you get by with true commodity hardware.”
I guess you can make an argument for storage on those lines. I'm not sure there is really much differentiation in servers or networking anymore.
I was told by Aster that they don't have query fault-tolerance; rather, if a query fails in flight, it will automatically restart from the beginning. This is with their current 3.x version. I'm not sure what the 4.x version will have.
With regard to commodity hardware: don't 95% of the vendors out there run on commodity hardware, albeit high-end commodity hardware?
When I hear “commodity hardware” I think of that box sitting in my garage… I think the phrase is overused…
Query Fault Tolerance is a big deal with big databases. In cases where the database is running on top of some RAID-X storage, a single drive failure (the most common failure) could go unnoticed. But, in the case of systems that have one-drive-per-node, a single drive failure could cause a hiccup that must be handled in the database.
At issue is the fact that while drive densities have gone up (higher capacity per drive), the failure rate per GB per year has not gone down in sync.
MapReduce provides the ability to restart a node and have the MapReduce program continue to run; the same is not the case with pipelined MapReduce programs, or with programs that have multiple stages, only one of which is MapReduce.