Teradata will support Presto
At the highest level:
- Presto is, roughly speaking, Facebook’s replacement for Hive, at least for queries that are supposed to run at interactive speeds.
- Teradata is announcing support for Presto with a classic open source pricing model.
- Presto will also become, roughly speaking, Teradata’s replacement for Hive.
- Teradata’s Presto efforts are being conducted by the former Hadapt.
Now let’s make that all a little more precise.
Regarding Presto (and I got most of this from Teradata):
- To a first approximation, Presto is just another way to write SQL queries against HDFS (Hadoop Distributed File System). However …
- … Presto queries other data stores too, such as various kinds of RDBMS, and federates query results. (There's an illustrative sketch of this after the list.)
- Facebook created Hive and, more recently, Presto.
- Facebook started the Presto project in 2012 and now has 10 engineers on it.
- Teradata has named 16 engineers – all from Hadapt – who will be contributing to Presto.
- Known serious users of Presto include Facebook, Netflix, Groupon and Airbnb. Airbnb likes Presto well enough to have 1/3 of its employees using it, via an Airbnb-developed tool called Airpal.
- Facebook is known to have a cluster (reportedly 300 petabytes and 4,000 users) where Presto is presumed to be a principal part of the workload.
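To make the query-federation point (second bullet above) concrete, here is a minimal sketch of what a cross-store query might look like from a JDBC client's point of view. The coordinator host, catalogs, table names and credentials are all made up, and the use of the standard Presto JDBC driver is my assumption, not anything Teradata announced:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class PrestoFederationSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical coordinator; the Presto JDBC URL form is
        // jdbc:presto://host:port/catalog/schema (driver jar must be on the classpath).
        String url = "jdbc:presto://presto-coordinator.example.com:8080/hive/default";
        Properties props = new Properties();
        props.setProperty("user", "analyst");   // made-up user

        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             // One statement references a Hive-backed table (HDFS data) and a
             // MySQL-backed table; Presto performs the join and federates results.
             ResultSet rs = stmt.executeQuery(
                     "SELECT o.order_id, c.customer_name " +
                     "FROM hive.default.orders o " +
                     "JOIN mysql.crm.customers c ON o.customer_id = c.id")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " | " + rs.getString(2));
            }
        }
    }
}
```

The point is simply that a single SQL statement can reference tables in different underlying stores, with Presto doing the join.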
Daniel Abadi said that Presto satisfies what he sees as some core architectural requirements for a modern parallel analytic RDBMS project:
- Data is pipelined between operators, with no gratuitous writing to disk the way you might have in something MapReduce-based. This is different from the sense of “pipelining” in which one query might keep an intermediate result set hanging around because another query is known to need those results as well.
- Presto processing is vectorized; functions don’t need to be re-invoked a tuple at a time. (There's a toy sketch of this below.) This is different from the sense of vectorization in which several tuples are processed at once, exploiting SIMD (Single Instruction Multiple Data). Dan thinks SIMD is useful mainly for column stores, and Presto tries to be storage-architecture-agnostic.
- Presto query operators and hence query plans are dynamically compiled, down to byte code.
- Although it is generally written in Java, Presto uses direct memory management rather than relying on what Java provides. Dan believes that, despite being written in Java, Presto performs as if it were written in C.
More precisely, this is a checklist for interactive-speed parallel SQL. There are some query jobs long enough that Dan thinks you need the fault-tolerance obtained from writing intermediate results to disk, à la HadoopDB (which was of course the MapReduce-based predecessor to Hadapt).
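As a rough, hypothetical illustration of Dan's "not re-invoked a tuple at a time" point, here is a toy page-at-a-time operator in Java. It is not Presto's actual operator interface; it just shows the shape of the idea:

```java
// Toy sketch only: a "page" holds a batch of values for one column,
// and an operator is invoked once per page rather than once per tuple.
final class Page {
    final long[] values;
    final int positionCount;

    Page(long[] values, int positionCount) {
        this.values = values;
        this.positionCount = positionCount;
    }
}

interface PageOperator {
    Page process(Page input);
}

final class AddConstantOperator implements PageOperator {
    private final long constant;

    AddConstantOperator(long constant) {
        this.constant = constant;
    }

    @Override
    public Page process(Page input) {
        long[] out = new long[input.positionCount];
        // Per-call overhead (virtual dispatch, setup) is amortized over the
        // whole batch; the hot work is this tight loop.
        for (int i = 0; i < input.positionCount; i++) {
            out[i] = input.values[i] + constant;
        }
        return new Page(out, input.positionCount);
    }
}
```

In a pipelined engine, pages like these flow from operator to operator in memory rather than being spilled to disk between stages, and tight per-page loops are also where compiling query plans down to byte code pays off.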
That said, Presto is a newish database technology effort, there’s lots of stuff missing from it, and there still will be lots of stuff missing from Presto years from now. Teradata has announced contribution plans to Presto for, give or take, the next year, in three phases:
- Phase 1 (released immediately, and hence in particular already done):
- An installer.
- More documentation, especially around installation.
- Command-line monitoring and management.
- Phase 2 (later in 2015):
- Integrations with YARN, Ambari and soon thereafter Cloudera Manager.
- Expanded SQL coverage.
- Phase 3 (sometime in 2016):
- An ODBC driver, which is of course essential for business intelligence tool connectivity.
- Other connectors (e.g. more targets for query federation).
- Security.
- Further SQL coverage.
Absent from any specific plans that were disclosed to me was anything about optimization or other performance hacks, and anything about workload management beyond what can be gotten from YARN. I also suspect that much SQL coverage will still be lacking after Phase 3.
Teradata’s basic business model for Presto is:
- Teradata is selling subscriptions, for which the principal benefit is support.
- Teradata reserves the right to make some of its Presto-enhancing code subscription-only, but has no immediate plans to do so.
- Teradata being Teradata, it would love to sell you Presto-related professional services. But you’re absolutely welcome to consume Presto on the basis of license-plus-routine-support-only.
And of course Presto is usurping Hive’s role wherever that makes sense in Teradata’s data connectivity story, e.g. Teradata QueryGrid.
Finally, since I was on the phone with Justin Borgman and Dan Abadi, discussing a project that involved 16 former Hadapt engineers, I asked about Hadapt’s status. That may be summarized as:
- There are currently no new Hadapt sales.
- Only a few large Hadapt customers are still being supported by Teradata.
- The former Hadapt folks would love Hadapt or Hadapt-like technology to be integrated with Presto, but no such plans have been finalized at this time.
Comments
To the best of my understanding, one of the main problems of JVM-based massive data processing is the performance of serialization and deserialization. In a nutshell, efficiently reading a single record from disk into Java is challenging. This part of processing is performance-critical for reading table data, as well as for sending intermediate data between nodes in a cluster (shuffle stages).
There is a developing extension to Java, called Unsafe, which gradually enables such things.
I am curious how Presto achieves C-like performance. Is Java Unsafe already advanced enough?
Presto uses Unsafe extensively via our Slice library: https://github.com/airlift/slice
Unsafe allows reading and writing primitives (int, long, double, etc.) with a single CPU instruction, just as if you wrote it in C. The get/put calls on Unsafe are special “intrinsics” that the Java JIT compiler understands and turns into efficient native machine code. This lets us treat a byte array as if it is arbitrary memory, like in C. Slice works on data from anywhere: disk, memory, network, etc.
We are also careful about memory allocation. Rows are stored in columnar pages in memory, allowing efficient processing with little GC overhead. You wouldn’t call malloc in the inner loop in C, so of course we don’t do it in Java either.
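For readers who haven't seen the idiom, here is a bare-bones sketch of the byte-array-as-raw-memory trick that those Unsafe intrinsics enable. This is the generic sun.misc.Unsafe pattern, not Presto's actual Slice code:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeSketch {
    public static void main(String[] args) throws Exception {
        // sun.misc.Unsafe is not publicly constructible; the usual idiom is to
        // grab its singleton via reflection.
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        Unsafe unsafe = (Unsafe) field.get(null);

        byte[] buffer = new byte[64];
        long base = unsafe.arrayBaseOffset(byte[].class);

        // Write and read an 8-byte long at an arbitrary byte offset inside the
        // array, effectively treating the byte[] as raw memory, as in C.
        unsafe.putLong(buffer, base + 8, 42L);
        long value = unsafe.getLong(buffer, base + 8);
        System.out.println(value);   // prints 42
    }
}
```

The JIT compiles those get/put calls down to single loads and stores, which is the "intrinsics" point above.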
Thank you for the answer. Can I conclude from the above that Java, to be competitive with C on data processing, must internally work in a columnar way, and even represent intermediate results as columnar?
I tried Presto briefly a couple of months back, but found their lack of fault tolerance made it a non-starter for us. (See https://github.com/facebook/presto/issues/1010)
This lack of fault tolerance is a much bigger issue than people realize. Many people run their infrastructure at cloud providers like Amazon, where you routinely experience intermittent “timeout” errors and such while trying to access your data (e.g., reading a file from Amazon S3). And the larger the data, the more likely for this situation to occur in any given query job.
Each time this happens, it results in a failed task. In an environment with fault tolerance (e.g., Hadoop MapReduce or Spark) this isn’t an issue: the task just gets retried on another node and eventually succeeds. But on Presto, it kills the whole job/query.
Their assertion that “the user must resubmit the query” in these cases really isn’t a realistic workaround: a) the query may very likely fail the next time it gets run for the same reason (network timeouts), and b) more importantly, this makes Presto completely unusable for situations where there is no “user” to re-submit, such as running scripted queries.
IMO, Spark SQL, not Presto, is the way forward for those who want to move beyond Hive.
Fault tolerance usually comes with a price. In the case of MapReduce, the price is big.
In my opinion, the question is: does Presto provide significantly better performance than fault-tolerant systems?
At the same time, the big users mentioned above, like the AWS-based Netflix, are using Presto, so it cannot be that unusable in a cloud environment.
Dan Abadi reminded me of something I now wish I’d stressed in my original post — you can have two different query engines over the same data, with the system automagically deciding which one to use. In this context, we can say that one is the interactively fast one — e.g. Presto — while the other is the fault-tolerant one. That all was part of the Hadapt strategy.
And since I mentioned Dan, recall that a different form of the approach is in Vertica — data comes into the write-optimized store (WOS) and soon is flushed to the read-optimized store (ROS), but a query may hit either one, or perhaps go against both and federate its results.
I think the claim that “X is fault tolerant” is usually much more true on paper than in practice. I’d rather know what the query success rates are in practice, as there are many reasons for faults. We focus on big downtime events, thus the push to claim things are fault tolerant. We often neglect the small downtime events (GC pauses, etc.). I’d also want to measure response time variance, because slow responses == failures for many deployments.
Mark,
Once a query is taking 10 minutes anyway, having it take 11 minutes is probably not a failure.
However, I’ll confess to being unsure of why a query would take 1,000 or 10,000 node-minutes to execute unless they involve huge amounts of I/O. And if they do, slathering on even more I/O doesn’t obviously lead to efficiency.
I can see an obvious exception for queries that look inside large and opaque data fields (video, log files, etc.), but I don’t get the feeling that this is the main case being considered in these discussions.
I wasn’t aware that the common use case for Presto was multi-minute queries. I thought it was multi-second queries, and for multi-second queries, why add complexity for checkpointing intermediate results?
We have been using Presto @ Netflix on the AWS cloud for over a year, and so far machine failures haven’t really been a big issue for us. Yes, once in a while one may happen, but it hasn’t been a big deal so far. David Gruzman has a point: there is usually a trade-off between fault tolerance and performance, and Presto is definitely designed & implemented for performance from the start.
And regarding darose’s comment, I don’t understand why the script cannot re-submit the query on failure.
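For what it's worth, the scripted retry being suggested here is only a few lines. Below is a hedged sketch of a retry-with-exponential-backoff wrapper around a JDBC query; the Presto URL, user, table and retry limits are illustrative assumptions, not recommendations:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class RetryingQuery {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:presto://presto-coordinator.example.com:8080/hive/default";
        Properties props = new Properties();
        props.setProperty("user", "etl");           // made-up service account
        String sql = "SELECT count(*) FROM events"; // made-up table

        int maxAttempts = 3;
        long backoffMillis = 10_000;                // 10s, doubled after each failure

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (Connection conn = DriverManager.getConnection(url, props);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                rs.next();
                System.out.println("count = " + rs.getLong(1));
                return;                             // success, stop retrying
            }
            catch (SQLException e) {
                if (attempt == maxAttempts) {
                    throw e;                        // out of retries, fail loudly
                }
                System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
                Thread.sleep(backoffMillis);
                backoffMillis *= 2;                 // exponential backoff
            }
        }
    }
}
```

Whether that is acceptable of course depends on how expensive the query is to re-run, which is exactly the trade-off being debated in this thread.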
Mark,
You’re right about the use cases. See my comment above that starts “Dan Abadi”.
Do we need to talk about availability of the query engine separate from the data store? I assume the query engines are (mostly) stateless so keeping them going is easy. The datastore is the hard part, but isn’t the datastore HA problem the same as for all other things Hadoop?
Mark,
What’s the Facebook answer to that? Are their queries “big” enough that fault-tolerance is regarded as more important than Presto’s speed?
As Nezih said, we (Netflix) have been using Presto on the AWS cloud for more than a year, and everything goes pretty well.
When Presto is reading from S3, it does have retries, exponential backoff, etc. to make sure the query keeps working in the case of intermittent “timeouts” from Amazon S3. This behavior is also configurable.
Our (Netflix) production queries mostly complete in less than one minute, while there are some long-running queries (which we do not recommend, but we cannot stop users from loving Presto) that scan large files from S3 and complete successfully in hours.
Presto does not have fault tolerance at the worker/task level. But adding that could be expensive. It is a choice of:
#1 Presto could finish the query in 5 seconds, but in some rare cases you have to retry.
#2 Hive/MapReduce could finish the query in 20 minutes; the probability of a query failure in MapReduce is lower than in Presto, but it is still possible.
For Spark SQL, is there any production use case with a 1PB+ dataset?
@Zhenxiao: Yes, there are multiple organizations running Spark SQL against 1PB+ datasets. Some of them are at Spark Summit next week.
@Zhenxiao: Also I’m slightly confused by your position. If you need to process 1PB+ datasets, fault-tolerance seems to be important. If all you are processing are tiny datasets (or a tiny part of PB+ datasets using coarse-grained indexing), then fault-tolerance doesn’t matter as much.
Ya know — the whole thing might be clearer with a few actual numbers.
In a very large query, what IS the likelihood that a sub-task would fail, making you wish you had fault tolerance?
By way of comparison, how much is execution slowed by insisting on fault-tolerance?
And by the way, just how big does a query have to be before it takes 10 minutes to run, or 1 hour, or whatever, across many nodes, if we assume it’s optimized for speed rather than fault-tolerance?
@Reynold Xin: Thank you for the info. Glad to know. Are there any public numbers or details about their deployment?
I was/am talking about big datasets, 10PB+.
From our (Netflix) experience, Presto can run SQL queries against big datasets, 10PB+, and most of the queries complete within 1 minute. These queries process big datasets and do not use an index at all.
Yes, fault tolerance could always be important. From our experience, it seems to depend on the engine: if the engine is super fast like Presto, then in the rare failure case, retrying the query is OK, since the same query against big datasets could run for tens of minutes if not hours in Hive or another less efficient engine.
Are there any numbers about the cluster size and data size for Spark SQL deployments in production?