November 19, 2015

The questionably named Cloudera Navigator Optimizer

I only have mixed success at getting my clients to reach out to me for messaging advice when they’re introducing something new. Cloudera Navigator Optimizer, which is being announced along with Cloudera 5.5, is one of my failures in that respect; I heard about it for the first time Tuesday afternoon. I hate the name. I hate some of the slides I saw. But I do like one part of the messaging, namely the statement that this is about “refactoring” queries.

All messaging quibbles aside, I think the Cloudera Navigator Optimizer story is actually pretty interesting, and perhaps not just to users of SQL-on-Hadoop technologies such as Hive (which I guess I’d put in that category for simplicity) or Impala. As I understand Cloudera Navigator Optimizer:

It’s all about analytic SQL queries.
Specifically, it’s about reducing duplicated work.
It is not an “optimizer” in the ordinary RDBMS sense of the word.
It’s delivered via SaaS (Software as a Service).
Conceptually, it’s not really tied to SQL-on-Hadoop. However, …
… in practice it likely will be used by customers who want to optimize performance of Cloudera’s preferred styles of SQL-on-Hadoop, either because they’re already using SQL-on-Hadoop or in connection with an initial migration.

It grows out of Xplain.io, which started with the intention of being a general workload optimizer for Hadoop and wound up with this beta announcement of a tuning adviser for analytic SQL.

Right now, the Cloudera Navigator Optimizer service is:

Query code in.
Information and advice out.

Naturally, Cloudera’s intention — perhaps as early as at first general availability — is for the output to start including something that’s more like automation, e.g. hints for the Impala optimizer.

As Anupam Singh describes it, there are basically four kinds of problems that Cloudera Navigator Optimizer can help with:

ETL (Extract/Transform/Load) might repeat the same operation over and over again, e.g. joining to a reference table to help with data cleaning. It can be an optimization to consolidate some of that work. (The same would surely also be true in cases where the workload is more properly described as ELT.)
For business intelligence it is often helpful to materialize aggregates or result sets. (This is, of course, why materialized views were invented in the first place.)
Queries-from-hell — perhaps thousands of lines of SQL long — can perhaps be usefully rewritten into a sequence of much shorter queries.
Ad-hoc query workloads can have enough repetition that there’s opportunity for similar optimizations. Anupam thinks his technology has enough intelligence to detect some of these patterns.

Actually, all four of these cases can involve materializing tables so that they don’t need to keep being in part or whole recreated.

In essence, then, this is a way to add in more query pipelining than the underlying data store automagically provides on its own. And that seems to me like a very good idea to try. The whole thing might be worth trying out at least once, even if your analytic RDBMS installation has nothing to do with SQL at all.

Categories: Business intelligence, Cloudera, Data pipelining, Data warehousing, EAI, EII, ETL, ELT, ETLT, Hadoop, SQL/Hadoop integration

Subscribe to our complete feed!

Comments

4 Responses to “The questionably named Cloudera Navigator Optimizer”

David Gruzman on November 21st, 2015 3:48 am

It sounds like materialized views in Oracle terms (or indexed views in MS SQL terms). Is it comparison valid?
David Gruzman on November 21st, 2015 3:51 pm

I probably misunderstood in some extent. It is much more, but technically benefits described above are provided in “ideal” world of RDBMS by materialized views.
Curt Monash on November 22nd, 2015 2:34 am

It’s just an adviser about materialization, so I don’t think the comparison is good at all.

Obviously I didn’t explain well enough. 🙁
Early Week Readings: Nov 23, 2015 | TeqBiz on November 25th, 2015 8:01 pm

[…] Curt Monash’s blog about Xplain.io’s (a company I helped jumpstart and also served as VP of Product Management/marketing until we sold it to Cloudera) technology after it was released as “Cloudera Navigator Optimizer” (DBMS2). […]

Leave a Reply

Search our blogs and white papers

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in

The questionably named Cloudera Navigator Optimizer

Comments

Search our blogs and white papers

Monash Research blogs

User consulting

Vendor advisory

Monash Research highlights

Recent posts

Categories

Date archives

Admin