Teradata SQL-H, using HCatalog
When I grumbled about the conference-related rush of Hadoop announcements, one example of many was Teradata Aster’s SQL-H. Still, it’s an interesting idea, and a good hook for my first shot at writing about HCatalog. Indeed, other than the Talend integration bundled into Hortonworks’ HDP 1, Teradata SQL-H is the first real use of HCatalog I’m aware of.
The Teradata SQL-H idea is:
- Register your Hadoop data to HCatalog. I’ll confess to being unclear about the details of how that works, for example in the case of data that just doesn’t fit well into flat relational tables. Stay tuned for future posts. For now, I’ll just note that:
- HCatalog is closely based on Hive’s metadata management. If you’ve run Hive against the data, HCatalog should already know about it.
- HCatalog can handle Pig and HBase data as well.
- Write SQL DDL (Data Definition Language) so that your Aster cluster knows about the data (a sketch follows this list).
- Write any Teradata Aster SQL/MR against that data. Some of the execution will be done on the Hadoop cluster, but pulling data back into Aster may well be necessary.
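To make that concrete, here's a minimal sketch of the two halves. The Hive DDL is standard; the Aster half is hypothetical — I'm assuming a SQL/MR-style table function (here called hcat_table(), with made-up parameter names) as the bridge, since Teradata hasn't shown me SQL-H's actual syntax.

```sql
-- Hive side: registering the data. Because HCatalog shares Hive's
-- metastore, running this DDL is enough for HCatalog to know about
-- the table. Column names and the HDFS path are illustrative.
CREATE EXTERNAL TABLE web_logs (
  ts      STRING,
  user_id BIGINT,
  url     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/web_logs';

-- Aster side: the DDL that tells the Aster cluster about the data.
-- hcat_table() and its parameters are hypothetical stand-ins for
-- whatever SQL-H actually provides; the SQL/MR-style invocation is
-- just my assumption about the shape of the syntax.
CREATE VIEW web_logs_h AS
SELECT *
FROM hcat_table(
  ON (SELECT 1)
  SERVER('hcat-host:9083')   -- hypothetical: HCatalog metastore endpoint
  DBNAME('default')          -- hypothetical: Hive database name
  TABLENAME('web_logs')      -- hypothetical: the table registered above
);
```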
At least in theory, Teradata SQL-H lets you use a full set of analytic tools against your Hadoop data, with little limitation except price and/or performance. Teradata thinks the performance of all this can be much better than if you just use Hadoop (35X was mentioned in one particularly favorable example), but perhaps much worse than if you just copy/extract the data to an Aster cluster in the first place.
So what might the use cases be for something like SQL-H? Offhand, I’d say:
- SQL-H use cases are probably focused in areas where copying the data to Aster in advance doesn’t make a lot of sense. So presumably …
- … the Hadoop clusters involved would hold a lot more data than you’d want to pay for storing in Teradata Aster. E.g., think of cases where Hadoop is used as a big bit bucket or archival data store.
- There could be a kind of investigative workflow (sketched below). First you play around with the Hadoop data via SQL-H. Then, when you think you're onto something, you set up ETL (Extract/Transform/Load) to get the data into Aster and ratchet up the effort.
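Here's a hedged sketch of that workflow, reusing the hypothetical web_logs_h view from above; the DISTRIBUTE BY clause is an assumption about Aster's CREATE TABLE syntax.

```sql
-- Step 1, exploration: query the Hadoop-resident data in place via
-- SQL-H; only the rows the query needs ever reach the Aster side.
SELECT url, COUNT(*) AS hits
FROM web_logs_h
WHERE ts >= '2012-07-01'
GROUP BY url
ORDER BY hits DESC
LIMIT 100;

-- Step 2, commitment: once the exploration pans out, copy the
-- interesting slice into a native Aster table and iterate there.
-- DISTRIBUTE BY HASH is my assumption about the required syntax.
CREATE TABLE web_logs_jul
DISTRIBUTE BY HASH(user_id) AS
SELECT * FROM web_logs_h
WHERE ts >= '2012-07-01';
```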
By way of contrast, the whole thing makes less sense for dashboarding kinds of uses, unless the dashboard users are very patient when they want to drill down.
Comments
I think the days of good old MPP databases are over (at least when we talk about "big data analytics"). All attempts to marry Teradata, Aster, Greenplum, etc. with Hadoop look unnatural. The combination of Hive and R probably covers more than 90% of all possible use cases in analytical data processing: extract a sample of the data from the Hadoop cluster using Hive and run R scripts on that sample. I am not aware of a single use case where processing 100% of the data (terabytes and petabytes) is a MUST.
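For what it's worth, the Hive half of the workflow the commenter describes can be as simple as the following (the table names are made up; rand() sampling is one common approach):

```sql
-- Pull a ~1% random sample of the full Hadoop-resident table into
-- a small table that an R script can then read and analyze.
CREATE TABLE web_logs_sample AS
SELECT *
FROM web_logs
WHERE rand() < 0.01;
```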
HCatalog is a "dictionary" or "catalog". It's used to store the metadata about the structure of data, not the data itself. In this way, Pig and all other implementations can map structure at runtime. Say what you want about "unstructured" data, but the vast majority of applications bind a structure to the underlying data so it can be consumed… this just makes that declaration portable across platforms. And who is using it? Ask the guys at Yahoo. Indispensable.
Hi Vlad,
I work at Teradata Aster and I appreciate your comments.
We are very customer driven. We’ve talked to many Hadoop customers before developing the SQL-H functionality.
Extracting samples and using R may work for some use cases, but the majority of enterprise Hadoop customers want a scalable way to do SQL & BI processing on their Hadoop data. Also, not everyone is willing to move to R, given the broad adoption of SQL-based tools.
I also understand that R breaks down in the gigabyte range, which is too little (let me know if you've heard otherwise).
Thanks,
Cesar
Hi Cesar,
Thanks for commenting!
Your “gigabyte range” figure for R breaking down sounds very odd to me. R assumes all data is in memory, which might be what you’re thinking of. But various vendors try to work around even that limitation.
Thanks Curt for the info, it makes sense. Thanks also for writing this note. Regards.