October 22, 2014

Is analytic data management finally headed for the cloud?

It seems reasonable to wonder whether analytic data management is headed for the cloud. In no particular order:

Amazon Redshift appears to be prospering.
So are some SaaS (Software as a Service) business intelligence vendors.
Amazon Elastic MapReduce is still around.
Snowflake Computing launched with a cloud strategy.
Cazena, with vague intentions for cloud data warehousing, destealthed.*
Cloudera made various cloud-related announcements.
Data is increasingly machine-generated, and machine-generated data commonly originates off-premises.
The general argument for cloud-or-at-least-colocation has compelling aspects.
Analytic workloads can be “bursty”, and so could benefit from true cloud elasticity.

Also — although the specifics on this are generally vague and/or confidential — I sense a narrowing of the gap between:

The hardware + networking required for performant analytic data management.
The hardware + networking available in the cloud.

*Cazena is proud of its team of advisors. However, the only person yet announced for a Cazena operating role is Prat Moghe, and his time period in Netezza’s mainstream happens not to have been one in which Netezza had much technical or market accomplishment.

On the other hand:

If you have processing power very close to the data, then you can avoid a lot of I/O or data movement. Many cloud configurations do not support this.
Many optimizations depend upon controlling or at least knowing the hardware and networking set-up. Public clouds rarely offer that level of control.

And so I’m still more confident in SaaS/colocation analytic data management, or in Redshift, than I am in true arm’s-length cloud-based systems.

Categories: Amazon and its cloud, Cloud computing, Data warehousing, Netezza


Comments

3 Responses to “Is analytic data management finally headed for the cloud?”

J. Andrew Rogers on October 22nd, 2014 7:02 pm

In my experience, machine-generated data is creating a strong economic driver toward fine-grained geo-federated data management rather than cloud per se. It makes sense to push a lot of the analytics all the way out to the edge, very close to the generating source, such that there is no meaningful cloud deployment model.

The driver is insufficient or expensive bandwidth between the location where it is generated and a sufficiently large data center, cloud or otherwise, where it can be aggregated in the raw and analyzed. Even with aggressive filtering at the edge, a petabyte per day of rich, operational data is not uncommon. There is a desire to analyze the data in place as part of a federated analytic rather than culling and backhauling, which loses too much context.
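To put the bandwidth point in rough numbers, here is a small back-of-the-envelope sketch. It assumes a decimal petabyte (10^15 bytes) and a perfectly utilized link, neither of which the comment specifies; the function names and the 10 Gbps example link are hypothetical, chosen only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
PETABYTE = 1e15  # decimal petabyte in bytes; an assumption, the unit is not stated above


def sustained_gbps(bytes_per_day: float) -> float:
    """Average link rate (gigabits per second) needed to ship a daily volume as it arrives."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e9


def hours_to_backhaul(total_bytes: float, link_gbps: float) -> float:
    """Hours needed to move a volume over a link, assuming 100% utilization (optimistic)."""
    seconds = total_bytes * 8 / (link_gbps * 1e9)
    return seconds / 3600


if __name__ == "__main__":
    print(f"1 PB/day needs about {sustained_gbps(PETABYTE):.1f} Gbps sustained")
    print(f"1 PB over a 10 Gbps link takes about {hours_to_backhaul(PETABYTE, 10):.0f} hours")
```

Under these assumptions, a petabyte per day works out to a sustained rate in the neighborhood of 90 Gbps, and a single day's data would take more than a week to move over a hypothetical 10 Gbps link, which is the kind of gap that makes analyzing in place attractive.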

We work with a lot of very fast, very large machine-generated data sources primarily for the purposes of spatial analytics. Much of this is in the cloud on big commodity clusters. One of the most common asks by customers is if our platform can be deployed as a fine-grained geo-federated system so that many of the analytics can run adjacent to the collection platform to reduce bandwidth requirements.

For machine-generated (“IoT”), sensor, and other spatially organized data models this actually makes a lot of sense because the topology of reality matches the topology of the spatial analysis algorithms. Spatial joins and aggregates across various data sources would have a lot of physical network locality when organized this way, which reduces analysis latency while increasing the effective bandwidth and practically supportable data volumes.

David Gruzman on October 23rd, 2014 6:13 am

In many cases, reserved cluster instances are used for big data analytics in the Amazon cloud.
I would argue that this is something in the middle between cloud and colocation, and even closer to colocation.
Why it is similar to colocation:
– It is predefined hardware.
– It has a guaranteed network (10 Gbps), but only when all servers are allocated together in one placement group.
– It is too expensive to take on-demand, so it is used as reserved (less elastic).
Why it still resembles cloud:
– An API for provisioning is available.
– Cloud services, like S3 and EBS, are available.

Ranko Mosic on October 24th, 2014 10:07 am

Network bandwidth is still the biggest obstacle to wider adoption of cloud analytics. Moving the very large volumes of data associated with analytics is all but impossible for most enterprises under current network infrastructure limitations. As Curt and Andrew said above, machine-generated (cloud-generated) data is the best fit for cloud analytics: the data is already out there, so it can either be processed in situ or be moved to a cloud processing site.


