September 28, 2014

Some stuff on my mind, September 28, 2014

1. I wish I had some good, practical ideas about how to make a political difference around privacy and surveillance. Nothing else we discuss here is remotely as important. I presumably can contribute an opinion piece to, more or less, the technology publication(s) of my choice; that can have a small bit of impact. But I’d love to do better than that. Ideas, anybody?

2. A few thoughts on cloud, colocation, etc.:

The economies of scale of colocation-or-cloud over operating your own data center are compelling. Most of the reasons you outsource hardware manufacture to Asia also apply to outsourcing data center operation within the United States. (The one exception I can think of is supply chain.)
The arguments for cloud specifically over colocation are less persuasive. Colo providers can even match cloud deployments in rapid provisioning and elastic pricing, if they so choose.
Surely not coincidentally, I am told that Rackspace is deemphasizing cloud, reemphasizing colocation, and making a big deal out of Open Compute. In connection with that, Rackspace has pulled back from its leadership role in OpenStack.
I’m hearing much more mention of Amazon Redshift than I used to. It seems to have a lot of traction as a simple and low-cost option.
I’m hearing less about Elastic MapReduce than I used to, although I imagine usage is still large and growing.
In general, I get the impression that progress is being made in overcoming the inherent difficulties in cloud (and even colo) parallel analytic processing. But it all still seems pretty vague, except for the specific claims being made for traction of Redshift, EMR, and so on.
Teradata recently told me that in colocation pricing, it is common for floor space to be everything, with power not separately metered. But I don’t think that trend is a big deal, as it is not necessarily permanent.
Cloud hype is of course still with us.
Other than the above, I stand by my previous thoughts on appliances, clusters and clouds.

3. As for the analytic DBMS industry:

Concurrency is still a challenge. But otherwise …
… great SQL query performance isn’t something to get excited about any more, especially in immature systems.
Be careful about systems that have great performance when intermediate result sets fit into RAM, but not when they spill to disk. In particular, watch for this problem in the Hadoop/Spark world.
Vendors are getting better about ANSI SQL coverage (SQL 99 Analytics, windowing, etc. …)
“Runs on Hadoop” isn’t an exciting claim unless you can mix and match SQL and generic Hadoop processing in the same jobs against the same data, even though lesser forms of SQL/Hadoop integration might also help with some aspects of TCO (Total Cost of Ownership).
More generally, what’s needed is:
- The ability to mix SQL and other kinds of analytic processing.
- The ability to mix traditional tabular data, JSON, and log data.
- The ability to mix data in place with data that’s trickling/streaming in.
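As a toy illustration of the first two points (not any vendor's actual architecture), the sketch below uses Python's built-in `sqlite3` and `json` modules to join a conventional table against JSON documents and then finish the analysis outside SQL. The table names, the `json_field` helper, and the sample data are all hypothetical and exist only for this example.

```python
import json
import sqlite3
from statistics import median


def build_demo_db():
    """Create an in-memory database with one tabular and one JSON-bearing table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
    conn.execute("CREATE TABLE events (customer_id INTEGER, payload TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "east"), (2, "west"), (3, "east")],
    )
    # Hypothetical log-style events, stored as JSON text.
    events = [
        (1, {"type": "click", "ms": 120}),
        (1, {"type": "click", "ms": 80}),
        (2, {"type": "view", "ms": 300}),
        (3, {"type": "click", "ms": 200}),
    ]
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(cid, json.dumps(doc)) for cid, doc in events],
    )
    return conn


def json_field(payload, key):
    """Non-SQL logic exposed to SQL: pull one field out of a JSON document."""
    return json.loads(payload).get(key)


def median_click_ms_by_region(conn):
    # Register the Python function so SQL can call it on the JSON column.
    conn.create_function("json_field", 2, json_field)
    rows = conn.execute(
        """
        SELECT c.region, json_field(e.payload, 'ms')
        FROM events e JOIN customers c ON c.id = e.customer_id
        WHERE json_field(e.payload, 'type') = 'click'
        """
    ).fetchall()
    # Finish with an aggregate computed in Python rather than in SQL.
    by_region = {}
    for region, ms in rows:
        by_region.setdefault(region, []).append(ms)
    return {region: median(values) for region, values in by_region.items()}
```

Calling `median_click_ms_by_region(build_demo_db())` runs the join and filter in SQL and the median in Python, over the same data in one job. A real analytic stack would need to do the equivalent at scale and without shuttling every row through a single process.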

4. Meanwhile, the analytic ease of use story remains popular, in business intelligence and predictive analytics/data science alike. Marketers typically oversimplify it to their own detriment, however, just as they do performance stories.

5. On the short-request side:

NoSQL is still going gangbusters.
NewSQL still isn’t, except that I haven’t talked with MemSQL for a while and they were doing well when I did.
Transparent sharding has stagnated as a business, good technology notwithstanding, and the vendors are pivoting.

6. Finally, one vendor note — Sharmila assures me by brief email that things are going gangbusters at ClearStory. This is unsurprising, as ClearStory exemplifies several trends I believe in, including robust analytic stacks, strong data navigation, Spark, and the incorporation of broad varieties of data.

And of course ClearStory also empowers business analysts to make do without IT involvement, as the other cool analytic kids do.

Categories: Amazon and its cloud, Business intelligence, ClearStory Data, Cloud computing, Data warehousing, Databricks, Spark and BDAS, EAI, EII, ETL, ELT, ETLT, Hadoop, Log analysis, MapReduce, Market share and customer counts, MemSQL, NoSQL, Predictive modeling and advanced analytics, Transparent sharding


Comments

3 Responses to “Some stuff on my mind, September 28, 2014”

David Gruzman on September 29th, 2014 3:26 am

I would attribute EMR’s decline to Cloudera. Two years ago EMR was my tool of choice when I needed a Hadoop cluster for a few hours. Today I feel much more comfortable using Cloudera Manager to build a Hadoop cluster on EC2. It takes a bit more time, but I get a cluster I can manage and troubleshoot, something that is hard to do with EMR. It is also cheaper, since I do not have to pay the EMR pricing overhead.
The one case where I still see a reason to prefer EMR is the need for automatic cluster provisioning and resizing.
At the same time, I think a vendor that relies on dynamic Hadoop cluster provisioning for its platform would do its own scripting to keep full control.

Robert Hodges on September 29th, 2014 12:20 pm

NewSQL is looking like an increasingly hard proposition. It still seems as if there’s room for a next-generation SQL DBMS like FoundationDB that offers great horizontal scaling to get traction. However, it’s going to take a long time because many users are investing in other things like analytics and cloud operation instead of new transaction processing systems. Also, there are incumbents like Oracle, MySQL, and MS SQL Server that work now. Even if they don’t scale perfectly, they have excellent value propositions that are continuing to improve. (Example: cloud services like Amazon RDS as well as our clusters at Continuent.)

Migration | DBMS 2 : DataBase Management System Services on January 10th, 2015 1:46 am

[…] because it’s not accompanied by much in the way of programming costs, risks, or delays. Hence Rackspace’s refocus on colo at the expense of cloud. (But it can be hard on your data center […]
