Some stuff on my mind, September 28, 2014
1. I wish I had some good, practical ideas about how to make a political difference around privacy and surveillance. Nothing else we discuss here is remotely as important. I presumably can contribute an opinion piece to, more or less, the technology publication(s) of my choice; that can have a small bit of impact. But I’d love to do better than that. Ideas, anybody?
2. A few thoughts on cloud, colocation, etc.:
- The economies of scale of colocation-or-cloud over operating your own data center are compelling. Most of the reasons you outsource hardware manufacture to Asia also apply to outsourcing data center operation within the United States. (The one exception I can think of is supply chain.)
- The arguments for cloud specifically over colocation are less persuasive. Colo providers can even match cloud deployments in rapid provisioning and elastic pricing, if they so choose.
- Surely not coincidentally, I am told that Rackspace is deemphasizing cloud, reemphasizing colocation, and making a big deal out of Open Compute. In connection with that, Rackspace has pulled back from its leadership role in OpenStack.
- I’m hearing much more mention of Amazon Redshift than I used to. It seems to have a lot of traction as a simple and low-cost option.
- I’m hearing less about Elastic MapReduce than I used to, although I imagine usage is still large and growing.
- In general, I get the impression that progress is being made in overcoming the inherent difficulties in cloud (and even colo) parallel analytic processing. But it all still seems pretty vague, except for the specific claims being made for traction of Redshift, EMR, and so on.
- Teradata recently told me that in colocation pricing, it is common for floor space to be everything, with power not separately metered. But I don’t think that trend is a big deal, as it is not necessarily permanent.
- Cloud hype is of course still with us.
- Other than the above, I stand by my previous thoughts on appliances, clusters and clouds.
3. As for the analytic DBMS industry:
- Concurrency is still a challenge. But otherwise …
- … great SQL query performance isn’t something to get excited about any more, especially in immature systems.
- Be careful about systems that have great performance when intermediate result sets fit into RAM, but not when they spill to disk. In particular, watch for this problem in the Hadoop/Spark world.
- Vendors are getting better about ANSI SQL coverage (SQL 99 Analytics, windowing, etc. …)
- “Runs on Hadoop” isn’t an exciting claim unless you can mix and match SQL and generic Hadoop processing in the same jobs against the same data, even though lesser forms of SQL/Hadoop integration might also help with some aspects of TCO (Total Cost of Ownership).
- More generally, what’s needed is:
  - The ability to mix SQL and other kinds of analytic processing.
  - The ability to mix traditional tabular data, JSON, and log data.
  - The ability to mix data in place with data that’s trickling/streaming in.
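To make the three kinds of mixing concrete, here is a minimal toy sketch in Python. It is only an illustration under stated assumptions, not how any particular vendor does it: the in-memory SQLite database stands in for an analytic store, and the sample data and the helper names (`build_db`, `add_event`, `slow_query_share`) are all hypothetical.

```python
import json
import sqlite3

# Hypothetical sample data: a small tabular table plus JSON log lines.
CUSTOMERS = [(1, "Acme", "US"), (2, "Globex", "DE")]
LOG_LINES = [
    '{"customer_id": 1, "event": "login", "ms": 120}',
    '{"customer_id": 1, "event": "query", "ms": 950}',
    '{"customer_id": 2, "event": "query", "ms": 400}',
]


def add_event(conn, line):
    """Parse one JSON log line in plain Python and store it as a row."""
    record = json.loads(line)
    conn.execute(
        "INSERT INTO events VALUES (:customer_id, :event, :ms)", record
    )


def build_db(customers, log_lines):
    """Load tabular rows and JSON events into one in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.execute(
        "CREATE TABLE events (customer_id INTEGER, event TEXT, ms INTEGER)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)
    for line in log_lines:
        add_event(conn, line)
    return conn


def slow_query_share(conn, threshold_ms=500):
    """SQL does the join and filter; Python does the follow-on analysis."""
    rows = conn.execute(
        """
        SELECT c.name, e.ms
        FROM customers c JOIN events e ON e.customer_id = c.id
        WHERE e.event = 'query'
        ORDER BY c.name
        """
    ).fetchall()
    counts = {}
    for name, ms in rows:
        slow, total = counts.get(name, (0, 0))
        counts[name] = (slow + (ms > threshold_ms), total + 1)
    # Fraction of each customer's queries slower than the threshold.
    return {name: slow / total for name, (slow, total) in counts.items()}


conn = build_db(CUSTOMERS, LOG_LINES)
shares = slow_query_share(conn)
print(shares)
```

Calling `add_event(conn, ...)` with a newly arrived log line and then rerunning `slow_query_share(conn)` is the toy version of mixing data in place with data that is trickling in. The hard part in real systems is doing all of this at scale, in the same jobs, against the same data.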
4. Meanwhile, the analytic ease of use story remains popular, in business intelligence and predictive analytics/data science alike. Marketers typically oversimplify it to their own detriment, however, just as they do performance stories.
5. On the short-request side:
- NoSQL is still going gangbusters.
- NewSQL still isn’t, except that I haven’t talked with MemSQL for a while and they were doing well when I did.
- Transparent sharding has stagnated as a business, good technology notwithstanding, and the vendors are pivoting.
6. Finally, one vendor note — Sharmila assures me by brief email that things are going gangbusters at ClearStory. This is unsurprising, as ClearStory exemplifies several trends I believe in, including robust analytic stacks, strong data navigation, Spark, and the incorporation of broad varieties of data.
And of course ClearStory also empowers business analysts to make do without IT involvement, like the other cool analytic kids do.
Comments
3 Responses to “Some stuff on my mind, September 28, 2014”
I would attribute EMR’s decline to Cloudera. Two years ago, EMR was my tool of choice when I needed a Hadoop cluster for a few hours. Today I feel much more comfortable using Cloudera Manager to build a Hadoop cluster on EC2. It takes a bit more time, but I get a cluster I can manage and troubleshoot – something that is hard to do with EMR. It is also cheaper – I do not have to pay the EMR pricing overhead.
The one case where I still see a reason to prefer EMR today is the need for automatic cluster provisioning and resizing.
At the same time, I think that a vendor whose platform relies on dynamic Hadoop cluster provisioning would do its own scripting to have full control.
NewSQL is looking like an increasingly hard proposition. It still seems as if there’s room for a next-generation SQL DBMS like FoundationDB that offers great horizontal scaling to get traction. However, it’s going to take a long time, because many users are investing in other things like analytics and cloud operation instead of new transaction processing systems. Also, there are incumbents like Oracle, MySQL, and MS SQL Server that work now. Even if they don’t scale perfectly, they have excellent value propositions that are continuing to improve. (Example: cloud services like Amazon RDS as well as our clusters at Continuent.)
[…] because it’s not accompanied by much in the way of programming costs, risks, or delays. Hence Rackspace’s refocus on colo at the expense of cloud. (But it can be hard on your data center […]