May 6, 2014

Notes and comments, May 6, 2014

After visiting California recently, I made a flurry of posts, several of which generated considerable discussion.

My claim that Spark will replace Hadoop MapReduce got much Twitter attention — including some high-profile endorsements — and also some responses here.
My MemSQL post led to a vigorous comparison of MemSQL vs. VoltDB.
My post on hardware and storage spawned a lively discussion of Hadoop hardware pricing; even Cloudera wound up disagreeing with what I reported Cloudera as having said. 🙂 Sadly, there was less response to the part about the partial (!) end of Moore’s Law.
My Cloudera/SQL/Impala/Hive post apparently was well-balanced, in that it got attacked from multiple sides via Twitter & email. Apparently, I was too hard on Impala, I was too hard on Hive, and I was too hard on boxes full of cardboard file cards as well.
My post on the Intel/Cloudera deal garnered a comment reminding us Dell had pushed the Intel distro.
My CitusDB post picked up a few clarifying comments.

Here is a catch-all post to complete the set.

1. The recently-announced Cloudera/MongoDB relationship* is still at the Barney stage. That said, I’m optimistic that their stated intention to add substance to the relationship will eventually come to fruition. If nothing else, the two companies have high regard for each other, at least at the Mike Olson/Max Schireson level.

*That’s one of numerous deals with my fingerprints on it, but in this case only lightly. It was probably on track to happen even without my nudges.

2. Most of what I talked about when I visited MongoDB is confidential; the public stuff was mainly in my recent MongoDB technology post. But in one exception, I asked Max for an update as to MongoDB enterprise use cases. He reported a cluster of use cases around data combination, especially but not only ones that have both a high-volume part and a dynamic-schema aspect. Specific examples Max cited included:

Tracking financial holdings from a variety of asset classes — especially if derivatives are involved, because they have a dynamic-schema aspect.
Product catalogs, including for use on web sites.
Customer information.
Patient information.
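
To make the dynamic-schema point concrete, here is a minimal sketch of how holdings from different asset classes might sit side by side as documents. It uses plain Python dicts as a stand-in for documents and does not call any MongoDB API. The field names, the sample values, and the `market_value` and `value_by_asset_class` helpers are all assumptions made up for illustration, not anything Max described.

```python
from collections import defaultdict

# Hypothetical holdings: each asset class carries its own set of fields,
# which is the dynamic-schema aspect. All names and values are illustrative.
holdings = [
    {"asset_class": "equity", "ticker": "XYZ", "quantity": 100, "price": 20.0},
    {"asset_class": "bond", "face_value": 10000, "price_pct": 98.5, "coupon": 0.04},
    {"asset_class": "option", "underlying": "XYZ", "contracts": 2,
     "multiplier": 100, "premium": 1.5, "strike": 25.0, "expiry": "2014-12-19"},
]


def market_value(holding):
    """Compute a simple market value from whichever fields the document has."""
    kind = holding["asset_class"]
    if kind == "equity":
        return holding["quantity"] * holding["price"]
    if kind == "bond":
        # price_pct is assumed to be quoted as a percentage of face value
        return holding["face_value"] * holding["price_pct"] / 100
    if kind == "option":
        return holding["contracts"] * holding["multiplier"] * holding["premium"]
    raise ValueError(f"unknown asset class: {kind}")


def value_by_asset_class(docs):
    """Combine differently shaped documents into one total per asset class."""
    totals = defaultdict(float)
    for doc in docs:
        totals[doc["asset_class"]] += market_value(doc)
    return dict(totals)
```

A new kind of derivative would only need a new document shape and one more branch in `market_value`; the existing documents would not have to change.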

3. I didn’t ask everybody I saw in California about business trends, and much of what we did discuss was confidential. That said:

MapR was proud of its numbers.
So was DataStax.
ClearStory has a bunch of Very Big Enterprises as customers, mainly but not only in consumer sectors (e.g. retail, packaged goods).

4. Platfora is focusing a bit, starting with clickstream and security — i.e., event series stuff. And by the way, they report that the term “event series” is working well for them.

5. I gather from a variety of comments and conversations that Amazon Redshift has achieved considerable traction.

6. Something I can’t find evidence of having posted before: I think multiple businesses monitor online sales or similar business successes as a guide to network problems. eBay did this via a custom in-memory MOLAP (Multidimensional OnLine Analytic Processing) system years ago. Best evidence that this is hardly restricted to eBay: all the “me-too” responses I get from telling that story.
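
As a rough illustration of the idea (not a description of what eBay built), the sketch below compares the latest sales count for each segment against that segment's own history and flags sharp drops, which could hint at a network problem affecting that slice of traffic. The segment keys (region, ISP), the sample numbers, the `flag_sales_drops` name, and the z-score threshold are all assumptions chosen for the example.

```python
from statistics import mean, pstdev


def flag_sales_drops(history, current, threshold=3.0):
    """Return segments whose latest sales count is far below their baseline.

    history: dict mapping segment -> list of past per-interval sales counts
    current: dict mapping segment -> sales count for the latest interval
    threshold: how many standard deviations below baseline counts as a drop
    """
    flagged = []
    for segment, past in history.items():
        baseline = mean(past)
        spread = pstdev(past) or 1.0  # fall back to 1.0 when history is flat
        observed = current.get(segment, 0)  # no sales at all counts as zero
        z_score = (observed - baseline) / spread
        if z_score <= -threshold:
            flagged.append((segment, observed, round(baseline, 1)))
    return sorted(flagged)


# Hypothetical data: segments are (region, ISP) pairs, i.e. two dimensions.
history = {
    ("US-West", "isp-a"): [100, 104, 98, 102],
    ("US-East", "isp-b"): [50, 52, 49, 51],
}
current = {("US-West", "isp-a"): 101, ("US-East", "isp-b"): 12}

for segment, observed, baseline in flag_sales_drops(history, current):
    print(f"possible network problem for {segment}: {observed} sales vs ~{baseline}")
```

Slicing by several dimensions at once is where a multidimensional store would earn its keep: a drop confined to one region or one ISP is a much stronger hint of a network issue than a drop across the board.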

7. Citus Data tells me that as of PostgreSQL 9.4, Postgres will be able to return just the part of a JSON column needed for a query. This is as opposed to storing the whole thing as text and only retrieving it in its entirety.

8. In the comments to my “Spark on fire” post, Patrick McFadin pointed out that Mahout is transitioning from MapReduce to Spark. (All new work will be on Spark, although old MapReduce-based routines will continue to be supported.) It turns out that Derrick Harris wrote about that over a month ago, and I just missed the news.

9. Also in predictive analytics — there are rumblings that R could eventually be supplanted by Julia, although R’s massive libraries of algorithms still give it the advantage now.

10. Multiple vendors, fed up with the intermittent slowdowns from garbage collection, are moving some processing off the Java heap. Unfortunately, I neglected to ask any of them what the remaining differences then were between Java and C++ programming.

11. And to finish on a light note: BDAS — the project of which Spark is only a part — is pronounced “bad-ass”, something I first heard from Dave Patterson.

Comments

5 Responses to “Notes and comments, May 6, 2014”

Ariel Weisberg on May 6th, 2014 10:59 am

Flexible schema has to be one of the worst and most easily co-opted differentiators that MongoDB has.

With Postgres on board it kind of surprises me that MySQL doesn’t have an answer for a flexible schema column type. It seems like everything that isn’t Postgres or MySQL (or old school RDBMS) got on the flexible schema train post haste.
Mark Callaghan on May 6th, 2014 11:47 am

MariaDB has some of it. More is on the way in all variants of MySQL. I have begun reading about the PG features and they are impressive.
Rules for names | Strategic Messaging on May 11th, 2014 1:46 am

[…] Platfora’s latest release focused on data sets that — after Platfora assembles them for you — are sort of like time series but also somewhat like event streams. “Event series” was the winning name. Edit (May 2014): Platfora reports that that choice worked out well. […]
clive boulton on May 14th, 2014 6:00 pm

On 10. Multiple vendors, fed up with the intermittent slowdowns from GC:

* download.Google.com in C++ rewritten in Go.
* office.microsoft.com jobs in C# rewritten in C++
* spacecurve.com CTO use a barrel processor in C++

Has concurrency in multi-core / multi-data center arrived?
Notes on predictive modeling, October 10, 2014 | DBMS 2 : DataBase Management System Services on April 9th, 2015 2:10 am

[…] I’m not actually seeing much support for the theory that Julia will replace R except perhaps from Revolution Analytics, the company most identified with R. Go […]
