DBMS product categories
Analysis of database management technology in specific product categories.
Notes from the Couch blogs
Couchbase in general, and CouchDB project founder Damien Katz in particular, are to some extent walking away from CouchDB. That is:
- The Couchbase product will not be upward compatible with CouchDB.
- Couchbase will no longer offer a CouchDB distribution, and is doing the natural and responsible thing, namely …
- … donating to the Apache Foundation the previously proprietary aspects of that distribution.
Even so:
- All — or at least “all” — the code Couchbase offers will, at least for now, be open source.
The story unfolded in a bombshell post by Damien, and in clarification follow-ups by Damien and by Couchbase CEO Bob Wiederhold. The meatiest of the three was probably Damien’s follow-up.
A couple of links explaining Cloudera Manager
Predictably, I wasn’t pre-briefed on the details of Oracle’s Big Data Appliance announcement today, and an inquiry to partner Cloudera doesn’t happen to have been immediately answered.* But anyhow, it’s clear from coverage by Larry Dignan and Derrick Harris that Oracle’s Big Data Appliance includes:
- Some version of Cloudera Manager (I’m guessing more or less the best one).*
- Some version of Apache Hadoop (I’m guessing the same distribution that Cloudera prefers to use).*
- Some kind of support.
In other words, it’s a lot like getting Cloudera Enterprise,* plus some hardware, plus some other stuff.
*Edit: About 2 minutes after I posted this, I got email from Cloudera CEO Mike Olson. Yes, the Oracle Big Data Appliance bundles Cloudera Enterprise.
That raises a question which recurs anyway: What exactly is Cloudera Manager?
Big data terminology and positioning
Recently, I observed that Big Data terminology is seriously broken. It is reasonable to reduce the subject to two quasi-dimensions:
- Bigness — Volume, Velocity, size
- Structure — Variety, Variability, Complexity
given that
- High-velocity “big data” problems are usually high-volume as well.*
- Variety, variability, and complexity all relate to the simply-structured/poly-structured distinction.
But the conflation should stop there.
*Low-volume/high-velocity problems are commonly referred to as “event processing” and/or “streaming”.
When people claim that bigness and structure are the same issue, they oversimplify into mush. So I think we need four pieces of terminology, reflective of a 2×2 matrix of possibilities. For want of better alternatives, my suggestions are:
- Relational big data is data of high volume that fits well into a relational DBMS.
- Multi-structured big data is data of high volume that doesn’t fit well into a relational DBMS. Alternative: Poly-structured big data.
- Conventional relational data is data of not-so-high volume that fits well into a relational DBMS. Alternatives: Ordinary/normal/smaller relational data.
- Smaller poly-structured data is data for which dynamic schema capabilities are important, but which doesn’t rise to “big data” volume.
Hope for a new PostgreSQL era?
In a comedy of briefing errors, I’m not too clear on the details of my client salesforce.com’s new PostgreSQL-as-a-service offering, nor exactly on what my clients at VMware are bringing to the PostgreSQL virtualization/cloud party. That said:
- PostgreSQL is good technology.
- MySQL is narrowing the gap, but PostgreSQL is still ahead of MySQL in some ways. (Database extensibility if nothing else.)
- PostgreSQL has a lot of users. (Many of them in academia and/or Russia.)
- Neither EnterpriseDB (which now calls itself “The enterprise PostgreSQL company”) nor the PostgreSQL community leadership has covered itself with stewardship glory.
- A significant number of interesting DBMS products can be regarded as PostgreSQL forks (e.g. Greenplum, Aster Data nCluster, Netezza if you squint, and Vertica if you stand on your head*).
- PostgreSQL advancement is not dead. For example, Hadapt beta users are running actual PostgreSQL on many nodes each.
- There’s no assurance that Oracle will be a benevolent MySQL steward forever. (Specifically, Oracle’s “Play nicely with others” antitrust commitments expire in 2014.)
So I think it would be cool if one or the other big company put significant wood behind the PostgreSQL arrow.
*While Vertica was originally released using little or no PostgreSQL code — reports varied — it featured high degrees of PostgreSQL compatibility.
Some big-vendor execution questions, and why they matter
When I drafted a list of key analytics-sector issues in honor of look-ahead season, the first item was “execution of various big vendors’ ambitious initiatives”. By “execute” I mean mainly:
- “Deliver products that really meet customers’ desires and needs.”
- “Successfully convince them that you’re doing so …”
- “… at an attractive overall cost.”
Vendors mentioned here are Oracle, SAP, HP, and IBM. Anybody smaller got left out due to the length of this post. Among the bigger omissions were:
- salesforce.com (multiple subjects).
- SAS HPA.
- The evolution of Hadoop.
Analytic trends in 2012: Q&A
As a new year approaches, it’s the season for lists, forecasts and general look-ahead. Press interviews of that nature have already begun. And so I’m working on a trilogy of related posts, all based on an inquiry about hot analytic trends for 2012.
This post is a moderately edited form of an actual interview. Two other posts cover analytic trends to watch (planned) and analytic vendor execution challenges to watch (already up).
Clarifying SAND’s customer metrics, positioning and technical story
Talking with my clients at SAND can be confusing. That said:
- I need to revise my figures for SAND’s customer count way downward.
- SAND finally has a reasonably clear positioning.
- SAND’s product actually seems to have a lot of features.
A few months ago, I wrote:
SAND Technology reported >600 total customers, including >100 direct.
Upon talking with the company, I need to revise that figure downward, from >600 to 15.
NoSQL notes
Last week I visited with James Phillips of Couchbase, Max Schireson and Eliot Horowitz of 10gen, and Todd Lipcon, Eric Sammer, and Omer Trajman of Cloudera. I guess it’s time for a round-up NoSQL post. 🙂
Views of the NoSQL market horse race are reasonably consistent, with perhaps some elements of “Where you stand depends upon where you sit.”
- As James tells it, NoSQL is simply a three-horse race between Couchbase, MongoDB, and Cassandra.
- Max would include HBase on the list.
- Further, Max pointed out that metrics such as job listings suggest MongoDB has the most development activity, and Couchbase/Membase/CouchDB perhaps have less.
- The Cloudera guys remarked on some serious HBase adopters.*
- Everybody I spoke with agreed that Riak had little current market presence, although some Basho guys could surely be found who’d disagree.
Transparent relational OLTP scale-out
There’s a perception that, if you want (relatively) worry-free database scale-out, you need a non-relational/NoSQL strategy. That perception is false. In the analytic case it’s completely ridiculous, as has been demonstrated by Teradata, Vertica, Netezza, and various other MPP (Massively Parallel Processing) analytic DBMS vendors. And now it’s false for short-request/OLTP (OnLine Transaction Processing) use cases as well.
My favorite relational OLTP scale-out choice these days is the SchoonerSQL/dbShards partnership. Schooner Information Technology (SchoonerSQL) and Code Futures (dbShards) are young, small companies, but I’m not too concerned about that, because the APIs they want you to write to are just MySQL’s. The main scenarios in which I can see them failing are ones in which they are competitively leapfrogged, either by other small competitors (e.g. ScaleBase, Akiban, TokuDB, or ScaleDB) or by Oracle/MySQL itself. While that could suck for my clients Schooner and Code Futures, it would still provide users relying on MySQL scale-out with one or more good product alternatives.
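To make “transparent sharding” a bit more concrete, here is a minimal sketch, in Python, of the routing idea a layer like dbShards handles underneath the ordinary MySQL interface. The hostnames, credentials, schema, and shard key are hypothetical; the point of the real products is that they do this routing invisibly, so the application just speaks MySQL.

```python
# Minimal hash-based sharding sketch over MySQL (hypothetical hosts and schema).
# A transparent-sharding layer does this routing behind the MySQL wire protocol;
# the application never writes code like this itself.
import zlib
import pymysql

SHARD_HOSTS = ["shard0.example.com", "shard1.example.com", "shard2.example.com"]

def shard_for(customer_id):
    # A stable hash of the shard key picks the shard that owns this customer.
    return SHARD_HOSTS[zlib.crc32(str(customer_id).encode()) % len(SHARD_HOSTS)]

def fetch_orders(customer_id):
    # Single-shard-key queries touch exactly one node, which is what lets
    # sharded OLTP workloads scale out.
    conn = pymysql.connect(host=shard_for(customer_id), user="app",
                           password="secret", database="orders")
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT order_id, total FROM orders WHERE customer_id = %s",
                        (customer_id,))
            return cur.fetchall()
    finally:
        conn.close()
```

Cross-shard queries and distributed transactions are where such layers earn (or fail to earn) their keep; single-shard-key requests like the one above are the easy case.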
Relying on non-MySQL NewSQL startups, by way of contrast, would leave me somewhat more concerned. (However, if their code is open sourced, you have at least some vendor-failure protection.) And big-vendor scale-out offerings, such as Oracle RAC or DB2 pureScale, may be more complex to deploy and administer than the MySQL and NewSQL alternatives.
Schooner pivots further
Schooner Information Technology started out as a complete-system MySQL appliance vendor. Then Schooner went software-only, but continued to brag about great performance in configurations with solid-state drives. Now Schooner has pivoted further, and is emphasizing high availability, clustered performance, and other hardware-agnostic OLTP (OnLine Transaction Processing) features. Fortunately, Schooner has some interesting stuff in those areas to talk about.
The short form of the SchoonerSQL (as Schooner’s product is now called) story goes roughly like this (a sketch of the implied topology follows the list):
- SchoonerSQL replicates data — synchronously if the replication target is local, asynchronously if it is remote.
- Local synchronous replication provides high availability; remote asynchronous replication provides disaster recovery.
- SchoonerSQL’s local synchronous replication also provides read scale-out.
- Schooner has a partnership with Code Futures/dbShards to provide write scale-out via transparent sharding.
- SchoonerSQL has some secret sauce in replication performance. This has the effect of significantly increasing write performance (assuming you were going to replicate anyway), because otherwise you might have to slow down the master server’s write performance so that the slaves can keep up with it.
- Schooner believes it still has some single-server performance advantages as well.
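Here is a minimal Python sketch of the read/write routing that topology implies: writes go to the master, reads round-robin across the local synchronous replicas (which are safe to read because they are not behind the master), and the remote asynchronous replica is reserved for disaster recovery. Hostnames and credentials are hypothetical, and this illustrates only the topology described above, not how SchoonerSQL itself implements it.

```python
# Hypothetical read/write routing over the replication topology described above.
# SchoonerSQL does this inside the database stack; this is just an illustration.
import itertools
import pymysql

MASTER = "db-master.local"
SYNC_REPLICAS = ["db-replica1.local", "db-replica2.local"]  # local, synchronous: HA + read scale-out
DR_REPLICA = "db-dr.remote"                                  # remote, asynchronous: disaster recovery only

_read_targets = itertools.cycle(SYNC_REPLICAS)

def _connect(host):
    return pymysql.connect(host=host, user="app", password="secret", database="appdb")

def execute_write(sql, params=()):
    # All writes go to the master; synchronous replication keeps the local
    # replicas current, so a failover loses no committed work.
    conn = _connect(MASTER)
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
        conn.commit()
    finally:
        conn.close()

def execute_read(sql, params=()):
    # Reads round-robin across the synchronous replicas, providing read
    # scale-out without serving stale results.
    conn = _connect(next(_read_targets))
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
    finally:
        conn.close()
```

Write scale-out is the piece this topology does not provide by itself, which is where the dbShards partnership comes in.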