DBMS product categories
Analysis of database management technology in specific product categories.
Couchbase 2.0
My clients at Couchbase checked in.
- After multiple delays, Couchbase 2.0 is well into beta, with general availability being delayed by the holiday season as much as anything else.
- Couchbase (the company) now has >350 subscription customers, almost all for Couchbase (the product) — which is to say for what was known as Membase, which is basically a persistent version of Memcached.
- There also are many users of open source Couchbase, most famously LinkedIn.
- Orbitz is a much-mentioned flagship paying Couchbase customer.
- Couchbase customers mainly seem to be replacing a caching layer, Memcached or otherwise.
- Couchbase headcount is just under 100.
The big changes in Couchbase 2.0 versus the previous (1.8.x) version are:
- JSON storage, including secondary indexes (as sketched right after this list).
- Multi-data-center replication.
- A back-end change from SQLite to a heavily forked version of CouchDB, called Couchstore.
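Given that CouchDB heritage, a secondary index in 2.0 is defined as a JavaScript map function in a design document and queried over REST. Here is a minimal, hypothetical sketch in Python, assuming a 2.0-era node exposing the view API on port 8092 and a bucket named "default"; the design-document, view, and field names are invented for illustration:

```python
import json
import urllib.request

# Assumptions for this sketch: a Couchbase 2.0 node on localhost, the
# CouchDB-style view API on port 8092, a bucket named "default". The
# design-document, view, and field names are invented for illustration.
BASE = "http://localhost:8092/default"

# Define a secondary index as an incremental map/reduce view.
ddoc = {"views": {"by_name": {
    "map": "function (doc, meta) { if (doc.name) emit(doc.name, null); }"
}}}
req = urllib.request.Request(
    BASE + "/_design/users",
    data=json.dumps(ddoc).encode(),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
urllib.request.urlopen(req)

# Query the index for documents whose name field is "alice"
# (%22 is a URL-encoded double quote around the JSON key).
with urllib.request.urlopen(
        BASE + '/_design/users/_view/by_name?key=%22alice%22') as resp:
    print(json.load(resp))
```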
Couchbase 2.0 is upwards-compatible with prior versions of Couchbase (and hence with Memcached), but not with CouchDB.
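That Memcached compatibility means an existing caching client should keep working when pointed at Couchbase. As a minimal sketch, here is the raw Memcached text protocol driven from Python's standard library alone; the host, port, key, and document are illustrative:

```python
import socket

# Assumptions: a Couchbase node answering the Memcached text protocol on
# the usual port 11211; the key and JSON document are invented.
sock = socket.create_connection(("localhost", 11211))

doc = b'{"type": "user", "name": "alice"}'
sock.sendall(b"set user::alice 0 0 %d\r\n%s\r\n" % (len(doc), doc))
print(sock.recv(1024))  # expect b"STORED\r\n"

sock.sendall(b"get user::alice\r\n")
print(sock.recv(1024))  # a VALUE header, the JSON document, then END
sock.close()
```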
Technology notes on Couchbase 2.0 include: Read more
More on Cloudera Impala
What I wrote before about Cloudera Impala was quite incomplete. After a followup call, I now feel I have a better handle on the whole thing.
First, some basics:
- Impala is open source code, developed to date entirely by Cloudera people, which adds analytic DBMS capabilities to Hadoop as an alternative to Hive.
- Impala is in public beta, and is targeted for general availability Q1 2013 or so.
- Cloudera plans to get paid for Impala by providing support, and by offering Impala management through its proprietary Cloudera Manager.
- Impala has been under development for about 2 years. A team of 7 or so developers has been mainly in place for over a year. Furthermore, …
- … notwithstanding that it’s best viewed as a Hive alternative, Impala actually reuses a lot of Hive.
The general technical idea of Impala is:
- It’s an additional daemon that runs on each of your Hadoop nodes.
- Thus, Impala is not subject to Hadoop MapReduce’s latency in starting up Java processes or in storing intermediate result sets to disk.
- Impala operates as a distributed parallel analytic DBMS.*
- Impala works with a variety of Hadoop storage options, each with its own implications for latency or performance.
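To make the client-side picture concrete: because Impala reuses Hive's query language and drivers, a query is just HQL sent to an impalad. A hypothetical sketch follows; the DB-API-style connector module is a placeholder (not a real published package), and the host, port, and table names are invented:

```python
import hive_dbapi_driver as db  # placeholder for any Hive/Impala DB-API connector

# Connect to the Impala daemon on any Hadoop node; that impalad
# coordinates the query across the cluster's other daemons.
conn = db.connect(host="hadoop-node-1", port=21000)  # port illustrative
cur = conn.cursor()

# Plain HQL. No MapReduce job is launched, so there is neither JVM
# start-up latency nor spilling of intermediate results to disk.
cur.execute("""
    SELECT page, COUNT(*) AS hits
    FROM weblogs
    WHERE dt = '2012-10-24'
    GROUP BY page
    ORDER BY hits DESC
    LIMIT 10
""")
for page, hits in cur.fetchall():
    print(page, hits)
```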
Notes and comments — October 31, 2012
Time for another catch-all post. First and saddest — one of the earliest great commenters on this blog, and a beloved figure in the Boston-area database community, was Dan Weinreb, whom I had known since some Symbolics briefings in the early 1980s. He passed away recently, much much much too young. Looking back for a couple of examples (even if you've never heard of him before), I see that Dan's 2009 comment on Tokutek is still interesting today, and so is a post on his own blog disagreeing with some of my choices in terminology.
Otherwise, in no particular order:
1. Chris Bird is learning MongoDB. As is common for Chris, his comments are both amusing and enlightening.
2. When I relayed Cloudera’s comments on Hadoop adoption, I left out a couple of categories. One Cloudera called “mobile”; when I probed, that was about HBase, with an example being messaging apps.
The other was “phone home” — i.e., the ingest of machine-generated data from a lot of different devices. This is something that’s obviously been coming for several years — but I’m increasingly getting the sense that it’s actually arrived.
Quick notes on Impala
Edit: There is now a follow-up post on Cloudera Impala with substantially more detail.
In my world it’s possible to have a hasty 2-hour conversation, and that’s exactly what I had with Cloudera last week. We touched on hardware and general adoption, but much of the conversation was about Cloudera Impala, announced today. Like Hive, Impala turns Hadoop into a basic analytic RDBMS, with similar SQL/Hadoop integration benefits to those of Hadapt. In particular:
- Impala is Hive-compatible in query language (HQL, which is a whole lot like SQL), metadata, JDBC/ODBC drivers, etc.
- Unlike Hive, Impala does not work through Hadoop MapReduce.
- Unlike Hadoop MapReduce and hence Hive, Impala does not persist intermediate results to disk. This is good for performance, but on extremely long-running queries it increases the risk you’ll have a node failure and have to restart the query from scratch.
- Impala in its first version is missing some Hive syntax, notably in support for UDFs (User-Defined Functions).
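To illustrate that last gap: the standard Hive idiom for wiring in a UDF is exactly what first-release Impala doesn't yet accept. A sketch, using the same placeholder connector as in the longer Impala post above; the jar path, Java class, and function names are invented:

```python
import hive_dbapi_driver as db  # placeholder for any Hive DB-API connector

cur = db.connect(host="hive-server", port=10000).cursor()

# Ordinary Hive: register a UDF from a jar, then call it in a query.
# Per the above, a first-version Impala daemon would reject these.
cur.execute("ADD JAR /tmp/my_udfs.jar")
cur.execute("CREATE TEMPORARY FUNCTION ip_to_geo AS 'com.example.udf.IpToGeo'")
cur.execute("SELECT ip_to_geo(ip), COUNT(*) FROM weblogs GROUP BY ip_to_geo(ip)")
print(cur.fetchall())
```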
Beyond that: Read more
Notes on Hadoop hardware
I talked with Cloudera yesterday about an unannounced technology, and took the opportunity to ask some non-embargoed questions as well. In particular, I requested an update to what I wrote last year about typical Hadoop hardware.
Cloudera thinks the picture now is:
- 2-socket servers, with 4- or 6-core chips.
- An increasing number of spindles, with twelve 2-TB spindles now common.
- 48 gigs of RAM is most common, with 64-96 fairly frequent.
- A couple of 1GigE networking ports.
Discussion around that included:
- Enterprises had been running out of storage space; hence the increased amount of storage. 🙂
- Even more storage can be stuffed on a node, and at times is. But at a certain point there's so much data on a node that recovering from its failure becomes prohibitively slow (see the back-of-envelope sketch after this list).
- There are some experiments with 10 GigE.
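A back-of-envelope calculation shows why piling on even more spindles gets forbidding. It deliberately ignores that HDFS re-replicates a dead node's blocks from many source nodes in parallel (so real recovery is faster), but the data volumes alone are sobering:

```python
# Assumptions straight from the post: 12 x 2 TB spindles per node,
# 2 x 1GigE ports. Everything else is a deliberately crude simplification.
node_bytes = 12 * 2e12          # ~24 TB of raw storage on one node
nic_bytes_per_sec = 2 * 125e6   # 2 x 1GigE is roughly 250 MB/s

# If re-replication were bottlenecked at one node's network bandwidth,
# recovering a full node's worth of data would take:
hours = node_bytes / nic_bytes_per_sec / 3600
print(f"~{hours:.0f} hours")    # roughly 27 hours
```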
Notes on analytic hardware
I took the opportunity of Teradata’s Aster/Hadoop appliance announcement to catch up with Teradata hardware chief Carson Schmidt. I love talking with Carson, about both general design philosophy and his views on specific hardware component technologies.
From a hardware-requirements standpoint, Carson seems to view Aster and Hadoop as more similar to each other than either is to, say, a Teradata Active Data Warehouse. In particular, for Aster and Hadoop:
- I/O is more sequential.
- The CPU:I/O ratio is higher.
- Uptime is a little less crucial.
The most obvious implications are in the choice of parts, and in their ratios. Also, in the new Aster/Hadoop appliance, Carson is content to skate by with RAID 5 rather than RAID 1.
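The capacity arithmetic behind the RAID choice is simple. With illustrative numbers (a ten-disk group of 2-TB drives):

```python
disks, tb_each = 10, 2

raid1_usable = disks // 2 * tb_each   # mirrored pairs: 10 TB usable
raid5_usable = (disks - 1) * tb_each  # one disk's worth of parity: 18 TB usable
print(raid1_usable, raid5_usable)
```

The trade-off is that RAID 5 rebuilds are slower and riskier than remirroring, which is easier to accept where, per the above, uptime is a little less crucial.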
I think Carson’s views about flash memory can be reasonably summarized as: Read more
IBM Pure jargon
As best I can tell, IBM now has three related families of hardware/software bundles, aka appliances, aka PureSystems, aka something that sounds like “expert system” but in fact has nothing to do with the traditional rules-engine meaning of that term. In particular,
- One of the three families is for the data tier, under the name PureData. That’s what’s new today.
- One of the three families is for the application tier, under the name PureApplication. More information can be found here.
- One of the three families is for “infrastructure”, under the name PureFlex. More information can be found here.
Within the PureData line, there are three sub-families:
- One is based on DB2 pureScale and is said to be “optimized exclusively for transactional data workloads”.
- One is based on Netezza, and is said to be “optimized exclusively for analytic workloads”.
- One is based on DB2 with the shared-nothing option, and is said to be “optimized exclusively for operational analytic data workloads”, notwithstanding that the underlying software has for years been IBM’s flagship general-purpose (non-mainframe) DBMS.
The Netezza part of the story seems to start:
- The Netezza name is being deprecated, except insofar as certain PureData systems are “Powered by Netezza Technology.”
- Netezza didn’t trumpet slipstream hardware enhancements even when it was independent, and IBM sure isn’t reversing that policy now.
- The Netezza software has been enhanced, most notably in a ~20X improvement in concurrency for “tactical” queries.
Perhaps someday I’ll be able to supply interesting details, for example about the concurrency improvement or about the uses (if any) customers are finding for Netezza’s in-database analytics — but as previously noted, analyzing big companies is hard.
Notes on the Oracle OpenWorld Sunday keynote
I’m not at Oracle OpenWorld, but as usual that won’t keep me from commenting. My bottom line on the first night’s announcements is:
- At many large enterprises, Oracle has a lock on much of their IT efforts. (But not necessarily in the internet or investigative analytics areas.) Tonight’s announcements serve to strengthen that.
- Tonight’s announcements do little to help Oracle in other market segments.
In particular:
1. At the highest level, my view of Oracle’s strategy is the same as it’s been for several years:
Clayton Christensen’s The Innovator’s Solution teaches us that Oracle should focus on selling a thick stack of technology to its highest-end customers, and that’s exactly what Oracle does focus on.
2. Tonight’s news is closely in line with what Oracle’s Juan Loaiza told me three years ago, especially:
- Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being “bumped off” if they don’t get it right.
- Juan believes the “bulk” of Oracle’s business will move over to Exadata-like technology over the next 5-10 years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.
3. Oracle is confusing people with its comments on multi-tenancy. I suspect:
- What Oracle is talking about when it says “multi-tenancy” is more like consolidation than true multi-tenancy.
- Probably there are a couple of true multi-tenancy features as well.
4. SaaS (Software as a Service) vendors don’t want to use Oracle, because they don’t want to pay for it.* This limits the potential impact of Oracle’s true multi-tenancy features. Even so: Read more
Notes on Hadoop adoption
I successfully resisted telephone consulting while on vacation, but I did do some by email. One exchange was on the oft-recurring subject of Hadoop adoption. I think it's OK to adapt some of that into a post.
Notes on past and current Hadoop adoption include:
- Enterprise Hadoop adoption is for experimental uses or departmental production (as opposed to serious enterprise-level production). Indeed, it’s rather tough to disambiguate those two. If an enterprise uses Hadoop to search for new insights and gets a few, is that an experiment that went well, or is it production?
- One of the core internet-business use cases for Hadoop is a many-step ETL, ELT, and data refinement pipeline, with Hadoop executing some or many of the steps (a sketch of one such step follows this list). But I don't think that's in production at many enterprises yet, except in the usual forward-leaning sectors of financial services and (we're all guessing) national intelligence.
- In terms of industry adoption:
- Financial services on the investment/trading side are all over Hadoop, just as they’re all over any technology. Ditto national intelligence, one thinks.
- Consumer financial services, especially credit card, are giving Hadoop a try too, for marketing and/or anti-fraud.
- I’m sure there’s some telecom usage, but I’m hearing of less than I thought I would. Perhaps this is because telcos have spent so long optimizing their data into short, structured records.
- Whatever consumer financial services firms do, retailers do too, albeit with smaller budgets.
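As a purely hypothetical illustration of one refinement step in such a pipeline, here is a Hadoop Streaming mapper that parses raw web-log lines into clean tab-separated records for downstream steps; the log format and field layout are made up:

```python
#!/usr/bin/env python
import sys

# Reads raw log lines on stdin, writes timestamp/ip/method/path as TSV.
for line in sys.stdin:
    parts = line.split()
    if len(parts) < 7:
        continue  # drop malformed records rather than poison later steps
    ip, _, _, ts, _, method, path = parts[:7]
    print("\t".join([ts.lstrip("["), ip, method.strip('"'), path]))
```

It would be launched with something like the standard streaming invocation (hadoop jar hadoop-streaming.jar -input raw_logs -output clean_logs -mapper refine.py -file refine.py), with later pipeline steps consuming the TSV output.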
Thoughts on how Hadoop adoption will look going forward include: Read more
Integrated internet system design
What are the central challenges in internet system design? We probably all have similar lists, comprising issues such as scale, scale-out, throughput, availability, security, programming ease, UI, and general cost-effectiveness. Screw those up, and you don't have an internet business.
Much new technology addresses those challenges, with considerable success. But the success is usually one silo at a time — a short-request application here, an analytic database there. When it comes to integration, unsolved problems abound.
The top integration and integration-like challenges for me, from a practical standpoint, are:
- Integrating silos — a decades-old problem still with us in a big way.
- Dynamic schemas with joins.
- Low-latency business intelligence.
- Human real-time personalization.
Other concerns that get mentioned include:
- Geographical distribution of data due to privacy laws, which for some users is a hard compliance requirement.
- Logical data warehouse, a term that doesn’t actually mean anything real.
- In-memory data grids, which some day may no longer always be hand-coupled to the application and data stacks they accelerate.
Let’s skip those latter issues for now, focusing instead on the first four.