DBMS product categories
Analysis of database management technology in specific product categories.
Notes on short-request scale-out MySQL
A press person recently asked about:
… start-ups that are building technologies to enable MySQL and other SQL databases to get over some of the problems they have in scaling past a certain size. … I’d like to get a sense as to whether or not the problems are as severe and widespread as these companies are telling me? If so, why wouldn’t a customer just move to a new database?
While that sounds as if he was asking about scale-out relational DBMS in general (MySQL or otherwise, short-request or analytic), it turned out that he was asking just about short-request scale-out MySQL. My thoughts and comments on that narrower subject include(d) but are not limited to: Read more
Netezza TwinFin i-Class overview
I have long complained about difficulties in discussing Netezza’s TwinFin i-Class analytic platform. But I’m ready now, and in the grand sweep of the product’s history I’m not even all that late. The Netezza i-Class timing story goes something like this:
- Netezza i-Class was first foreshadowed in February, 2010.
- Netezza i-Class customer testing started in October, 2010 or so. Netezza i-Class evidently has been shipped to 4-5 partners and a single-digit number of end-user organizations, spread across some usual-suspect industries (financial services, telecom, and so on).
- Netezza i-Class 1.0 general availability is still in the (near) future.
My advice to Netezza as to how it should describe TwinFin i-Class boils down to: Read more
Unpacking the EMC Greenplum Q1 sales disaster rumors
A well-connected tipster believes:
- EMC Greenplum’s* revenue target for Q1 had been $35 million.
- Actual EMC Greenplum revenue for Q1 was $3 million, or maybe it was $8 million.
- EMC Greenplum had 75 sales teams trying to generate this revenue.
In the past I might have called Greenplum for clarification, but they’re not knocking themselves out to inform me these days, nor to inspire me with confidence in what they say. Read more
Teradata integrates in solid-state storage
For once, I think Teradata’s annual hardware refresh is pretty interesting, because of the integration of flash storage into its high-end “active enterprise data warehouse” product line. The essence of the announcement is:
- Teradata is rolling out a new appliance,* the 6680, which combines hard-disk and solid-state drives, relying on Teradata Virtual Storage to decide which data lives on which kind of drive (a toy sketch of that idea follows this list).
- Teradata is also rolling out a hard-disk-based appliance,* the 6650, in a more routine annual refresh.
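For intuition, here is a toy sketch of temperature-based tiering, the general hot-data-goes-to-flash idea that a facility like Teradata Virtual Storage automates. Teradata has not disclosed its placement algorithm here, so every threshold and data structure below is invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy illustration of temperature-based storage tiering. This is NOT
 * Teradata's algorithm; the thresholds and bookkeeping are invented.
 */
public class TemperatureTiering {
    enum Tier { SSD, HDD }

    private final Map<Long, Integer> accessCounts = new HashMap<>(); // extent id -> recent access count
    private final Map<Long, Tier> placement = new HashMap<>();
    private static final int HOT_THRESHOLD = 100; // invented cutoff

    // Record an access; promote an extent to flash once it turns "hot".
    void onAccess(long extentId) {
        int count = accessCounts.merge(extentId, 1, Integer::sum);
        if (count >= HOT_THRESHOLD) placement.put(extentId, Tier.SSD);
    }

    // Periodically halve all counters; demote extents that have gone "cold".
    void decay() {
        accessCounts.replaceAll((id, count) -> count / 2);
        accessCounts.forEach((id, count) -> {
            if (count < HOT_THRESHOLD / 2) placement.put(id, Tier.HDD);
        });
    }

    // New or cold data lives on spinning disk by default.
    Tier tierOf(long extentId) {
        return placement.getOrDefault(extentId, Tier.HDD);
    }
}
```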
Revolution Analytics update
I wasn’t too impressed when I spoke with Revolution Analytics at the time of its relaunch last year. But a conversation Thursday evening was much clearer. And I even learned some cool stuff about general predictive modeling trends (see the bottom of this post).
Revolution Analytics business and business model highlights include:
- Revolution Analytics is an open-core vendor built around the R language. That is, Revolution Analytics offers proprietary code and support, with subscription pricing, that help in the use of open source software.
- Unlike most open-core vendors I can think of, Revolution Analytics takes little responsibility for the actual open source part. Some “grants” for developing certain open source R pieces seem to be the main exception. While this has caused some hard feelings, I don’t have an accurate sense of their scope or severity.
- Revolution Analytics also sells a single-user/workstation version of its product, freely admitting that this is mainly a lead generation strategy or, in my lingo, a “break-even leader.”
- Revolution Analytics boasts around 100 customers, split about 70-30 between the workstation seeding stuff and the real server product.
- Revolution Analytics has “about” 37 employees. Headquarters are at 101 University Avenue (do I have to say in what city? 🙂 ). There are also a development office in Seattle and a sales office in New York.
- Revolution Analytics’ pricing is by size of server. “Small” servers — i.e. up to 12 cores — start at $25K/year.
- Unsurprisingly, adoption is more alongside SAS et al. than rip-and-replace.
Comments on EMC Greenplum
I am annoyed with my former friends at Greenplum, who took umbrage at a brief sentence I wrote in October, namely “eBay has thrown out Greenplum”. Their reaction included:
- EMC Greenplum no longer uses my services.
- EMC Greenplum no longer briefs me.
- EMC Greenplum reneged on a commitment to fund an effort in the area of privacy.
The last one really hurt, because in trusting them, I put in quite a bit of effort, and discussed their promise with quite a few other people.
Some thoughts on Oracle Express Edition
I was asked by a press person about Oracle 11g Express Edition. So I might as well also share my thoughts here.
1. Oracle 11g Express Edition is seriously crippled. E.g., it’s limited to 1 GB of RAM and 11 GB of data. However …
2. … I recall when I excitedly uncovered the first 1 GB relational databases, the way I’ve uncovered petabyte-scale databases in recent years. It was less than 20 years ago. This illustrates that …
3. … the Oracle 11g Express Edition crippleware is better than what top relational database users had 20 years ago. That in turn suggests …
4. … there are plenty of businesses small enough to use Oracle 11g Express Edition for real work today.
5. Sensible reasons for having an Oracle Express Edition start with test, development, and evaluation (a minimal example appears below). But there’s also market seeding — if somebody uses it for whatever reason, then either the person, the organization, or both could at some point go on to be a real Oracle customer.
By the way, allowable database size of 11 GB is up from 4 GB a few years ago. That’s like treading water. 🙂
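For what it is worth, the test/development role looks like ordinary Oracle from an application’s point of view. Here is a minimal smoke test against a local Express Edition install; port 1521 and the XE SID are the usual defaults as I understand them, and the credentials are placeholders to replace with your own.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/**
 * Minimal smoke test against a local Oracle Express Edition install.
 * The URL uses the customary XE defaults; "scott"/"tiger" are placeholder
 * credentials, not a recommendation.
 */
public class XeSmokeTest {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@localhost:1521:XE", "scott", "tiger");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT banner FROM v$version")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // prints the edition/version banner
            }
        }
    }
}
```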
Short-request and analytic processing
A few years ago, I suggested that database workloads could be divided into two kinds — transactional and analytic. The advent of non-transactional NoSQL has suggested that we need a replacement term for “transactional” or “OLTP”, but finding one has been a bit difficult. Numerous tries, including high-volume simple processing, online request processing, internet request processing, network request processing, short-request processing, and rapid request processing, have turned out to be imperfect, as per discussion at each of those links. But then, no category name is ever perfect anyway. I’ve finally settled on short-request processing, largely because I think it does a good job of preserving the analytic-vs-bang-bang-not-analytic workload distinction.
The easy part of the distinction goes roughly like this (a pair of example queries after the list makes the contrast concrete):
- Anything transactional or “OLTP” is short-request.
- Anything “OLAP” is analytic.
- Updates of small amounts of data are probably short-request, be they transactional or not.
- Retrievals of one or a few records in the ordinary course of update-intensive processing are probably short-request.
- Queries that return or aggregate large amounts of data — even in intermediate result sets — are probably analytic.
- Queries that would take a long time to run on badly-chosen or -configured DBMS are probably analytic (even if they run nice and fast on your actual system).
- Analytic processes that go beyond querying or simple arithmetic are — you guessed it! — analytic.
- Anything expressed in MDX is probably analytic.
- Driving a dashboard is usually analytic.
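To make the contrast concrete, here is a pair of hypothetical SQL statements against an invented orders table. The table and column names are made up; what matters is the shape of each workload.

```java
/**
 * Two hypothetical queries against an invented "orders" table, illustrating
 * the short-request vs. analytic distinction drawn in the list above.
 */
public class WorkloadExamples {
    // Short-request: touches one record via an index, returns almost no
    // data, and should finish in milliseconds on any sanely configured DBMS.
    static final String SHORT_REQUEST =
        "SELECT order_status FROM orders WHERE order_id = ?";

    // Analytic: scans and aggregates a large fraction of the table, and
    // could run for a long time on a badly chosen or configured DBMS.
    static final String ANALYTIC =
        "SELECT customer_region, SUM(order_total) "
        + "FROM orders "
        + "WHERE order_date >= DATE '2010-01-01' "
        + "GROUP BY customer_region";
}
```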
Where the terminology gets more difficult is in a few areas of what one might call real-time or near-real-time analytics. My first takes are: Read more
DataStax introduces a Cassandra-based Hadoop distribution called Brisk
Cassandra company DataStax is introducing a Hadoop distribution called Brisk, for use cases that combine short-request and analytic processing. Brisk in essence replaces HDFS (Hadoop Distributed File System) with a Cassandra-based file system called CassandraFS. The whole thing is due to be released as Apache-licensed open source within the next 45 days.
The core claims for Cassandra/Brisk/CassandraFS are:
- CassandraFS has the same interface as HDFS. So, in particular, you should be able to use most Hadoop add-ons with Brisk (see the sketch after this list).
- CassandraFS has comparable performance to HDFS on sequential scans. That’s without predicate pushdown to Cassandra, which is Coming Soon but won’t be in the first Brisk release.
- Brisk/CassandraFS is much easier to administer than HDFS. In particular, there is no NameNode or JobTracker single point of failure, nor any other form of head node. Brisk/CassandraFS is strictly peer-to-peer.
- Cassandra is far superior to HBase for short-request use cases, specifically with 5-6X the random-access performance.
There’s a pretty good white paper around all this, which also recites general Cassandra claims — [edit] and here at last is the link.
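To illustrate what “same interface as HDFS” means in practice, here is a minimal Java sketch: stock Hadoop FileSystem code pointed at CassandraFS instead of HDFS. Treat the cfs:// URI scheme and the host name as assumptions made for the example, not as something checked against DataStax documentation.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Ordinary Hadoop FileSystem code, pointed at CassandraFS. The cfs://
 * scheme and host are assumptions for illustration.
 */
public class BriskListing {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Relative to stock HDFS code, only the URI changes
        // (it would otherwise be something like hdfs://namenode:8020/).
        FileSystem fs = FileSystem.get(URI.create("cfs://some-cassandra-node/"), conf);
        for (FileStatus status : fs.listStatus(new Path("/data"))) {
            System.out.println(status.getPath());
        }
    }
}
```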
Hadapt (commercialized HadoopDB)
Hadapt, the company commercializing HadoopDB, is finally launching. Its product is based on the HadoopDB project, albeit with the code rewritten from scratch. As you may recall, the core idea of HadoopDB is to put a DBMS on every node, and use MapReduce to talk to the whole database. The idea is to get the same SQL/MapReduce integration as you get if you use Hive, but with much better performance* and perhaps somewhat better SQL functionality.** Advantages vs. a DBMS-based analytic platform that includes MapReduce — e.g. Aster Data — are less clear. Read more
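To show the shape of the HadoopDB idea, here is a toy sketch in which each map task pushes a SQL fragment down to the DBMS instance on its own node, with MapReduce stitching the partial results together. The JDBC URL, credentials, table, and query are all invented; this is not Hadapt’s code, just the general technique.

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Toy sketch of the HadoopDB idea: each map task queries the DBMS on its
 * own node, and reducers combine the per-node partial results. The JDBC
 * URL, credentials, and query are invented for illustration.
 */
public class LocalDbMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    public void run(Context context) throws IOException, InterruptedException {
        // In HadoopDB proper, a custom InputFormat hands each task its own
        // SQL fragment; here one invented query stands in for that.
        String localFragment =
            "SELECT customer_region, COUNT(*) FROM orders GROUP BY customer_region";
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/warehouse", "hadoop", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(localFragment)) {
            while (rs.next()) {
                // Emit per-node partial aggregates; reducers sum them globally.
                context.write(new Text(rs.getString(1)),
                              new LongWritable(rs.getLong(2)));
            }
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }
}
```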