Amazon Redshift and its implications
Merv Adrian and Doug Henschen both reported more details about Amazon Redshift than I intend to; see also the comments on Doug’s article. I did talk with Rick Glick of ParAccel a bit about the project, and he noted:
- Amazon Redshift is missing parts of ParAccel, notably the extensibility framework.
- ParAccel did some engineering to make its DBMS run better in the cloud.
- Amazon did some engineering in the areas it knows better than ParAccel — cloud provisioning, cloud billing, and so on.
“We didn’t want to do the deal on those terms” comments from other companies suggest ParAccel’s main financial take from the deal is an already-reported venture investment.
The cloud-related engineering was mainly around communications, e.g. strengthening error detection/correction to make up for the lack of dedicated switches. In general, Rick seemed more positive on running in the (Amazon) cloud than analytic RDBMS vendors have been in the past.
So who should and will use Amazon Redshift? For starters, I’d say: Read more
Hadoop/RDBMS integration: Aster SQL-H and Hadapt
Two of the more interesting approaches for integrating Hadoop and MapReduce with relational DBMS come from my clients at Teradata Aster (via SQL/MR and SQL-H) and Hadapt. In both cases, the story starts:
- You can dump any kind of data you want into Hadoop’s file system.
- You can have data in a scale-out RDBMS to get good performance on analytic SQL.
- You can access all the data (not just the relationally stored part) via SQL.
- You can do MapReduce on all the data (not just the Hadoop-stored part).
- To varying degrees, Hadapt and Aster each offer three kinds of advantage over Hadoop-with-Hive:
- SQL performance is (much) better.
- SQL functionality is better.
- At least some of your employees — the “business analysts” — can invoke MapReduce processes through SQL, if somebody else (e.g. your techies or the vendor’s) coded them up in the first place.
Of course, there are plenty of differences. Those start: Read more
Categories: Aster Data, Hadapt, Hadoop, Pricing, SQL/Hadoop integration, Teradata | 5 Comments |
The Teradata Aster Big Analytics Aster/Hadoop appliance
My clients at Teradata are introducing a mix-em/match-em Aster/Hadoop box, officially called the Teradata Aster Big Analytics Appliance. Basics include:
- You can fill a rack with nodes either for the Aster DBMS or for Hadoop (Hortonworks flavor), or you can combine them in the same box.
- If you combine them, they share management software (adapted from mainstream Teradata’s) and Infiniband.
- An Aster node has 16 2.6-gigahertz cores and 24 900GB disk drives.
- A Hadoop node has 12 2.0-gigahertz cores and 12 3TB drives.
- A central part of Teradata’s strategy is that Aster and Hadoop nodes can work together via SQL-H.
- The Teradata Aster Big Analytics Appliance is based on a family of Dell servers that fit more compactly into racks than do Teradata’s traditional products.
- The Teradata Aster Big Analytics Appliance replaces a previous interim Teradata Aster appliance that used similar hardware to that in other Teradata systems.
My views on the Teradata Aster Big Analytics Appliance start: Read more
Categories: Aster Data, Hadapt, Pricing, SQL/Hadoop integration, Teradata | 3 Comments |
Aerospike, the former Citrusleaf
My new clients at Aerospike have a range of minor news to announce:
- A company and product name change (they used to be Citrusleaf).
- Some new people and funding.
- In association with an acqui-hire — of AlchemyDB guy Russ Sullivan — some unspecified future technical plans.
- A community edition (Aerospike, nee’ Citrusleaf, is closed-source).
Mainly, however, they want to call your attention to the fact that they’ve been selling a fast, reliable key-value store, with a number of production references, and want to suggest that other organizations should perhaps buy it as well.
Generally, the Aerospike product story is as I described in two posts last year. At the highest level:
- Aerospike has a key-value data model.
- Secondary indexes and so on are still futures.
- Aerospike is clustered, of course.
- Two hardware/storage choices are encouraged:
- Spinning disk, but you keep all your data in RAM.
- Solid-state disk.
AeroSpike’s three core marketing claims are performance, consistent performance, and uninterrupted operations.
- Aerospike’s performance claims are supported by a variety of blazing internal benchmarks.
- Aerospike’s consistent performance claims are along the lines of sub-millisecond latency, with 99.9% of responses being within 5 milliseconds, and even a node outage only borking performance for some 10s of milliseconds.
- Uninterrupted operation is a core AeroSpike design goal, and the company says that to date, no AeroSpike production cluster has ever gone down.
Aerospike technical details start with the expected: Read more
Categories: Aerospike, Market share and customer counts, Memory-centric data management, NoSQL, Pricing | 2 Comments |
Clustrix 4.0 and other Clustrix stuff
It feels like time to write about Clustrix, which I last covered in detail in May, 2010, and which is releasing Clustrix 4.0 today. Clustrix and Clustrix 4.0 basics include:
- Clustrix makes a short-request processing appliance.
- As you might guess from the name, Clustrix is clustered — peer-to-peer, with no head node.
- The Clustrix appliance uses flash/solid-state storage.
- Traditionally, Clustrix has run a MySQL-compatible DBMS.
- Clustrix 4.0 introduces JSON support. More on that below.
- Clustrix 4.0 introduces a bunch of administrative features, and parallel backup.
- Also in today’s announcement is a Rackspace partnership to offer Clustrix remotely, at monthly pricing.
- Clustrix has been shipping product for about 4 years.
- Clustrix has 20 customers in production, running >125 Clustrix nodes total.
- Clustrix has 60 people.
- List price for a (smallest size) Clustrix system is $150K for 3 nodes. Highest-end maintenance costs 15%.
- There’s also a $100K version meant for high availability/disaster recovery. Over half of Clustrix’s customers use off-site disaster recovery.
- Clustrix is raising a C round. Part of it has already been raised from insiders, as a kind of bridge.
The biggest Clustrix installation seems to be 20 nodes or so. Others seem to have 10+. I presume those disaster recovery customers have 6 or more nodes each. I’m not quite sure how the arithmetic on that all works; perhaps the 125ish count of nodes is a bit low.
Clustrix technical notes include: Read more
Categories: Cloud computing, Clustering, Clustrix, Database compression, Market share and customer counts, MySQL, OLTP, Pricing, Structured documents | 4 Comments |
Introduction to Neo Technology and Neo4j
I’ve been talking some with the Neo Technology/Neo4j guys, including Emil Eifrem (CEO/cofounder), Johan Svensson (CTO/cofounder), and Philip Rathle (Senior Director of Products). Basics include:
- Neo Technology came up with Neo4j, open sourced it, and is building a company around the open source core product in the usual way.
- Neo4j is a graph DBMS.
- Neo4j is unlike some other graph DBMS in that:
- Neo4j is designed for OLTP (OnLine Transaction Processing), or at least as a general-purpose DBMS, rather than being focused on investigative analytics.
- To every node or edge managed by Neo4j you can associate an arbitrary collection of (name,value) pairs — i.e., what might be called a document.
Numbers and historical facts include:
- > 50 paying Neo4j customers.
- Estimated 1000s of production Neo4j users of open source version.*
- Estimated 1/3 of paying customers and free users using Neo4j as a “system of record”.
- >30,000 downloads/month, in some sense of “download”.
- 35 people in 6 countries, vs. 25 last December.
- $13 million in VC, most of it last October.
- Started in 2000 as the underpinnings for a content management system.
- A version of the technology in production in 2003.
- Neo4j first open-sourced in 2007.
- Big-name customers including Cisco, Adobe, and Deutsche Telekom.
- Pricing of either $6,000 or $24,000 per JVM per year for two different commercial versions.
Categories: Market share and customer counts, Neo Technology and Neo4j, Open source, Pricing, RDF and graphs, Structured documents, Telecommunications | 11 Comments |
Introduction to MemSQL
I talked with MemSQL shortly before today’s launch. MemSQL technology basics are:
- In-memory relational DBMS.
- Being released single-box only. Transparent sharding is under development for release in the fall. Basic replication is under development too.
- Subset of SQL-92.
- MySQL wire-compatible (SQL coverage issues excepted).
MemSQL’s performance claims include:
- Read performance 10% or so worse than memcached.
- Write performance 20% or so better than memcached.
- 1.2 million inserts/second on a 64-core, 1/2 TB of RAM machine.
- Similarly, 1/2 billion records loaded in under 20 minutes.
MemSQL company basics include: Read more
Categories: Database compression, In-memory DBMS, Investment research and trading, Market share and customer counts, memcached, MemSQL, OLTP, Pricing, Web analytics | 3 Comments |
QlikTech bought Expressor
QlikTech has bought Expressor. Notes on that include:
- Expressor wanted to offer data integration/ETL (Extract/Transform/Load) that was all things to all people — great parallel performance, great UI, great price, etc.
- In practice, Expressor seemed to focus on cheap/easy ETL in the Microsoft Windows (I mean server) market.
- Expressor never got much traction. This seems confirmed by the “more than 20” figure for headcount mentioned in the acquisition press release.
- Both the press release and some tweets by QlikTech’s Donald Farmer seem to confirm that Expressor is being taken off the market for “boil the ocean” ETL. It will be companion technology to/integrated technology in QlikView.
- Unsurprisingly, Donald indicated that Expressor technology would expand past its Microsoft focus. (Edit: “If needed”)
Categories: Business intelligence, EAI, EII, ETL, ELT, ETLT, Expressor, Pricing, QlikTech and QlikView | 5 Comments |
Introduction to Cloudant
Cloudant is one of the few NoSQL companies with >100 paying subscription customers. For starters:
- Cloudant’s core software is a fork of CouchDB.
- Cloudant only sells you software as a service.
- More precisely, whether Cloudant offers DBaaS (DataBase as a Service) or PaaS (Platform as a Service) or a “data layer” (Cloudant’s preferred terminology) depends on your taste in buzzwords.
- I gather that Cloudant (the company) wants to handle pretty much all your data management needs. But Cloudant (the product) isn’t there yet, especially on the analytic side.
- Before CouchDB and Membase joined together, Cloudant was positioned as the big(ger) data version of CouchDB.
Company demographics include:
- Cloudant is based in Boston.
- Cloudant started out as a Y Combinator company in 2008, and “got serious” in 2009.
- Cloudant now has ~20 employees.
- Management hires include a couple of former Vertica guys.
The Cloudant guys gave me some customer counts in May that weren’t much higher than those they gave me in February, and seem to have forgotten to correct the discrepancy. Oh well. The latter (probably understated) figures included ~160 paying customers, of which:
- ~100 were multitenant.
- ~60 were single tenant.
- 1 was on-premise (but still managed by Cloudant) because of privacy concerns.
The largest Cloudant deployments seem to be in the 10s of terabytes, across a very low double digit number of servers.
Categories: Cloudant, Clustering, Couchbase, CouchDB, MapReduce, Market share and customer counts, NoSQL, Pricing, Specific users, Storage | 2 Comments |
Clarifying IBM DB2 Express-C crippleware
When Conor O’Mahony briefed me about DB2 10, he kept commenting that cool features he was talking about could be found in all editions of DB2, even the free one. So I asked what the limitations were on free DB2. He researched the matter and got back to me — and they sounded like what appeared to have been the limits when free DB2 was first introduced, over 6 years ago.
I tweeted about this, and was very fortunate that Ian Bjorhovde spoke up and said it wasn’t correct. Some scrambling ensued. It seems that the main sources of error were:
- People tend to confuse DB2 Express and DB2 Express-C; only the latter is free.
- What IBM said about the limitations DB2 Express-C upon its introduction 6 years ago should not be interpreted in line with what a plain reading might suggest.
In particular, we shouldn’t take IBM’s repeated 2006 statements that
DB2 Express-C may be deployed on … on AMD or Intel x86 systems with up to 2 dual-core chips. 4 GB of memory is the maximum supported.
to mean that you were ever allowed to use DB2 Express-C with 4 cores, nor with 4 GB of RAM.
To clarify things, Conor sent over email with permission to quote, as follows: Read more
Categories: IBM and DB2, Pricing | 14 Comments |