May 1, 2014

MemSQL update

I stopped by MemSQL last week, and got a range of new or clarified information. For starters:

Even though MemSQL (the product) was originally designed for OLTP (OnLine Transaction Processing), MemSQL (the company) is now focused on analytic use cases …
… which was the point of introducing MemSQL’s flash-based columnar option.
One MemSQL customer has a 100 TB “data warehouse” installation on Amazon.
Another has “dozens” of terabytes of data spread across 500 machines, which aggregate 36 TB of RAM.
At customer Shutterstock, 1000s of non-MemSQL nodes are monitored by 4 MemSQL machines.
A couple of MemSQL’s top references are also Vertica flagship customers; one of course is Zynga.
MemSQL reports encountering Clustrix and VoltDB in a few competitive situations, but not NuoDB. MemSQL believes that VoltDB is still hampered by its traditional issues — Java, reliance on stored procedures, etc.

On the more technical side: Read more

Categories: Clustering, Clustrix, Columnar database management, Data warehousing, Database compression, In-memory DBMS, MemSQL, NewSQL, NuoDB, Specific users, Vertica Systems, VoltDB and H-Store, Workload management, Zynga


November 8, 2013

Comments on the 2013 Gartner Magic Quadrant for Operational Database Management Systems

The 2013 Gartner Magic Quadrant for Operational Database Management Systems is out. “Operational” seems to be Gartner’s term for what I call short-request; in each case the point is that OLTP (OnLine Transaction Processing) is a dubious term when systems omit strict consistency, and when even strictly consistent systems may lack full transactional semantics. As is usually the case with Gartner Magic Quadrants:

I admire the raw research.
The opinions contained are generally reasonable (especially since Merv Adrian joined the Gartner team).
Some of the details are questionable.
There’s generally an excessive focus on Gartner’s perception of vendors’ business skills, and on vendors’ willingness to parrot all the buzzphrases Gartner wants to hear.
The trends Gartner highlights are similar to those I see, although our emphasis may be different, and they may leave some important ones out. (Big omission — support for lightweight analytics integrated into operational applications, one of the more genuine forms of real-time analytics.)

Anyhow: Read more

Categories: Actian and Ingres, Aerospike, Basho and Riak, Cassandra, Clustrix, Couchbase, DataStax, EnterpriseDB and Postgres Plus, Hadoop, HBase, IBM and DB2, In-memory DBMS, MarkLogic, McObject, Memory-centric data management, Microsoft and SQL*Server, Mid-range, NewSQL, NuoDB, OLTP, Oracle, Pricing, Progress, Apama, and DataDirect, SAP AG, Structured documents, Sybase, VoltDB and H-Store


September 8, 2013

Layering of database technology & DBMS with multiple DMLs

Two subjects in one post, because they were too hard to separate from each other.

Any sufficiently complex software is developed in modules and subsystems. DBMS are no exception; the core trinity of parser, optimizer/planner, and execution engine merely starts the discussion. But increasingly, database technology is layered in a more fundamental way as well, to the extent that different parts of what would seem to be an integrated DBMS can sometimes be developed by separate vendors.

Major examples of this trend — where by “major” I mean “spanning a lot of different vendors or projects” — include:

The object/relational, aka universal, extensibility features developed in the 1990s for Oracle, DB2, Informix, Illustra, and Postgres. The most successful extensions probably have been:
- Geospatial indexing via ESRI.
- Full-text indexing, notwithstanding questionable features and performance.
MySQL storage engines.
MPP (Massively Parallel Processing) analytic RDBMS relying on single-node PostgreSQL, Ingres, and/or Microsoft SQL Server — e.g. Greenplum (especially early on), Aster (ditto), DATAllegro, DATAllegro’s offspring Microsoft PDW (Parallel Data Warehouse), or Hadapt.
Splits in which a DBMS has serious processing both in a “database” layer and in a predicate-pushdown “storage” layer — most famously Oracle Exadata, but also MarkLogic, InfiniDB, and others.
SQL-on-HDFS — Hive, Impala, Stinger, Shark and so on (including Hadapt).

Other examples on my mind include:

Data manipulation APIs being added to key-value stores such as Couchbase and Aerospike.
TokuMX, the Tokutek/MongoDB hybrid I just blogged about.
NuoDB’s willing reliance on third-party key-value stores (or HDFS in the role of one).
FoundationDB’s strategy, and specifically its acquisition of Akiban.

And there are several others I hope to blog about soon, e.g. current-day PostgreSQL.

In an overlapping trend, DBMS increasingly have multiple data manipulation APIs. Examples include: Read more

Categories: Aerospike, Akiban, Aster Data, Cache, Calpont, Cloudera, Data models and architecture, Database diversity, Databricks, Spark and BDAS, DATAllegro, Derived data, Greenplum, Hadapt, Hadoop, JPMorgan Chase, NoSQL, NuoDB, Parallelization, Solid-state memory, SQL/Hadoop integration, Structured documents, Text


April 14, 2013

Introduction to Deep Information Sciences and DeepDB

I talked Friday with Deep Information Sciences, makers of DeepDB. Much like TokuDB — albeit with different technical strategies — DeepDB is a single-server DBMS in the form of a MySQL engine, whose technology is concentrated around writing indexes quickly. That said:

DeepDB’s indexes can help you with analytic queries; hence, DeepDB is marketed as supporting OLTP (OnLine Transaction Processing) and analytics in the same system.
DeepDB is marketed as “designed for big data and the cloud”, with reference to “Volume, Velocity, and Variety”. What I could discern in support of that is mainly:
- DeepDB has been tested at up to 3 terabytes at customer sites and up to 1 billion rows internally.
- Like most other NewSQL and NoSQL DBMS, DeepDB is append-only, and hence could be said to “stream” data to disk.
- DeepDB’s indexes could at some point in the future be made to work well with non-tabular data.*
- The Deep guys have plans and designs for scale-out — transparent sharding and so on.

*For reasons that do not seem closely related to product reality, DeepDB is marketed as if it supports “unstructured” data today.

Other NewSQL DBMS seem “designed for big data and the cloud” to at least the same extent DeepDB is. However, if we’re interpreting “big data” to include multi-structured data support — well, only half or so of the NewSQL products and companies I know of share Deep’s interest in branching out. In particular:

Akiban definitely does. (Note: Stay tuned for some next-steps company news about Akiban.)
Tokutek has planted a small stake there too.
Key-value-store-backed NuoDB and GenieDB probably lean that way. (And SanDisk evidently shut down Schooner’s RDBMS while keeping its key-value store.)
VoltDB, Clustrix, ScaleDB and MemSQL seem more strictly tabular, except insofar as text search is a requirement for everybody. (Edit: Oops; I forgot about Clustrix’s approach to JSON support.)

Edit: MySQL has some sort of an optional NoSQL interface, and hence presumably so do MySQL-compatible TokuDB, GenieDB, Clustrix, and MemSQL.

Also, some of those products do not today have the transparent scale-out that Deep plans to offer in the future.

Categories: Akiban, Cloud computing, Clustrix, Columnar database management, Data models and architecture, Database compression, GenieDB, Market share and customer counts, Memory-centric data management, MySQL, NewSQL, NoSQL, NuoDB, OLTP, Oracle, ScaleDB, Schooner Information Technology, Tokutek and TokuDB, Transparent sharding, VoltDB and H-Store

YCSB benchmark notes

Two different vendors recently tried to inflict benchmarks on me. Both were YCSBs, so I decided to look up what the YCSB (Yahoo! Cloud Serving Benchmark) actually is. It turns out that the YCSB:

Was developed by — you guessed it! — Yahoo.
Is meant to simulate workloads that fetch web pages, including the writing portions of those workloads.
Was developed with NoSQL data managers in mind.
Bakes in one kind of sensitivity analysis — latency vs. throughput.
Is implemented in extensible open source code.

That actually sounds pretty good, especially the extensibility part;* it’s likely that the YCSB can be useful in a variety of product selection scenarios. Still, as recent examples show, benchmark marketing is an annoying blight upon the database industry.

*With extensibility you can test your own workloads and do your own sensitivity analyses.
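
The latency-vs.-throughput idea can be sketched in a few lines. What follows is only an illustration, not the YCSB itself or its code: it throttles a mixed read/write workload at several target throughputs and reports the mean latency observed at each. The in-memory Python dict standing in for a data store, the function names `run_at_target` and `sweep`, and the 95/5 read/write mix are all assumptions made for the example.

```python
import random
import time


def run_at_target(store, target_ops_per_sec, num_ops, read_fraction=0.95, seed=0):
    """Issue num_ops operations at roughly target_ops_per_sec and return stats.

    `store` is a plain dict standing in for a real data store (an assumption).
    """
    rng = random.Random(seed)
    keys = list(store)
    interval = 1.0 / target_ops_per_sec
    latencies = []
    start = time.perf_counter()
    for i in range(num_ops):
        # Throttle: wait until this operation's scheduled start time.
        delay = (start + i * interval) - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        key = rng.choice(keys)
        t0 = time.perf_counter()
        if rng.random() < read_fraction:
            _ = store[key]             # read
        else:
            store[key] = rng.random()  # write (update)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "target_ops_per_sec": target_ops_per_sec,
        "achieved_ops_per_sec": num_ops / elapsed,
        "mean_latency_sec": sum(latencies) / len(latencies),
    }


def sweep(targets, num_ops=200):
    """Run the same workload at each target throughput, one row per target."""
    store = {f"user{i}": 0.0 for i in range(1000)}
    return [run_at_target(store, t, num_ops) for t in targets]


if __name__ == "__main__":
    for row in sweep([1000, 5000, 20000]):
        print(row)
```

Against a real data store one would swap the dict operations for client calls and add more targets until latency starts to climb; the read/write mix and key distribution are the obvious knobs for a sensitivity analysis of one's own.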

A YCSB overview page features links both to the code and to the original explanatory paper. The clearest explanation of the YCSB I found there was: Read more

Categories: Aerospike, Benchmarks and POCs, NewSQL, NoSQL, NuoDB, OLTP, Yahoo


January 16, 2013

NuoDB marketing mishegas

I must start by apologizing for giving a quote in a press release whose contents I deplore. Unlike occasions on which I’ve posted about inaccurate quotes, in this case the fault is mine. The quote is quite accurate. And NuoDB didn’t mislead me about the release’s contents; I just neglected to ask.

NuoDB evidently subscribes to the marketing fallacy:

Big DBMS companies hit people repeatedly with marketing cudgels.
We want to be a big DBMS company.
Therefore we will hit people repeatedly with marketing cudgels too.

But to my taste, NuoDB’s worst travesty is not the deafening drumroll before launch (I asked off their mailing list months before), nor the claim that NuoDB’s launch would be a “big day” for the database industry (annoying but ordinary hype), nor the emergent flock of birds foofarah, nor even NuoDB’s overwrought benchmark marketing (distressingly many vendors do that).

Rather, I think NuoDB’s greatest marketing offense to date is its Codd-imitating “12 rules” for cloud database management. Read more

Categories: Cloud computing, Clustering, NuoDB, Theory and architecture


January 12, 2013