In-memory DBMS

Analysis of memory-centric OLTP DBMS.

December 2, 2012

Are column stores really better at compression?

A consensus has evolved that:

Columnar compression (i.e., value-based compression) compresses better than block-level compression (i.e., compression of bit strings).
Columnar compression can be done pretty well in row stores.

Still somewhat controversial is the claim that:

Columnar compression can be done even better in column stores than in row-based systems.

A strong plausibility argument for the latter point is that new in-memory analytic data stores tend to be columnar — think HANA or Platfora; compression is commonly cited as a big reason for the choice. (Another reason is that I/O bandwidth matters even when the I/O is from RAM, and there are further reasons yet.)

One group that made the in-memory columnar choice is the Spark/Shark guys at UC Berkeley’s AMPLab. So when I talked with them Thursday (more on that another time, but it sounds like cool stuff), I took some time to ask why columnar stores are better at compression. In essence, they gave two reasons: simplicity, and speed of decompression.

In each case, the main supporting argument seemed to be that finding the values in a column is easier when they’re all together in a column store. Read more
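
To make the "values all together" point concrete, here is a minimal sketch in Python. It is not the Spark/Shark implementation; the table contents and the function names dictionary_encode and run_length_encode are made up for illustration. It applies two common value-based techniques (dictionary encoding and run-length encoding) to the same data laid out by column and by row.

```python
from itertools import groupby

def dictionary_encode(values):
    """Replace each distinct value with a small integer code."""
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
        encoded.append(codes[v])
    return codes, encoded

def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]

# Hypothetical two-column table (country, status), sorted by country.
rows = [("US", "active")] * 4 + [("US", "closed")] * 2 + [("DE", "active")] * 3

# Column layout: each column's values sit next to each other.
country_column = [r[0] for r in rows]
status_column = [r[1] for r in rows]

# Row layout: values from different columns are interleaved.
row_layout = [field for r in rows for field in r]

country_runs = run_length_encode(country_column)
status_runs = run_length_encode(status_column)
row_runs = run_length_encode(row_layout)

print(len(country_runs), len(status_runs), len(row_runs))
```

In the column layout the encoder sees long runs of equal values directly. In the row layout it sees none until somebody first picks the column's values out of each row, which is the extra work a row store has to do before value-based compression or decompression can proceed.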

Categories: Columnar database management, Database compression, Databricks, Spark and BDAS, In-memory DBMS, Netezza

10 Comments

November 29, 2012

Notes on Microsoft SQL Server

I’ve been known to gripe that covering big companies such as Microsoft is hard. Still, Doug Leland of Microsoft’s SQL Server team checked in for phone calls in August and again today, and I think I got enough to be worth writing about, albeit at a survey level only.

Subjects I’ll mention include:

Hadoop
Parallel Data Warehouse
PolyBase
Columnar data management
In-memory data management (Hekaton)

One topic I can’t yet comment about is MOLAP/ROLAP, which is a pity; if anybody can refute my claim that ROLAP trumps MOLAP, it’s either Microsoft or Oracle.

Microsoft’s slides mentioned Yahoo refining a 6 petabyte Hadoop cluster into a 24 terabyte SQL Server “cube”, which was surprising in light of Yahoo’s history as an Oracle reference.

Categories: Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Hadoop, Hortonworks, In-memory DBMS, MapReduce, Market share and customer counts, Microsoft and SQL*Server, Oracle, Yahoo

10 Comments

November 5, 2012

Do you need an analytic RDBMS?

I can think of seven major reasons not to use an analytic RDBMS. One is good; but the other six seem pretty questionable, niche circumstances excepted, especially at this time.

The good reason to not have an analytic RDBMS is that most organizations can run perfectly well on some combination of:

SaaS (Software as a Service).
A low-volume static website.
A network focused on office software.
A single cheap server, likely running a single instance of a general-purpose RDBMS.

Those enterprises, however, are generally not who I write for or about.

The six bad reasons to not have an analytic RDBMS all take the form “Can’t some other technology do the job better?”, namely:

A data warehouse that’s just another instance of your OLTP (OnLine Transaction Processing) RDBMS. If your problem is that big, it’s likely that a specialized analytic RDBMS will be more cost-effective and generally easier to deal with.
MOLAP (Multi-Dimensional OnLine Analytic Processing). That ship has sailed … and foundered … and been towed to drydock.
In-memory BI. QlikView, SAP HANA, Oracle Exalytics, and Platfora are just four examples of many. But few enterprises will want to confine their analytics to such data as fits affordably in RAM.
Non-tabular* approaches to investigative analytics. There are many examples in the Hadoop world — including the recent wave of SQL add-ons to Hadoop — and some in the graph area as well. But those choices will rarely suffice for the whole job, as most enterprises will want better analytic SQL performance for (big) parts of their workloads.
Tighter integration of analytics and OLTP (OnLine Transaction Processing). Workday worklets illustrate that business intelligence/OLTP integration is a really good idea. And it’s an idea that Oracle and SAP can be expected to push heavily, when they finally get their product acts together. But again, that’s hardly all the analytics you’re going to want to do.
Tighter integration of analytics and other short-request processing. An example would be maintaining a casual game’s leaderboard via a NoSQL write-optimized database. Yet again, that’s hardly all the analytics a typical enterprise will want to do.

Categories: Business intelligence, Data warehousing, Games and virtual worlds, Hadoop, In-memory DBMS, MOLAP

12 Comments

August 20, 2012

In-memory, (hybrid) memory-centric DBMS — three analytic glossary draft entries

These are three closely-related draft entries for the DBMS2 analytic glossary. Please comment with any ideas you have for their improvement!

1. We coined the term memory-centric data management to comprise several kinds of technology that manage data in RAM (Random Access Memory), including:

In-memory DBMS (DataBase Management Systems).
Hybrid memory-centric DBMS.
Other kinds of in-memory data stores, such as:
- Caching layers.
- In-memory data stores that are tightly tied to specific analytic tools, for example the in-memory data management part of QlikView.
Complex event/stream processing.

Related link

Many examples of memory-centric data management (April, 2012)

2. An in-memory DBMS is a DBMS designed under the assumption that substantially all database operations will be performed in RAM (Random Access Memory). Thus, in-memory DBMS form a subcategory of memory-centric data management systems.

Ways in which in-memory DBMS are commonly different from those that query and update persistent storage include: Read more

Categories: Analytic glossary, Cache, In-memory DBMS, Memory-centric data management, Streaming and complex event processing (CEP)

7 Comments

July 17, 2012

Why I recommend avoiding Kognitio

Since my recent post about Kognitio, things have gotten worse. The company is insistently pushing the marketing message that Kognitio has always been an in-memory product, and at one point went so far as to publicly pretend that I had agreed.

I do not agree. Yes, it’s fair to say — as I did in 2008 — that Kognitio is very RAM-centric, but that’s not at all the same thing. In particular:

I did due diligence for Warburg Pincus’ original investment in Kognitio in the 1990s (it was then called White Cross). I have no memory of an in-memory positioning, nor of discussing same with anybody.
I checked my notes from a 2006 briefing, which included Kognitio CTO Roger Gaskell. There was no claim that Kognitio was an in-memory product.
Indeed, as I also posted in 2008, Kognitio keeps indexes on disk. If you use indexes on disk, you’re not an in-memory product.

The truth is that Kognitio offers a disk-based DBMS that has long been worked on by a small team. I believe that the team really has put considerable effort into how Kognitio uses RAM. But there’s no basis to give Kognitio credit for being “really” in-memory vs. a variety of other analytic RDBMS alternatives. And a row-based product that doesn’t currently offer compression is at a large disadvantage versus, say, columnar products that already do.*

*Columnar systems don’t clobber row-based ones in-memory as extremely as they do in some disk-based use cases. But even in-memory it’s good not to have to move around data that isn’t relevant to your query.

Until Kognitio gets at least somewhat more honest in its marketing, I recommend avoiding Kognitio like the plague. It’s simply not a big enough company to buy from unless you have some level of trust in the management team.

Categories: Columnar database management, Database compression, In-memory DBMS, Kognitio, Memory-centric data management

1 Comment

July 2, 2012

Introduction to Yarcdata

Cray’s strategy these days seems to be:

Move forward with the classic supercomputer business.
Diversify into related areas.

At the moment, the main diversifications are:

Boxes that are like supercomputers, but at a lower price point.
Storage.
“(Big) data”.

The last of the three is what Cray subsidiary Yarcdata is all about. Read more

Categories: Data models and architecture, Health care, In-memory DBMS, Investment research and trading, Market share and customer counts, Parallelization, Petabyte-scale data management, RDF and graphs, Yarcdata and Cray

1 Comment

June 18, 2012

Introduction to MemSQL

I talked with MemSQL shortly before today’s launch. MemSQL technology basics are:

In-memory relational DBMS.
Being released single-box only. Transparent sharding is under development for release in the fall. Basic replication is under development too.
Subset of SQL-92.
MySQL wire-compatible (SQL coverage issues excepted).

MemSQL’s performance claims include:

Read performance 10% or so worse than memcached.
Write performance 20% or so better than memcached.
1.2 million inserts/second on a 64-core, 1/2 TB of RAM machine.
Similarly, 1/2 billion records loaded in under 20 minutes.
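
As a quick sanity check on how the last two claims relate, the arithmetic can be sketched as follows. The only inputs are the figures quoted above; "under 20 minutes" is treated as an upper bound, so the result is a minimum sustained rate rather than a measured one.

```python
# Figures quoted in the performance claims above.
records_loaded = 500_000_000         # "1/2 billion records"
load_seconds = 20 * 60               # "under 20 minutes", used as an upper bound
peak_inserts_per_second = 1_200_000  # "1.2 million inserts/second"

# Finishing in under 20 minutes implies at least this sustained rate.
implied_min_rate = records_loaded / load_seconds

# How long the same load would take if the peak rate held throughout.
seconds_at_peak = records_loaded / peak_inserts_per_second

print(f"implied minimum load rate: {implied_min_rate:,.0f} records/second")
print(f"load time at peak rate: {seconds_at_peak / 60:.1f} minutes")
```

So the bulk-load claim only requires roughly 417,000 records/second sustained, which is comfortably below the quoted peak insert rate.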

MemSQL company basics include: Read more

Categories: Database compression, In-memory DBMS, Investment research and trading, Market share and customer counts, memcached, MemSQL, OLTP, Pricing, Web analytics

3 Comments

April 7, 2012

Many kinds of memory-centric data management

I’m frequently asked to generalize in some way about in-memory or memory-centric data management. I can start:

The desire for human real-time interactive response naturally leads to keeping data in RAM.
Many databases will be ever cheaper to put into RAM over time, thanks to Moore’s Law. (Most) traditional databases will eventually wind up in RAM.
However, there will be exceptions, mainly on the machine-generated side. Where data creation and RAM data storage are getting cheaper at similar rates … well, the overall cost of RAM storage may not significantly decline.
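
The last point can be put into a toy model. This is only a sketch: the function name ram_cost, the starting size and price, and every growth and decline rate below are hypothetical numbers chosen to illustrate the shape of the argument, not forecasts.

```python
def ram_cost(initial_data_tb, initial_price_per_tb, data_growth, price_decline, years):
    """Cost of keeping the whole data set in RAM after `years`.

    data_growth: annual growth in data volume (0.10 means +10% per year).
    price_decline: annual drop in RAM price per TB (0.30 means -30% per year).
    """
    data_tb = initial_data_tb * (1 + data_growth) ** years
    price_per_tb = initial_price_per_tb * (1 - price_decline) ** years
    return data_tb * price_per_tb

# Hypothetical "traditional" database: data grows slowly, RAM gets cheaper fast.
traditional = [ram_cost(1, 10_000, 0.10, 0.30, y) for y in range(6)]

# Hypothetical machine-generated data: volume grows about as fast as RAM cheapens.
machine = [ram_cost(1, 10_000, 0.43, 0.30, y) for y in range(6)]

for year, (t, m) in enumerate(zip(traditional, machine)):
    print(f"year {year}: traditional {t:,.0f}  machine-generated {m:,.0f}")
```

Under these assumed rates the traditional database gets steadily cheaper to hold in RAM, while the cost for the machine-generated one stays essentially flat.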

Getting more specific than that is hard, however, because:

The possibilities for in-memory data storage are as numerous and varied as those for disk.
The individual technologies and products for in-memory storage are much less mature than those for disk.
Solid-state options such as flash just confuse things further.

Consider, for example, some of the in-memory data management ideas kicking around. Read more

Categories: Business intelligence, Cache, Cognos, Columnar database management, Couchbase, Data models and architecture, Data warehousing, Database diversity, Exasol, IBM and DB2, In-memory DBMS, Kognitio, memcached, MongoDB, MySQL, NoSQL, Oracle, Oracle TimesTen, ParAccel, QlikTech and QlikView, SAP AG, solidDB, Streaming and complex event processing (CEP), VoltDB and H-Store, Workday

15 Comments

March 21, 2012

Comments on Oracle’s third quarter 2012 earnings call

Various reporters have asked me about Oracle’s third quarter 2012 earnings conference call. Specific Q&A includes:

What did Oracle do to have its earnings beat Wall Street’s estimates?

Have a bad second quarter and then set Wall Street’s expectations too low for Q3. This isn’t about strong results; it’s about modest expectations.

Can Oracle be a leader in both hardware and software?

It’s not inconceivable.
The observation that Oracle, IBM, and Teradata all are pushing hardware-software combinations has been intriguing ever since IBM bought Netezza. (SAP really isn’t, however; ditto Microsoft.)
I do think Oracle may be somewhat overoptimistic as to how cooperative the Sun user base will be in buying more high-end product and in paying more in maintenance for the gear they already have.

Beyond that, please see below.

What about Oracle in the cloud?

MySQL is an important cloud supplier. But Oracle overall hasn’t demonstrated much understanding of what cloud technology and business are all about. An expensive SaaS acquisition here or there could indeed help somewhat, but it seems as if Oracle still has a very long way to go.

Other comments

Other comments on the call, whose transcript is available, include: Read more

Categories: Cloud computing, Exadata, Humor, In-memory DBMS, Oracle, SAP AG, Software as a Service (SaaS)

5 Comments

February 26, 2012

SAP HANA today

SAP HANA has gotten much attention, mainly for its potential. I finally got briefed on HANA a few weeks ago. While we didn’t have time for all that much detail, it still might be interesting to talk about where SAP HANA stands today.

The HANA section of SAP’s website is a confusing and sometimes inaccurate mess. But an IBM white paper on SAP HANA gives some helpful background.

SAP HANA is positioned as an “appliance”. So far as I can tell, that really means it’s a software product for which there are a variety of emphatically-recommended hardware configurations — Intel-only, from what right now are eight usual-suspect hardware partners. Anyhow, the core of SAP HANA is an in-memory DBMS. Particulars include:

Mainly, HANA is an in-memory columnar DBMS, based on SAP’s confusingly-renamed BI Accelerator/BW Accelerator. Analytics and most OLTP (OnLine Transaction Processing) go against the columnar part of HANA.
The HANA DBMS also has an in-memory row storage option, used to store metadata, small tables, and so on.
SAP HANA talks both SQL and MDX.
The HANA DBMS is shared-nothing across blades or rack servers. I imagine that within an individual blade it’s shared everything. The usual-suspect data distribution or partitioning strategies are available — hash, range, round-robin.
SAP HANA has what sounds like a natural disk-based persistence strategy: logs, snapshots, and so on. SAP says that this is synchronous enough to give ACID compliance. For some hardware partners, those “disks” are actually Fusion-io cards.
HANA is fault-tolerant “across servers”.
Text support is “coming soon”, which makes sense, given that BI Accelerator was based on the TREX search engine in the first place. Inxight is also in the HANA text mix.
You can put data into SAP HANA in a variety of obvious ways:
- Writing it directly.
- Trigger-based replication (perhaps from the DBMS that runs your SAP apps).
- Log-based replication (based on Sybase Replication Server).
- SAP Business Objects’ ETL tool.
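
For readers who want the hash, range, and round-robin distribution strategies mentioned above spelled out, here is a generic sketch in Python. It is not HANA code or HANA's API; the function names are hypothetical, and CRC32 is just a convenient stand-in for whatever hash function a real system would use.

```python
import bisect
import zlib
from itertools import count

def hash_partition(key, num_nodes):
    """Hash distribution: a stable hash of the key picks the node."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

def range_partition(key, upper_bounds):
    """Range distribution over len(upper_bounds) + 1 nodes.

    Node i holds keys below upper_bounds[i]; keys at or above the
    last bound go to the final node.
    """
    return bisect.bisect_right(upper_bounds, key)

def make_round_robin(num_nodes):
    """Round-robin distribution: rows go to nodes in rotation, ignoring keys."""
    counter = count()
    def assign(_row=None):
        return next(counter) % num_nodes
    return assign

# Usage with made-up keys and bounds.
bounds = [100, 200]                      # three nodes: <100, 100-199, >=200
print([range_partition(k, bounds) for k in (50, 100, 250)])

assign = make_round_robin(3)
print([assign() for _ in range(5)])

print(hash_partition("order-42", 4))     # some node in 0..3
```

The usual tradeoff is that hash and round-robin spread rows evenly, while range distribution keeps neighboring keys on the same node, at the risk of skew.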

SAP says that the row-store part is based both on P*Time, an acquisition from Korea some time ago, and also on SAP’s own MaxDB. The IBM white paper mentions only the MaxDB aspect. (Edit: Actually, see the comment thread below.) Based on a variety of clues, I conjecture that this was an aspect of SAP HANA development that did not go entirely smoothly.

Other SAP HANA components include: Read more

Categories: Analytic technologies, Business Objects, Data warehouse appliances, Data warehousing, In-memory DBMS, Investment research and trading, OLTP, Predictive modeling and advanced analytics, SAP AG, Software as a Service (SaaS), Text

12 Comments
