Theory and architecture

Analysis of design choices in databases and database management systems. Related subjects include:

November 3, 2011

MarkLogic’s Hadoop connector

It’s time to circle back to a subject I skipped when I otherwise wrote about MarkLogic 5: MarkLogic’s new Hadoop connector.

Most of what’s confusing about the MarkLogic Hadoop Connector lies in two pairs of options it presents you:

Otherwise, the whole thing is just what you would think:

MarkLogic said that it wrote this Hadoop connector itself.

Read more

November 2, 2011

The cool aspects of Odiago WibiData

Christophe Bisciglia and Aaron Kimball have a new company.

WibiData is designed for management of, investigative analytics on, and operational analytics on consumer internet data, the main examples of which are web site traffic and personalization and their analogues for games and/or mobile devices. The core WibiData technology, built on HBase and Hadoop,* is a data management and analytic execution layer. That’s where the secret sauce resides. Also included are:

The whole thing is in beta, with about three (paying) beta customers.

*And Avro and so on.

The core ideas of WibiData include:

Read more

October 23, 2011

NoSQL notes

Last week I visited with James Phillips of Couchbase, Max Schireson and Eliot Horowitz of 10gen, and Todd Lipcon, Eric Sammer, and Omer Trajman of Cloudera. I guess it’s time for a round-up NoSQL post. 🙂

Views of the NoSQL market horse race are reasonably consistent, with perhaps some elements of “Where you stand depends upon where you sit.”

Read more

October 23, 2011

Transparent relational OLTP scale-out

There’s a perception that, if you want (relatively) worry-free database scale-out, you need a non-relational/NoSQL strategy. That perception is false. In the analytic case it’s completely ridiculous, as has been demonstrated by Teradata, Vertica, Netezza, and various other MPP (Massively Parallel Processing) analytic DBMS vendors. And now it’s false for short-request/OLTP (OnLine Transaction Processing) use cases as well.

My favorite relational OLTP scale-out choice these days is the SchoonerSQL/dbShards partnership. Schooner Information Technology (SchoonerSQL) and Code Futures (dbShards) are young, small companies, but I’m not too concerned about that, because the APIs they want you to write to are just MySQL’s. The main scenarios in which I can see them failing are ones in which they are competitively leapfrogged, either by other small competitors – e.g. ScaleBase, Akiban, TokuDB, or ScaleDB — or by Oracle/MySQL itself. While that could suck for my clients Schooner and Code Futures, it would still provide users relying on MySQL scale-out with one or more good product alternatives.

Relying on non-MySQL NewSQL startups, by way of contrast, would leave me somewhat more concerned. (However, if their code is open sourced. you have at least some vendor-failure protection.) And big-vendor scale-out offerings, such as Oracle RAC or DB2 pureScale, may be more complex to deploy and administer than the MySQL and NewSQL alternatives.

October 20, 2011

More notes on Oracle NoSQL

A reporter asked me for some thoughts on Oracle’s new NoSQL product. For the most part, I stand by my previous comments on Oracle NoSQL. Still, NoSQL in general deserves a place in Oracle shops, so it makes sense for Oracle to try to coopt it.

Oracle’s core DBMS is not well suited to track interactions (e.g. web clicks), even in cases where it’s the choice for transactions; it’s unnecessarily heavyweight. What’s worse, using the same database to store actions and interactions can lead to serious reliability problems. If a better architecture is to dump the clicks into some NoSQL store, massage the information, and eventually put some derived data into a relational DBMS, then Oracle will naturally try to own each step of the data pipeline.

Dynamic schemas are another area of Oracle weakness, leading in some cases to outright Oracle replacements. However, pure key-value stores go too far to the opposite extreme; you should at least be able to index and retrieve data one field at a time. Based on what I’ve seen of Oracle’s marketing literature, that feature will be missing from the first release of Oracle’s NoSQL.* Until it’s in there, and until it works well, I don’t see why anybody should use Oracle’s NoSQL product.

*Frankly, that choice makes no sense to me on any level. Yet it’s the way Oracle seems to have elected to go — or, if it isn’t, then there’s somebody writing Oracle marketing collateral who’s clearly in the wrong line of work.

October 19, 2011

What those nested data structures are about

As I’ve noted before, the very big web companies have an issue with nested data structures. The subject came up in XLDB talks yesterday too, so my big goal for lunch was to finally understand what was being talked about. Sitting at a table full of eBay and LinkedIn folks turned out to be a good tactic.

The explanation was led by Oliver Ratzesberger, late of eBay* and progenitor of eBay’s Singularity project. In simplest terms, one event can spawn a lot of event attribute information, perhaps in the form of name-value pairs, which it then makes sense to store together in some way. The example Oliver dwelled on was that, on any given web page, there can be 100+ pieces of information to record, including:

*Edit: Oliver subsequently moved on to Sears and then Teradata.

There are several reasons why one might wish to store this information in ways that grieve relational purists. First, reconstructing all this information via joins would be brutally expensive. What’s more, reconstructing all this information via joins could be impractical. Some comes from third party ad servers, which might not reproduce the same ads upon demand. Other is in the form of rankings, which can’t always be reliably reproduced from one query to the next. (That’s just one of several reasons text search and relational DBMS are an awkward fit.)

Also, there’s a strong dynamic schema flavor to these databases. The list of attributes for one web click might be very different in kind from the list for the next page. Forcing that kind of variability into a fixed relational schema, while theoretically possible, doesn’t necessarily make a lot of sense.

October 18, 2011

Oracle is buying Endeca

Oracle is buying Endeca. The official talking points for the deal aren’t a perfect match for Endeca’s actual technology, but so be it.

In that post, I wrote:

… the Endeca paradigm is really to help you make your way through a structured database, where different portions of the database have different structures. Thus, at various points in your journey, it automagically provides you a list of choices as to where you could go next.

That kind of thing could help Oracle with apps like the wireless telco product catalog deal MongoDB got.

Going back to the Endeca-post quote well, Endeca itself said:

Inside the MDEX Engine there is no overarching schema; each data record carries its own metadata. This enables the rapid combination of a wide range of structured and unstructured content into Latitude’s unified data model. Once inside, the MDEX Engine derives common dimensions and metrics from the available metadata, instantly exposing each for high-performance refinement and analysis in the Discovery Framework. Have a new data source? Simply add it and the MDEX Engine will create new relationships where possible. Changes in source data schema? No problem, adjustments on the fly are easy.

And I pointed out that the MDEX engine was a columnar DBMS.

Meanwhile, Oracle’s own columnar DBMS efforts have been disappointing. Endeca could be an intended answer to that. However, while Oracle’s track record with standalone DBMS acquisitions is admirable (DEC RDB, MySQL, etc.), Oracle’s track record of integrating DBMS acquisitions into the Oracle product itself is not so good. (Express? Essbase? The text product line? None of that has gone particularly well.)

So while I would expect Endeca’s flagship e-commerce shopping engine products to flourish under Oracle’s ownership, I would be cautious about the integration of Endeca’s core technology into the Oracle product line.

October 14, 2011

Commercial software for academic use

As Jacek Becla explained:

Even so, I think that academic researchers, in the natural and social sciences alike, commonly overlook the wealth of commercial software that could help them in their efforts.

I further think that the commercial software industry could do a better job of exposing its work to academics, where by “expose” I mean:

Reasons to do so include:

Read more

October 13, 2011

Compression in Sybase ASE 15.7

Sybase recently came up with Adaptive Server Enterprise 15.7, which is essentially the “Make SAP happy” release. Features that were slated for 2012 release, but which SAP wanted, were accelerated into 2011. Features that weren’t slated for 2012, but which SAP wanted, were also brought into 2011. Not coincidentally, SAP Business Suite will soon run on Sybase Adaptive Server Enterprise 15.7.

15.7 turns out to be the first release of Sybase ASE with data compression. Sybase fondly believes that it is matching DB2 and leapfrogging Oracle in compression rate with a single compression scheme, namely page-level tokenization. More precisely, SAP and Sybase seem to believe that about compression rates for actual SAP application databases, based on some degree of testing.   Read more

October 10, 2011

Text data management, Part 3: Analytic and progressively enhanced

This is Part 3 of a three post series. The posts cover:

  1. Confusion about text data management.
  2. Choices for text data management (general and short-request).
  3. Choices for text data management (analytic).

I’ve gone on for two long posts about text data management already, but even so I’ve glossed over a major point:

Using text data commonly involves a long series of data enhancement steps.

Even before you do what we’d normally think of as “analysis”, text markup can include steps such as:

Those processes can add up to dozens of steps. And maybe, six months down the road, you’ll think of more steps yet.

Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.