Theory and architecture

Analysis of design choices in databases and database management systems. Related subjects include:

Any subcategory
Database diversity
Explicit support for specific data types
(in Text Technologies) Text search

October 10, 2011

Text data management, Part 2: General and short-request

This is Part 2 of a three-post series. The posts cover:

I’ve recently given widely varied advice about managing text (and similar files — images and so on), ranging from

Sure, just keep going with your old strategy of keeping .PDFs in the file system and pointing to them from the relational database. That’s an easy performance optimization vs. having the RDBMS manage them as BLOBs.

I suspect MongoDB isn’t heavyweight enough for your document management needs, let alone just dumping everything into Hadoop. Why don’t you take a look at MarkLogic?

Here are some reasons why.

There are three basic kinds of text management use cases:

Text as payload.
Text as search parameter.
Text as analytic input.
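The first piece of advice above, keeping .PDFs in the file system and pointing to them from the relational database rather than having the RDBMS manage them as BLOBs, is the simplest "text as payload" case. Here is a minimal sketch of both options. It uses Python's built-in sqlite3 as a stand-in for whatever RDBMS is actually in use, and the table names, helper functions, and file-naming scheme are all hypothetical:

```python
import os
import sqlite3
import tempfile

def store_as_pointer(conn, doc_id, data, root):
    """Write the file to the file system; the database keeps only its path."""
    path = os.path.join(root, f"{doc_id}.pdf")  # hypothetical naming scheme
    with open(path, "wb") as f:
        f.write(data)
    conn.execute("INSERT INTO docs_by_path (id, path) VALUES (?, ?)", (doc_id, path))
    return path

def store_as_blob(conn, doc_id, data):
    """Have the database itself manage the bytes as a BLOB."""
    conn.execute("INSERT INTO docs_by_blob (id, body) VALUES (?, ?)", (doc_id, data))

def load_via_pointer(conn, doc_id):
    # One small row lookup, then a plain file read outside the DBMS.
    (path,) = conn.execute("SELECT path FROM docs_by_path WHERE id = ?", (doc_id,)).fetchone()
    with open(path, "rb") as f:
        return f.read()

def load_blob(conn, doc_id):
    # The full document bytes travel through the DBMS.
    (body,) = conn.execute("SELECT body FROM docs_by_blob WHERE id = ?", (doc_id,)).fetchone()
    return body

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs_by_path (id INTEGER PRIMARY KEY, path TEXT NOT NULL)")
conn.execute("CREATE TABLE docs_by_blob (id INTEGER PRIMARY KEY, body BLOB NOT NULL)")

payload = b"%PDF-1.4 placeholder bytes"  # stand-in for a real PDF
with tempfile.TemporaryDirectory() as root:
    store_as_pointer(conn, 1, payload, root)
    store_as_blob(conn, 1, payload)
    print(load_via_pointer(conn, 1) == load_blob(conn, 1) == payload)  # True
```

The trade-off the advice points at: with pointers, the database stays small and bulk reads bypass it, at the cost of the DBMS no longer guaranteeing that the file and its row stay consistent.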

Categories: MarkLogic, NoSQL, Text

October 10, 2011

Text data management, Part 1: Confusion

This is Part 1 of a three-post series. The posts cover:

There’s much confusion about the management of text data, among technology users, vendors, and investors alike. Reasons seem to include:

The terminology around text data is inaccurate.
Data volume estimates for text are misleading.
Multiple different technologies are in the mix, including:
- Enterprise text search.
- Text analytics — text mining, sentiment analysis, etc.
- Document stores — e.g. document-oriented NoSQL, or MarkLogic.
- Log management and parsing — e.g. Splunk.
- Text archiving — e.g., various specialty email archiving products I couldn’t even name.
- Public web search — Google et al.
Text search vendors have disappointed, especially technically.
Text analytics vendors have disappointed, especially financially.
Other analytic technology vendors ignore what the text analytic vendors actually have accomplished, and reinvent inferior wheels rather than OEM the state of the art.

Above all: The use cases for text data vary greatly, just as the use cases for simply-structured databases do.

There are probably fewer people now than there were six years ago who need to be told that text and relational database management are very different things. Other misconceptions, however, appear to be on the rise. Specific points that are commonly overlooked include:

Categories: Analytic technologies, Archiving and information preservation, Google, Log analysis, MarkLogic, NoSQL, Oracle, Splunk, Text

October 2, 2011

Defining NoSQL

A reporter tweeted: “Is there a simple plain English definition for NoSQL?” After reminding him of my cynical yet accurate Third Law of Commercial Semantics, I gave it a serious try, and came up with the following. More precisely, I tweeted the bolded parts of what’s below; the rest is commentary added for this post.

NoSQL is most easily defined by what it excludes: SQL, joins, strong analytic alternatives to those, and some forms of database integrity. If you leave all four out, and you have a strong scale-out story, you’re in the NoSQL mainstream.

Categories: Cassandra, dbShards and CodeFutures, MarkLogic, MySQL, Object, Open source, Petabyte-scale data management, Schooner Information Technology

September 30, 2011

Oracle NoSQL is unlikely to be a big deal

Alex Williams noticed that there will be a NoSQL session at Oracle OpenWorld next week, and is wondering whether this will be a big deal. I think it won’t be.

There really are three major points to NoSQL.

Dynamic schemas. This is the only one of the three that truly depends on NoSQL.
Scale-out short-request processing. If you want to scale out efficiently at high request volumes, you’re best off not using all the flexibility SQL/relational DBMS offer. (In particular, you don’t want to do cross-node joins.) Not coincidentally, a number of the best scale-out offerings were built to be NoSQL.
Open source. Doing a relational DBMS is a big project. It seems easier to build NoSQL ones.
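The scale-out point can be illustrated with hash partitioning, one common (but not the only) way scale-out stores spread data across nodes. In this hypothetical sketch, each "node" is just an in-process dict: a request on a single key touches exactly one node, while a request that has to combine many keys, as a cross-node join would, can fan out to several:

```python
import hashlib

class ShardedKV:
    """Toy hash-partitioned key-value store; each 'node' is a plain dict."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def node_for(self, key):
        # Stable hash, so every client routes a given key to the same node.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def get(self, key):
        # Short request: exactly one node is consulted.
        return self.nodes[self.node_for(key)].get(key)

    def nodes_touched(self, keys):
        # A join-like request over many keys may need data from many nodes.
        return {self.node_for(k) for k in keys}

store = ShardedKV(num_nodes=4)
store.put("user:42", {"name": "Ann"})
print(store.get("user:42"))                       # {'name': 'Ann'}
print(len(store.nodes_touched(["user:42"])))      # 1
print(store.nodes_touched([f"order:{i}" for i in range(100)]))  # usually several nodes
```

Single-key requests stay cheap no matter how many nodes are added; the cost of multi-key work grows with the fan-out, which is why scale-out designs steer away from cross-node joins.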

Oracle can address the latter two points as aggressively as it wishes via MySQL. It so happens I would generally recommend MySQL enhanced by dbShards, Schooner, and/or dbShards/Schooner, rather than Oracle-only MySQL … but that’s a detail. In some form or other, Oracle’s MySQL is a huge player in the scale-out, open source, short-request database management market.

So that leaves us with dynamic schemas. Oracle has at least four different sets of technology in that area:

As Workday noticed years ago, MySQL can be used as a functional, basic key-value store.
Oracle also has XML-based Berkeley DB/Sleepycat kicking around.*
The XML extensions to Oracle’s core DBMS could be alleged to have a dynamic schema/NoSQL flavor. (Blech.)
A dynamic schema argument could also be made for object-oriented DBMS technology. While Oracle doesn’t to my knowledge exactly sell that, it does have the Tangosol Coherence line of technology, with a potentially similar programming model.
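The first item, a relational DBMS used as a basic key-value store, is what gives it a dynamic-schema flavor: the DBMS sees only a key and an opaque value, and the structure lives in the application. A minimal sketch, using Python's sqlite3 as a stand-in for MySQL; the table and helper names are hypothetical, and this is not a description of Workday's actual schema:

```python
import json
import sqlite3

def open_store(conn):
    # One narrow table: the DBMS sees only a key and an opaque text value.
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")

def put(conn, key, doc):
    # The "schema" lives in the JSON, so new fields need no ALTER TABLE.
    # INSERT OR REPLACE is SQLite syntax; MySQL would use REPLACE INTO
    # or INSERT ... ON DUPLICATE KEY UPDATE.
    conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, json.dumps(doc)))

def get(conn, key):
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return None if row is None else json.loads(row[0])

conn = sqlite3.connect(":memory:")
open_store(conn)
put(conn, "employee:1", {"name": "Ann"})
put(conn, "employee:2", {"name": "Bo", "cost_center": "R&D"})  # extra field, same table
print(get(conn, "employee:2"))  # {'name': 'Bo', 'cost_center': 'R&D'}
```

The price of this flexibility is that the DBMS can no longer index, join on, or enforce constraints over anything inside the value.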

If Oracle is now refreshing and rebranding one or more of these as “NoSQL”, there’s no reason to view that as a big deal at all.

*That’s Mike Olson’s former company, if you’re keeping score at home.

Categories: MySQL, NoSQL, Object, OLTP, Open source, Oracle, Parallelization, Schooner Information Technology, Structured documents

September 25, 2011

Workload management and RAM

Closing out my recent round of Teradata-related posts, here’s a little anomaly:

Teradata is proud that Teradata 14’s workload management now explicitly manages I/O, to go with Teradata’s long-standing management of CPU. Teradata’s WLM still does not explicitly manage RAM.
Aster is proud that Aster 5’s workload management now explicitly manages RAM, to go along with the WLM capabilities Aster has had for a while managing CPU and I/O. Aster’s Tasso Argyros believes this is an important capability, at least in some edge cases.
Mike Pilcher of SAND emailed me that SAND’s WLM capabilities to explicitly manage CPU, I/O, and RAM are very well-received by the marketplace.
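To make the CPU/I/O-versus-RAM distinction concrete, here is a hypothetical admission-control sketch in which each query declares estimated CPU, I/O, and RAM demands. It is not how Teradata, Aster, or SAND actually implement workload management; it only shows that a controller which manages CPU and I/O but not RAM can admit queries whose combined memory needs exceed what the system has:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Demand:
    """Estimated resource demand of one query (units are illustrative)."""
    cpu: float  # cores
    io: float   # MB/s
    ram: float  # GB

class AdmissionController:
    """Admit a query only if every *managed* resource still has headroom."""

    def __init__(self, capacity, managed=("cpu", "io", "ram")):
        self.capacity = capacity
        self.managed = managed
        self.in_use = Demand(0.0, 0.0, 0.0)

    def try_admit(self, d):
        # Resources not listed in `managed` are never checked, so they can be overcommitted.
        for r in self.managed:
            if getattr(self.in_use, r) + getattr(d, r) > getattr(self.capacity, r):
                return False
        self.in_use = Demand(self.in_use.cpu + d.cpu,
                             self.in_use.io + d.io,
                             self.in_use.ram + d.ram)
        return True

    def release(self, d):
        # Return a finished query's resources to the pool.
        self.in_use = Demand(self.in_use.cpu - d.cpu,
                             self.in_use.io - d.io,
                             self.in_use.ram - d.ram)

capacity = Demand(cpu=16, io=800, ram=64)
big_join = Demand(cpu=2, io=100, ram=48)  # hypothetical memory-hungry query

cpu_io_only = AdmissionController(capacity, managed=("cpu", "io"))
print(cpu_io_only.try_admit(big_join), cpu_io_only.try_admit(big_join))  # True True (96 GB promised)

all_three = AdmissionController(capacity)
print(all_three.try_admit(big_join), all_three.try_admit(big_join))  # True False
```

In a real system the overcommitted case shows up as spilling to disk or swapping rather than a clean rejection, which is presumably why explicit RAM management matters most in the edge cases Argyros mentions.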

Categories: Aster Data, Data warehousing, SAND Technology, Teradata, Workload management

September 24, 2011

Confusion about Teradata’s big customers

Evidently further attempts to get information on this subject would be fruitless, but anyhow:

Teradata emailed me a couple of months ago saying, in effect, that it could then count 16 petabyte-level customers. In response to my repeated requests for clarification, Teradata has explicitly refused to identify the metric used in reaching that conclusion.
At some point, according to a tweet of his, Teradata did something to convince Neil Raden that it has 20 petabyte-class users.
That tweet was made around the time that Teradata apparently showed a slide naming big users at the Strata conference (last week).
If Teradata is counting the way they did three years ago, that count of 16 or 20 or whatever is probably inflated compared to, say, Vertica’s figure of 7 a few months back.
Even so, it’s obvious — and not just from the eBay example — that Teradata has one of the most scalable analytic DBMS offerings around.

Categories: Petabyte-scale data management, Teradata

September 22, 2011

DataStax pivots back to its original strategy

The DataStax and Cassandra stories are somewhat confusing. Unfortunately, DataStax chose to clarify them in what has turned out to be a crazy news week. I’m going to use this post just to report on the status of the DataStax product line, without going into any analysis beyond that.

Categories: Cassandra, DataStax, Facebook, NoSQL, Open source

September 22, 2011

Hybrid-columnar soundbites

Busy couple of days talking with reporters. A few notes on hybrid-columnar analytic DBMS, all backed up by yesterday’s post on Teradata columnar:

Oracle does not actually offer columnar I/O; the other three systems do. But see the “I won’t be surprised” part in yesterday’s Teradata post.
Aster does not offer columnar compression; the other three do.
EMC Greenplum and Teradata offer different ways to mix column and row storage in the same table; each has its advantages.
Teradata generally has a more mature and capable offering than EMC Greenplum, for most purposes, whichever way you choose to organize your tables.
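Columnar I/O, the capability the first bullet says Oracle lacks, means a query reads only the columns it references. The toy sketch below counts "values fetched" as a stand-in for I/O; real systems read blocks or pages rather than individual values, so this illustrates the idea and is not any vendor's implementation. The function names are hypothetical:

```python
from typing import Any

Row = dict[str, Any]

def to_columns(rows: list[Row]) -> dict[str, list[Any]]:
    """Pivot row-format records into one list per column."""
    return {name: [r[name] for r in rows] for name in rows[0]}

def sum_column_row_store(rows: list[Row], name: str):
    # Row layout: every field of every row is fetched to reach one column.
    values_read = sum(len(r) for r in rows)
    return sum(r[name] for r in rows), values_read

def sum_column_column_store(columns: dict[str, list[Any]], name: str):
    # Column layout: only the requested column is fetched.
    col = columns[name]
    return sum(col), len(col)

rows = [{"id": i, "region": "east" if i % 2 else "west",
         "amount": float(i), "comment": "n/a"} for i in range(1000)]
columns = to_columns(rows)
print(sum_column_row_store(rows, "amount"))        # (499500.0, 4000)
print(sum_column_column_store(columns, "amount"))  # (499500.0, 1000)
```

With four columns the saving is 4x; on wide warehouse tables where a query touches a handful of columns out of dozens, the gap is correspondingly larger.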

Edit: The Wall Street Journal got this wrong, writing that Teradata was the first-ever hybrid columnar system. Specifically, they wrote:

While columnar technology has been around for years, Teradata says its product is unique because it allows users to include both columns and rows in the same database.

Googling on “Teradata To Unveil New Analytics Product To Speed Business Adoption” might get you around the paywall to see the offending piece.

Categories: Aster Data, Columnar database management, Data warehousing, Database compression, Greenplum, Teradata

September 22, 2011

Aster Database Release 5 and Teradata Aster appliance

It was obviously just a matter of time before there would be an Aster appliance from Teradata and some tuned bidirectional Teradata-Aster connectivity. These have now been announced. I didn’t notice anything particularly surprising in the details of either. About the biggest excitement is that Aster is traditionally a Red Hat shop, but for the purposes of appliance delivery has now embraced SUSE Linux.

Along with the announcements comes updated positioning such as:

Better SQL than the MapReduce alternatives have.
Better MapReduce than the SQL alternatives have.
Easy(ier) way to do complex analytics on multi-structured data. (Aster has embraced that term.)

and of course

Now also with Teradata’s beautifully engineered hardware and system management software!

Categories: Aster Data, Data warehouse appliances, Data warehousing, Predictive modeling and advanced analytics, Teradata, Workload management

Teradata Columnar and Teradata 14 compression

Teradata is pre-announcing Teradata 14, for delivery by the end of this year, where by “Teradata 14” I mean the latest version of the DBMS that drives the classic Teradata product line. Teradata 14’s flagship feature is Teradata Columnar, a hybrid-columnar offering that follows in the footsteps of Greenplum (now part of EMC) and Aster Data (now part of Teradata).

The basic idea of Teradata Columnar is:

Each table can be stored in Teradata in row format, column format, or a mix.
You can do almost anything with a Teradata columnar table that you can do with a row-based one.
If you choose column storage, you also get some new compression choices.
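Why column storage opens up new compression choices: values from a single column sit next to each other and share a type, so generic schemes such as run-length and dictionary encoding become effective. The sketch below shows those two generic techniques; it is not a description of which compression options Teradata 14 actually provides, and the function names are hypothetical:

```python
from itertools import groupby

def rle_encode(column):
    """Run-length encode: consecutive equal values become (value, run_length)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_decode(runs):
    return [value for value, n in runs for _ in range(n)]

def dict_encode(column):
    """Dictionary encode: replace each distinct value with a small integer code."""
    dictionary = {}
    codes = []
    for value in column:
        # A new value gets the next unused code; a repeat reuses its old code.
        codes.append(dictionary.setdefault(value, len(dictionary)))
    return list(dictionary), codes

def dict_decode(values, codes):
    return [values[c] for c in codes]

state = ["CA", "CA", "CA", "NY", "NY", "CA"]
print(rle_encode(state))   # [('CA', 3), ('NY', 2), ('CA', 1)]
print(dict_encode(state))  # (['CA', 'NY'], [0, 0, 0, 1, 1, 0])
```

Run-length encoding pays off when a column is sorted or clustered; dictionary encoding pays off when a column has few distinct values. In a row layout, neighboring values come from different columns, so neither pattern shows up.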

Categories: Archiving and information preservation, Columnar database management, Data warehousing, Database compression, Oracle, Rainstor, Teradata

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.
