Text

Analysis of data management technology optimized for text data. Related subjects include:

October 10, 2011

Text data management, Part 1: Confusion

This is Part 1 of a three post series. The posts cover:

  1. Confusion about text data management.
  2. Choices for text data management (general and short-request).
  3. Choices for text data management (analytic).

There’s much confusion about the management of text data, among technology users, vendors, and investors alike. Reasons seems to include:

Above all: The use cases for text data vary greatly, just as the use cases for simply-structured databases do.

There are probably fewer people now than there were six years ago who need to be told that text and relational database management are very different things. Other misconceptions, however, appear to be on the rise. Specific points that are commonly overlooked include: Read more

September 6, 2011

Derived data, progressive enhancement, and schema evolution

The emphasis I’m putting on derived data is leading to a variety of questions, especially about how to tease apart several related concepts:

So let’s dive in.  Read more

August 18, 2011

HP/Autonomy sound bites

HP has announced that:

On a high level, this means:

My coverage of Autonomy isn’t exactly current, but I don’t know of anything that contradicts long-time competitor* Dave Kellogg’s skeptical view of Autonomy. Autonomy is a collection of businesses involved in the management, search, and retrieval of poly-structured data, in some cases with strong market share, but even so not necessarily with the strongest of reputations for technology or technology momentum. Autonomy started from a text search engine and a Bayesian search algorithm on top of that, which did a decent job for many customers. But if there’s been much in the way of impressive enhancement over the past 8-10 years, I’ve missed the news.

*Dave, of course, was CEO of MarkLogic.

Questions obviously arise about how the Autonomy acquisition relates to other HP businesses. My early thoughts include:  Read more

June 19, 2011

Investigative analytics and derived data: Enzee Universe 2011 talk

I’ll be speaking Monday, June 20 at IBM Netezza’s Enzee Universe conference. Thus, as is my custom:

The talk concept started out as “advanced analytics” (as opposed to fast query, a subject amply covered in the rest of any Netezza event), as a lunch break in what is otherwise a detailed “best practices” session. So I suggested we constrain the subject by focusing on a specific application area — customer acquisition and retention, something of importance to almost any enterprise, and which exploits most areas of analytic technology. Then I actually prepared the slides — and guess what? The mix of subjects will be skewed somewhat more toward generalities than I first intended, specifically in the areas of investigative analytics and derived data. And, as always when I speak, I’ll try to raise consciousness about the issues of liberty and privacy, our options as a society for addressing them, and the crucial role we play as an industry in helping policymakers deal with these technologically-intense subjects.

Slide 3 refers back to a post I made last December, saying there are six useful things you can do with analytic technology:

Slide 4 observes that investigative analytics:

Slide 5 gives my simplest overview of investigative analytics technology to date:  Read more

May 17, 2011

Terminology: poly-structured data, databases, and DBMS

My recent argument that the common terms “unstructured data” and “semi-structured data” are misnomers, and that a word like “multi-” or “poly-structured”* would be better, seems to have been well-received. But which is it — “multi-” or “poly-“?

*Everybody seems to like “poly-structured” better when it has a hyphen in it — including me. 🙂

The big difference between the two is that “multi-” just means there are multiple structures, while “poly-” further means that the structures are subject to change. Upon reflection, I think the “subject to change” part is essential, so poly-structured it is.

The definitions I’m proposing are:

Read more

January 3, 2011

The six useful things you can do with analytic technology

I seem to be in the mode of sharing some of my frameworks for thinking about analytic technology. Here’s another one.

Ultimately, there are six useful things you can do with analytic technology:

Technology vendors often cite similar taxonomies, claiming to have all the categories (as they conceive them) nicely represented, in slickly integrated fashion. They exaggerate. Most of these categories are in rapid flux, and the rest should be. Analytic technology still has a long way to go.

In more detail:  Read more

November 29, 2010

Document-oriented DBMS without joins

When I talked with MarkLogic’s Ken Chestnut about MarkLogic 4.2, I was surprised to learn that MarkLogic really, truly doesn’t do anything like a join. Unlike some other non-SQL DBMS, MarkLogic has no SQL interface, no ODBC or JDBC. Nothing, nada. (MarkLogic has a Java interface for Xquery, but not for anything like SQL.)

Read more

October 12, 2010

Vertica-Hadoop integration

DBMS/Hadoop integration is a confusing subject. My post on the Cloudera/Aster Data partnership awaits some clarification in the comment thread. A conversation with Vertica left me unsure about some Hadoop/Vertica Year 2 details as well, although I’m doing better after a follow-up call. On the plus side, we also covered some rather cool Hadoop/Vertica product futures, and those seemed easier to understand. 🙂

I say “Year 2” because Hadoop/Vertica integration has been going on since last year. Indeed, Vertica says that there are now over 25 users of the Hadoop/Vertica combination and hence Vertica’s Hadoop connector. Vertica is now introducing — for immediate GA — a new version of its Hadoop connector. So far as I understood:  Read more

May 23, 2010

More on Sybase IQ, including Version 15.2

Back in March, Sybase was kind enough to give me permission to post a slide deck about Sybase IQ. Well, I’m finally getting around to doing so. Highlights include but are not limited to:

Sybase IQ may have a bit of a funky architecture (e.g., no MPP), but the age of the product and the substantial revenue it generates have allowed Sybase to put in a bunch of product features that newer vendors haven’t gotten around to yet.

More recently, Sybase volunteered permission for me to preannounce Sybase IQ Version 15.2 by a few days (it’s scheduled to come out this week). Read more

December 7, 2009

A framework for thinking about data warehouse growth

There are only three ways that the amount of data stored in data warehouses can grow:

Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.