Data types
Analysis of data management technology optimized for specific data types, such as text, geospatial, object, RDF, or XML. Related subjects include:
- Database diversity
Eight kinds of analytic database (Part 2)
In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I’ll cover four more kinds of analytic database — even newer, for the most part, where the match between use cases and product short lists is even less clear. Read more
Forthcoming Oracle appliances
Edit: I checked with Oracle, and it’s indeed TimesTen that’s supposed to be the basis of this new appliance, as per a comment below. That would be less cool, alas.
Oracle seems to have said on yesterday’s conference call that Oracle OpenWorld (first week in October) will feature appliances based on Tangosol and Hadoop. As I post this, the Seeking Alpha transcript of Oracle’s call is riddled with typos. Bolded comments below are by me. Read more
Categories: Data warehouse appliances, Hadoop, In-memory DBMS, MapReduce, Memory-centric data management, Object, Oracle | 8 Comments |
Vertica as an analytic platform
Vertica 5.0 is coming out today, and delivering the down payment on Vertica’s analytic platform strategy. In Vertica lingo, there’s now a Vertica SDK (Software Development Kit), featuring Vertica UDT(F)s* (User-Defined Transform Functions). Vertica UDT syntax basics start: Read more
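Since the excerpt cuts off before the syntax itself, here is only a language-neutral illustration, not Vertica’s SDK syntax: a minimal Python sketch of what a user-defined transform function (UDTF) does conceptually, namely consume one partition of rows and emit any number of derived rows. The sessionize example and its 30-second gap threshold are hypothetical choices of mine.

```python
from itertools import groupby
from operator import itemgetter

# A user-defined transform function, conceptually: it receives one partition
# of rows at a time and may emit any number of derived rows. Here the
# (hypothetical) transform assigns session IDs to click events.
def sessionize(rows, gap_seconds=30):
    """rows: (user, timestamp) pairs for ONE partition, sorted by timestamp."""
    session, last_ts = 0, None
    for user, ts in rows:
        if last_ts is not None and ts - last_ts > gap_seconds:
            session += 1           # a long enough gap starts a new session
        last_ts = ts
        yield (user, ts, session)  # emitted row: input columns plus a derived one

# The engine, not the function author, handles partitioning -- roughly what
# SQL's OVER (PARTITION BY user) clause would request.
events = [("a", 1), ("a", 5), ("a", 99), ("b", 2), ("b", 3)]
for user, partition in groupby(sorted(events), key=itemgetter(0)):
    for out_row in sessionize(partition):
        print(out_row)
```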
Categories: Analytic technologies, Data warehousing, GIS and geospatial, Predictive modeling and advanced analytics, RDF and graphs, Vertica Systems, Workload management | 7 Comments |
Temporal data, time series, and imprecise predicates
I’ve been confused about temporal data management for a while, because there are several different things going on.
- Date arithmetic. This of course has been around for a very long — er, for a very long time.
- Time-series-aware compression. This has been around for quite a while too.
- “Time travel”/snapshotting — preserving the state of the database at previous points in time. This is a matter of exposing (and not throwing away) the information you capture via MVCC (Multi-Version Concurrency Control) and/or append-only updates (as opposed to update-in-place). Those update strategies are increasingly popular for pretty much anything except update-intensive OLTP (OnLine Transaction Processing) DBMS, so time-travel/snapshotting is an achievable feature for most vendors.
- Bitemporal data access. This occurs when a fact has both a transaction timestamp and a separate validity duration. A Wikipedia article seems to cover the subject pretty well, and I touched on Teradata’s bitemporal plans back in 2009. (There’s a small illustrative sketch below this list.)
- Time series SQL extensions. Vertica explained its version of these to me a few days ago. I imagine Sybase IQ and other serious financial-trading market players have similar features.
In essence, the point of time series/event series SQL functionality is to do SQL against incomplete, imprecise, or derived data.* Read more
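To make the bitemporal bullet above concrete, here’s a minimal Python sketch, a toy example of mine rather than any vendor’s syntax. Each fact carries both a validity interval and a transaction-time interval, and an “as-of” query must constrain both.

```python
from collections import namedtuple

# Each fact has TWO time dimensions:
#   valid_from/valid_to -- when the fact was true in the real world
#   tx_from/tx_to       -- when the database believed/recorded it
Fact = namedtuple("Fact", "key value valid_from valid_to tx_from tx_to")

INF = float("inf")
facts = [
    # Recorded on day 10: Alice's rate is 100, valid from day 1 onward.
    Fact("alice_rate", 100, 1, INF, 10, 20),
    # Correction recorded on day 20: the rate was really 110 from day 1.
    Fact("alice_rate", 110, 1, INF, 20, INF),
]

def as_of(facts, key, valid_time, tx_time):
    """Bitemporal lookup: what did the database, as of tx_time,
    say was true at valid_time?"""
    return [f.value for f in facts
            if f.key == key
            and f.valid_from <= valid_time < f.valid_to
            and f.tx_from <= tx_time < f.tx_to]

print(as_of(facts, "alice_rate", valid_time=5, tx_time=15))  # [100] -- pre-correction belief
print(as_of(facts, "alice_rate", valid_time=5, tx_time=25))  # [110] -- post-correction belief
```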
Categories: Analytic technologies, Data types, Investment research and trading, Log analysis, Sybase, Telecommunications, Theory and architecture, Vertica Systems | 2 Comments |
Investigative analytics and derived data: Enzee Universe 2011 talk
I’ll be speaking Monday, June 20 at IBM Netezza’s Enzee Universe conference. Thus, as is my custom:
- I’m posting draft slides.
- I’m encouraging comment (especially in the short time window before I have to actually give the talk).
- I’m offering links below to more detail on various subjects covered in the talk.
The talk concept started out as “advanced analytics” (as opposed to fast query, a subject amply covered in the rest of any Netezza event), slotted in as a lunch break in what is otherwise a detailed “best practices” session. So I suggested we constrain the subject by focusing on a specific application area — customer acquisition and retention, something of importance to almost any enterprise, and which exploits most areas of analytic technology. Then I actually prepared the slides — and guess what? The mix of subjects will be skewed somewhat more toward generalities than I first intended, specifically in the areas of investigative analytics and derived data. And, as always when I speak, I’ll try to raise consciousness about the issues of liberty and privacy, our options as a society for addressing them, and the crucial role we play as an industry in helping policymakers deal with these technologically intense subjects.
Slide 3 refers back to a post I made last December, saying there are six useful things you can do with analytic technology:
- Operational BI/Analytically-infused operational apps: You can make an immediate decision.
- Planning and budgeting: You can plan in support of future decisions.
- Investigative analytics (multiple disciplines): You can research, investigate, and analyze in support of future decisions.
- Business intelligence: You can monitor what’s going on, to see when it is necessary to decide, plan, or investigate.
- More BI: You can communicate, to help other people and organizations do these same things.
- DBMS, ETL, and other “platform” technologies: You can provide support, in technology or data gathering, for one of the other functions.
Slide 4 observes that investigative analytics:
- Is the most rapidly advancing of the six areas …
- … because it most directly exploits performance & scalability.
Slide 5 gives my simplest overview of investigative analytics technology to date: Read more
When it’s still best to use a relational DBMS
There are plenty of viable alternatives to relational database management systems. For short-request processing, both document stores and fully object-oriented DBMS can make sense. Text search engines have an important role to play. E. F. “Ted” Codd himself once suggested that relational DBMS weren’t best for analytics.* Analysis of machine-generated log data doesn’t always have a naturally relational aspect. And I could go on with more examples yet.
*Actually, he didn’t admit that what he was advocating was a different kind of DBMS, namely a MOLAP one — but he was. And he was wrong anyway about the necessity for MOLAP. But let’s overlook those details. 🙂
Nonetheless, relational DBMS dominate the market. As I see it, the reasons for relational dominance cluster into four areas (which of course overlap):
- Data re-use. Ted Codd’s famed original paper referred to shared data banks for a reason.
- The benefits of normalization (see the sketch at the end of this post), which include:
- You only have to do the programming work of writing something once …
- … and you don’t have to do the programming work of keeping multiple versions of the information consistent.
- You only have to do the processing work of writing something once.
- You only have to buy storage to hold each fact once.
- Separation of concerns.
- Different people can worry about programming and “database stuff.”
- Indeed, even performance optimization can sometimes be separated from programming (i.e., when all you have to do to get speed is implement the correct indexes).
- Maturity and momentum, as reflected in the availability of:
- People.
- A broad variety of mature relational DBMS.
- Vast amounts of packaged software that “talks” SQL.
Generally speaking, I find the reasons for sticking with relational technology compelling in cases such as: Read more
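As promised above, here’s a concrete illustration of the normalization point, a minimal Python/sqlite3 sketch with a toy schema of my own. The point is the “write once, store once” benefit: the shared fact lives in one row, so one UPDATE keeps every dependent query consistent.

```python
import sqlite3

# Toy normalized schema: the customer's city lives in ONE row,
# however many orders reference it.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         item TEXT);
""")
con.execute("INSERT INTO customers VALUES (1, 'Acme', 'Boston')")
con.executemany("INSERT INTO orders VALUES (?, 1, ?)",
                [(1, 'widget'), (2, 'sprocket'), (3, 'gizmo')])

# The customer moves: ONE update, and every order "sees" it. No risk of
# three denormalized copies of the address drifting out of sync.
con.execute("UPDATE customers SET city = 'Chicago' WHERE id = 1")

for row in con.execute("""SELECT o.item, c.city
                          FROM orders o JOIN customers c ON o.customer_id = c.id"""):
    print(row)   # every order now reports Chicago
```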
Categories: Analytic technologies, Data models and architecture, Database diversity, MOLAP, NoSQL, Object, Theory and architecture | 21 Comments |
Starcounter high-speed memory-centric object-oriented DBMS, coming soon
Since posting recently about Starcounter, I’ve had the chance to actually talk with the company (twice). Hence I know more than before. 🙂 Starcounter:
- Has been around as a company since 2006.
- Has developed memory-centric object-oriented DBMS technology that has been OEMed by a few application software companies (especially in bricks-and-mortar retailing and in online advertising).
- Is planning to actually launch an OODBMS product sometime this summer.
- Has 14 employees (most or all of whom are in Sweden, which is also where I think Starcounter’s current customers are centered).
- Is planning to shift emphasis soon to the US market.
Starcounter’s value propositions are programming ease (no object/relational impedance mismatch) and performance. Starcounter believes its DBMS has 100X the performance of conventional DBMS at short-request transaction processing, and 10X the performance of other memory-centric and/or object-oriented DBMS (e.g. Oracle TimesTen, or Versant). That said, Starcounter has not yet tested VoltDB. Starcounter does not claim performance much beyond that of disk-based DBMS on analytic tasks such as aggregations.
The key technical aspect to Starcounter is integration between the DBMS and the virtual machine, so that the same copy of the data is accessed by both the DBMS and the application program, without any movement or transformation being needed. (Starcounter isn’t aware of any other object-oriented DBMS that work this way.) Transient and persistent data are handled in the same way, seamlessly.
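Starcounter hasn’t shown me API details, so the following is purely a conceptual Python sketch of mine, not Starcounter code: the idea is that the application and the “DBMS” operate on the very same in-memory objects, with no ORM-style copying or translation layer in between. (A real product of course adds transactions, logging, and durability on top of this.)

```python
# Purely conceptual sketch (toy code, NOT Starcounter's actual API):
# the "database" and the application share the very same in-memory objects,
# so there is no serialization or object/relational mapping step.

class Database:
    def __init__(self):
        self._objects = []          # the one and only copy of the data

    def persist(self, obj):
        self._objects.append(obj)   # no copying -- the live object IS the record
        return obj

    def query(self, cls, predicate=lambda o: True):
        return [o for o in self._objects if isinstance(o, cls) and predicate(o)]

class Account:
    def __init__(self, owner, balance):
        self.owner, self.balance = owner, balance

db = Database()
acct = db.persist(Account("alice", 100))

acct.balance -= 30                  # a plain in-language mutation...
print(db.query(Account, lambda a: a.owner == "alice")[0].balance)  # ...is what the "DBMS" sees: 70
```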
Other Starcounter technical highlights include: Read more
Categories: Data models and architecture, In-memory DBMS, Memory-centric data management, Object, OLTP, Starcounter, Theory and architecture | 3 Comments |
Terminology: poly-structured data, databases, and DBMS
My recent argument that the common terms “unstructured data” and “semi-structured data” are misnomers, and that a word like “multi-” or “poly-structured”* would be better, seems to have been well-received. But which is it — “multi-” or “poly-“?
*Everybody seems to like “poly-structured” better when it has a hyphen in it — including me. 🙂
The big difference between the two is that “multi-” just means there are multiple structures, while “poly-” further means that the structures are subject to change. Upon reflection, I think the “subject to change” part is essential, so poly-structured it is.
The definitions I’m proposing are:
- A database is poly-structured to the extent that its structure is apt to be changed in the ordinary course of query, update, or programming.
- Data is poly-structured to the extent that it is best represented in a poly-structured database.
- A DBMS is poly-structured to the extent that it is oriented to managing poly-structured databases.
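To illustrate the first definition, here’s a toy Python sketch (mine, not tied to any product): in a poly-structured collection, an ordinary insert can change the observed structure itself, with no separate schema-change step.

```python
# Toy sketch: in a poly-structured collection, a routine insert can change
# the structure itself -- no separate DDL step required.
docs = [
    {"title": "Q1 report", "author": "Pat"},
    {"title": "Q2 report", "author": "Lee", "tags": ["finance"]},
]

def observed_schema(docs):
    """The 'schema' is just whatever fields the documents currently carry."""
    return sorted({field for d in docs for field in d})

print(observed_schema(docs))              # ['author', 'tags', 'title']
docs.append({"title": "memo", "recipients": ["all"]})   # an ordinary update...
print(observed_schema(docs))              # ...and the structure has changed:
                                          # ['author', 'recipients', 'tags', 'title']
```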
Categories: Object, Structured documents, Text, Theory and architecture | 23 Comments |
Netezza TwinFin i-Class overview
I have long complained about difficulties in discussing Netezza’s TwinFin i-Class analytic platform. But I’m ready now, and in the grand sweep of the product’s history I’m not even all that late. The Netezza i-Class timing story goes something like this:
- Netezza i-Class was first foreshadowed in February, 2010.
- Netezza i-Class customer testing started in October, 2010 or so. Netezza i-Class evidently has been shipped to 4-5 partners and a single-digit number of end-user organizations, spread across some usual-suspect industries (financial services, telecom, and so on).
- Netezza i-Class 1.0 general availability is still in the (near) future.
My advice to Netezza as to how it should describe TwinFin i-Class boils down to: Read more
Categories: Cloudera, Data warehouse appliances, Data warehousing, GIS and geospatial, Hadoop, IBM and DB2, MapReduce, Netezza, Parallelization, Predictive modeling and advanced analytics | 5 Comments |
Whither MarkLogic?
My clients at MarkLogic have a new CEO, Ken Bado, even though former CEO Dave Kellogg was quite successful. If you cut through all the happy talk and side issues, the reason for the change is surely that the board wants to see MarkLogic grow faster, and specifically to move beyond its traditional niches of publishing (especially technical publishing) and national intelligence.
So what other markets could MarkLogic pursue? Before Ken even started work, I sent over some thoughts. They included (but were not limited to): Read more
Categories: MarkLogic, Object, RDF and graphs, Structured documents | 6 Comments |