GIS and geospatial

Analysis of data management technology optimized for geospatial data, whether by specialized indexing or user-defined functions

January 27, 2008

The 4 main approaches to datatype extensibility

Based on a variety of conversations – including some of the flames about my recent confession that mid-range DBMS aren’t suitable for everything — it seems as if a quick primer may be in order on the subject of datatype support. So here goes.

“Database management” usually deals with numeric or alphabetical data – i.e., the kind of stuff that goes nicely into tables. It commonly has a natural one-dimensional sort order, which is very useful for sort/merge joins, b-tree indexes, and the like. This kind of tabular data is what relational database management systems were invented for.

But ever more, there are important datatypes beyond character strings, numbers and dates. Leaving out generic BLOBs and CLOBs (Binary/Character Large OBjects), the big four surely are:

Text. Text search is a huge business on the web, and a separate big business in enterprises. And text doesn’t fit well into the relational paradigm at all.
Geospatial. Information about locations on the earth’s surface is essentially two-dimensional. Some geospatial apps use three dimensions.
Object. There are two main reasons for using object datatypes. First, the data can have complex internal structures. Second, it can comprise a variety of simpler types. Object structures are well-suited for engineering and medical applications.
XML. A great deal of XML is, at its heart, either relational/tabular data or text documents. Still, there are a variety of applications for which the most natural datatype truly is XML.

Numerous other datatypes are important as well, with the top runners-up probably being images, sound, video, time series (even though they’re numeric, they benefit from special handling).

Four major ways have evolved to manage data of non-tabular datatype, either on their own or within an essentially relational data management environment. Read more

Categories: Data types, GIS and geospatial, Object, Structured documents, Text

10 Comments

December 5, 2007

A nice EnterpriseDB replacement of MySQL

I’m going to praise EnterpriseDB’s marketing communications twice in two blog posts, because I really liked some of the crunch they put into a press release announcing a MySQL replacement at FortiusOne. To wit (emphasis mine):

The PostGIS geospatial extensions to PostgreSQL played a key role in FortiusOne’s selection of EnterpriseDB Advanced Server, a PostgreSQL-based solution, and dramatically improved performance. FortiusOne needed to run complex spatial queries against large datasets quickly and efficiently, and found the MySQL spatial extensions to be far less complete and comprehensive than PostGIS. EnterpriseDB Advanced Server processes some of GeoCommons’ database-intensive rendering requests in one-thirtieth of the time required by MySQL. During peak loads, GeoCommons processes more than one hundred thousand complex requests per hour, requiring true enterprise-class performance and scalability.

Another major factor in FortiusOne’s replacement of MySQL with EnterpriseDB Advanced Server was the company’s need for advanced partitioning, custom triggers, and functional indexing. EnterpriseDB’s advanced partitioning capabilities instantly enabled linear performance, even with tables having billions of rows.

Categories: Data types, EnterpriseDB and Postgres Plus, GIS and geospatial, MySQL

10 Comments

September 27, 2007

The Netezza Developer Network

Netezza has officially announced the Netezza Developer Network. Associated with that is a set of technical capabilities, which basically boil down to programming user-defined functions or other capabilities straight onto the Netezza nodes (aka SPUs). And this is specifically onto the FPGAs, not the PowerPC processors. In C. Technically, I think what this boils down to is: Read more

Categories: Data types, Data warehouse appliances, Data warehousing, GIS and geospatial, Netezza, OLTP, Open source, PostgreSQL, SAS Institute, Structured documents, Theory and architecture

9 Comments

November 18, 2005

Two purely theoretical problems with TransRelational(TM)

There’s a vigorous discussion of TransRelational over on Alf Pedersen’s blog (Edit: Link died), although it’s completely polluted by some usual-suspects flame war BS.

Alf did poke through the dreck, however, to make a reasonable challenge, which can be paraphrased as:

OK. Suppose you’re right that no implementation has ever provided evidence of TransRelational’s usefulness for building a True Relational DBMS. It’s still theoretically fascinating.

My response was as follows:

Here are two big problems with TransRelational that are perfectly theoretical.

First, it assumes that values can be concisely stated, presumably as numbers or character strings. That isn’t a good match to complex datatypes such as, say, documents that should be full-text indexed.

Second, it assumes that there’s a natural sort order. That could be a bit of a problem even for, say, geospatial. One would think there’s a workaround in the geospatial case, e.g. like Oracle’s old hhencode. But hhencode was a fiasco, I think because it didn’t actually measure proximity very effectively.

Admittedly, both of my objections also apply to good old b-trees. Still, they speak against the potential of a TransRelational implementation to achieve the kind of generality I think modern applications do and will increasingly demand.

Basically, I think a “True Relational” DBMS that was only useful for columns with natural sort orders wouldn’t be particularly interesting. And “The Third Manifesto” notwithstanding, that’s the only kind anybody seems to have even hinted at trying to bring to market.

Categories: Data types, GIS and geospatial, Theory and architecture, TransRelational

1 Comment

← Previous Page

Search our blogs and white papers

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in