Data types
Analysis of data management technology optimized for specific datatypes, such as text, geospatial, object, RDF, or XML. Related subjects include:
- Any subcategory
- Database diversity
Is anybody actually using image, video, or sound indexing?
I have quite a case of “flu-like symptoms,” and nothing substantive I’m writing today is coming to fruition. So instead of forcing the issue, I’m going to put a few questions out for discussion.
Question of the day #1
Is anybody indexing the actual contents of still images, video, or sound files?
Obviously, there are applications that serve huge numbers of videos, pictures, and/or songs — YouTube, Flickr, iTunes, and so on. But generally, these media are just handled as files or BLOBs, while all the database indexing is on alphanumeric metadata such as title, tags, uploader, date, download stats, comments, and so on.
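The dominant pattern, in other words, looks something like the following minimal sketch (Python with SQLite; the schema and names are hypothetical): the media content sits in an opaque BLOB column, and every index is on alphanumeric metadata.

```python
import sqlite3

# Minimal sketch of the common pattern: store the media itself as an
# opaque BLOB, and index only the alphanumeric metadata around it.
# Schema and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE media (
        id INTEGER PRIMARY KEY,
        title TEXT,
        uploader TEXT,
        upload_date TEXT,
        downloads INTEGER,
        content BLOB          -- the picture/video/song itself, never indexed
    )
""")
# Indexes go on metadata columns only; nothing ever inspects the BLOB.
conn.execute("CREATE INDEX idx_media_title ON media (title)")
conn.execute("CREATE INDEX idx_media_uploader_date ON media (uploader, upload_date)")

conn.execute(
    "INSERT INTO media (title, uploader, upload_date, downloads, content)"
    " VALUES (?, ?, ?, ?, ?)",
    ("cat video", "alice", "2008-02-01", 12345, b"\x00\x01\x02"),
)
# Queries search the metadata; the content is merely fetched.
for row in conn.execute("SELECT id, title FROM media WHERE uploader = 'alice'"):
    print(row)
```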
The technology certainly exists to be more sophisticated. Consider, for example, Oracle’s Still Image datatype, which in typical Oracle fashion implements the relevant parts of SQL/MM and goes yet further. Read more
Categories: Data types, Oracle | 3 Comments |
The 4 main approaches to datatype extensibility
Based on a variety of conversations – including some of the flames about my recent confession that mid-range DBMS aren’t suitable for everything — it seems as if a quick primer may be in order on the subject of datatype support. So here goes.
“Database management” usually deals with numeric or alphabetical data – i.e., the kind of stuff that goes nicely into tables. It commonly has a natural one-dimensional sort order, which is very useful for sort/merge joins, b-tree indexes, and the like. This kind of tabular data is what relational database management systems were invented for.
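To make the sort-order point concrete, here is a minimal sort-merge join sketch (illustrative Python, not any product’s implementation). It works only because the join keys have a total, one-dimensional order, which is exactly the property plain tabular data has:

```python
def sort_merge_join(left, right):
    """Minimal sort-merge equijoin on the first field of each tuple.

    Relies entirely on the join keys having a natural total order;
    sorting both inputs lets a single coordinated scan find matches.
    """
    left, right = sorted(left), sorted(right)
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit matches for the run of equal keys on the right side.
            j2 = j
            while j2 < len(right) and right[j2][0] == lk:
                out.append(left[i] + right[j2][1:])
                j2 += 1
            i += 1
    return out

customers = [(1, "Ann"), (2, "Bob")]
orders = [(2, "widget"), (1, "gadget"), (2, "sprocket")]
print(sort_merge_join(customers, orders))
```

Datatypes without a single useful sort order, such as two-dimensional geospatial data, can’t be handled this way, which is much of why they need special treatment.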
But increasingly, there are important datatypes beyond character strings, numbers, and dates. Leaving out generic BLOBs and CLOBs (Binary/Character Large OBjects), the big four surely are:
- Text. Text search is a huge business on the web, and a separate big business in enterprises. And text doesn’t fit well into the relational paradigm at all. (A sketch after this list illustrates why.)
- Geospatial. Information about locations on the earth’s surface is essentially two-dimensional. Some geospatial apps use three dimensions.
- Object. There are two main reasons for using object datatypes. First, the data can have complex internal structures. Second, it can comprise a variety of simpler types. Object structures are well-suited for engineering and medical applications.
- XML. A great deal of XML is, at its heart, either relational/tabular data or text documents. Still, there are a variety of applications for which the most natural datatype truly is XML.
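As flagged above, here is a toy inverted index, the core text-search structure, to make the text point concrete. Note that it maps terms to sets of documents rather than living in rows and columns. Purely illustrative:

```python
from collections import defaultdict

# Toy inverted index: the basic structure of text search engines.
docs = {
    1: "the quick brown fox",
    2: "the lazy brown dog",
    3: "quick thinking saves the day",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(*terms):
    """AND-query: return documents containing every term."""
    result = None
    for term in terms:
        postings = index.get(term, set())
        result = postings if result is None else result & postings
    return sorted(result or [])

print(search("quick", "brown"))   # -> [1]
```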
Numerous other datatypes are important as well, with the top runners-up probably being images, sound, video, and time series (which, even though they’re numeric, benefit from special handling).
Four major ways have evolved to manage data of non-tabular datatypes, either on their own or within an essentially relational data management environment. Read more
Categories: Data types, GIS and geospatial, Object, Structured documents, Text | 10 Comments |
Is MapReduce a good underpinning for next-gen scientific DBMS?
Back in November, Mike Stonebraker suggested that there’s a need for database management advances to serve “big science”. He said:
Obviously, the best solution to these … problems would be to put everything in a next-generation DBMS — one capable of keeping track of data, metadata, and lineage. Supporting the latter would require all operations on the data to be done inside the DBMS with user-defined functions — Postgres-style.
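A toy version of that idea, using Python’s built-in SQLite rather than Postgres’s actual extension mechanism (so purely a sketch): register a user-defined function, and the computation runs inside the database engine, next to the data, instead of being pulled out into application code.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO readings (value) VALUES (?)",
    [(x,) for x in (1.0, 10.0, 100.0)],
)

# Register a user-defined function so the computation executes inside
# the database engine -- the same idea, in miniature, as Postgres-style
# UDFs for scientific data processing.
conn.create_function("log10", 1, math.log10)

for row in conn.execute("SELECT id, log10(value) FROM readings"):
    print(row)
```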
Categories: Data types, MapReduce, Scientific research | Leave a Comment |
Intersystems’ stealth marketing has gotten pretty extreme
Every few months I try to make contact with Intersystems. Sometimes they graciously respond, promising to schedule a briefing, which then never happens. Other times they don’t even bother. Now, on one level I can’t blame them, based on what happened at my last briefing. Read more
Categories: Intersystems and Cache', Object | 5 Comments |
Status of Software AG’s Tamino
Since I was researching Software AG anyway, I took the opportunity to ask about Software AG’s native XML DBMS Tamino, which certainly has some fans. Jim Fowler, Software AG’s Director of Market Development, Enterprise Transaction Systems, was kind enough to write up the following for me:
As you know, when Tamino was released in the late 1990s, it was one of the first commercially available native XML databases – if not the first. We now have several hundred Tamino customers worldwide, and Software AG is fully committed to supporting our customers.
At the same time, we recognize that XML has matured and evolved in many different directions during the past decade; Read more
Categories: Data types, Software AG, Structured documents | Leave a Comment |
A nice EnterpriseDB replacement of MySQL
I’m going to praise EnterpriseDB’s marketing communications twice in two blog posts, because I really liked some of the crunch they put into a press release announcing a MySQL replacement at FortiusOne. To wit (emphasis mine):
The PostGIS geospatial extensions to PostgreSQL played a key role in FortiusOne’s selection of EnterpriseDB Advanced Server, a PostgreSQL-based solution, and dramatically improved performance. FortiusOne needed to run complex spatial queries against large datasets quickly and efficiently, and found the MySQL spatial extensions to be far less complete and comprehensive than PostGIS. EnterpriseDB Advanced Server processes some of GeoCommons’ database-intensive rendering requests in one-thirtieth of the time required by MySQL. During peak loads, GeoCommons processes more than one hundred thousand complex requests per hour, requiring true enterprise-class performance and scalability.
Another major factor in FortiusOne’s replacement of MySQL with EnterpriseDB Advanced Server was the company’s need for advanced partitioning, custom triggers, and functional indexing. EnterpriseDB’s advanced partitioning capabilities instantly enabled linear performance, even with tables having billions of rows.
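Of the features named, functional indexing is the easiest to show in a few lines. Here is an illustrative sketch using SQLite’s expression indexes from Python; the table and index names are made up, and PostgreSQL (hence EnterpriseDB) syntax is essentially the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT)")

# A functional (expression) index: built on lower(name), so
# case-insensitive lookups can use the index instead of scanning.
conn.execute("CREATE INDEX idx_places_lower_name ON places (lower(name))")

conn.executemany(
    "INSERT INTO places (name) VALUES (?)",
    [("Washington",), ("washington",), ("Boston",)],
)

# This predicate matches the indexed expression, so the planner can use it.
rows = conn.execute(
    "SELECT id, name FROM places WHERE lower(name) = lower(?)", ("WASHINGTON",)
).fetchall()
print(rows)
```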
Categories: Data types, EnterpriseDB and Postgres Plus, GIS and geospatial, MySQL | 10 Comments |
Vertica update – HP appliance deal, customer information, and more
Vertica quietly announced an appliance bundling deal with HP and Red Hat today. That got me quickly onto the phone with Vertica’s Andy Ellicott, to discuss a few different subjects. Most interesting was the part about Vertica’s customer base, highlights of which included:
- Vertica’s claim to have “50” customers includes a bunch of unpaid licenses, many of them in academia.
- Vertica has about 15 paying customers.
- Based on conversations with mutual prospects, Vertica believes that’s more customers than DATAllegro has. (Of course, each DATAllegro sale is bigger than one of Vertica’s. Even so, I hope Vertica is wrong in its estimate, since DATAllegro told me its customer count was “double digit” quite a while ago.)
- Most Vertica customers manage over 1 terabyte of user data. A couple have bought licenses showing they intend to manage 20 terabytes or so.
- Vertica’s biggest customer/application category – existing customers and sales pipelines alike – is call detail records for telecommunications companies. (Other data warehouse specialists also have activity in the CDR area.) Major applications are billing assurance (getting the inter-carrier charges right) and marketing analysis. Call center uses are still in the future.
- Vertica’s other big market to date is investment research/tick history. Surely not coincidentally, this is a big area of focus for Mike Stonebraker, evidently at both companies for which he’s CTO. (The other, of course, is StreamBase.)
- Runners-up in market activity are clickstream analysis and general consumer analytics. These seem to be present in Vertica’s pipeline more than in the actual customer base.
Categories: Analytic technologies, Business Objects, Data warehouse appliances, Data warehousing, DATAllegro, HP and Neoview, RDF and graphs, Vertica Systems | 5 Comments |
Native XML performance, and Philip Howard on recent IBM DBMS announcements
Philip Howard went to at least one conference this month I didn’t, namely IBM’s, and wrote up some highlights. As usual, he seems to have been favorably impressed.
In one note, he says that IBM is claiming a 2-5X XML performance improvement. This is a good step, since one of my clients who evaluated such engines dismissed IBM early on for being an order of magnitude too slow. That client ultimately chose Marklogic, with Cache’ having been the only other product to make the short list.
Speaking of IBM, I flew back from the Business Objects conference next to a guy who supports IMS. He told me that IBM has bragged of an actual new customer win for IMS within the past couple of years (a large bank in China). Read more
The Netezza Developer Network
Netezza has officially announced the Netezza Developer Network. Associated with that is a set of technical capabilities, which basically boil down to programming user-defined functions or other code straight onto the Netezza nodes (aka SPUs). And this is specifically onto the FPGAs, not the PowerPC processors. In C. Technically, I think what this boils down to is: Read more
Applications for not-so-low-latency CEP
The highest-profile applications for complex event/stream processing are probably the ones that require super-low latency, especially in financial trading. However, as I already noted in writing about StreamBase and Truviso, there are plenty of other CEP apps with less extreme latency requirements.
Commonly, these are data reduction apps – i.e., there’s a gushing stream of inputs, and the CEP engine filters and “enhances” it, so that only a small, modified subset is sent forward. In other cases, disk-based systems could do the job perfectly well from a performance standpoint, but the pattern matching and filtering requirements are just a better fit for the CEP paradigm.
Read more
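A minimal sketch of that data-reduction pattern (the event shape and thresholds here are invented for illustration): consume a gushing stream, filter it, enrich what survives, and forward only the small, modified subset.

```python
import random

def event_stream(n):
    """Stand-in for a gushing input stream of raw readings."""
    for i in range(n):
        yield {"sensor": i % 10, "value": random.gauss(50, 15)}

def reduce_stream(events, threshold=80.0):
    """Filter and 'enhance': pass along only anomalous readings,
    tagged with derived fields, so downstream systems see only a
    small, modified subset of the input."""
    for event in events:
        if event["value"] > threshold:           # filter step
            event["severity"] = "high" if event["value"] > 95 else "elevated"
            yield event                          # the enhanced subset

for alert in reduce_stream(event_stream(10_000)):
    print(alert)
```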