June 28, 2008
Who is doing what in XML data management these days?
A comment thread to a post on a different subject has opened up a discussion of XML storage. Frankly, I haven’t kept up with my briefings on the subject, in part because XML support hasn’t proved to be very important yet to the big DBMS vendors, somewhat to my surprise. When last I looked, the situation wasn’t much different from what it was back in November, 2005. Unless I’ve missed something (and please tell me if I have!), here’s what’s going on:
- Almost everybody has some kind of XML datatype, and SQL extensions to permit its use.
- Almost everybody supports one or both of the two easy relational XML integrations:
  - XML as a BLOb/CLOb, but with extra indexing. The disadvantage of this is that you can’t retrieve data inside a document without bringing the whole document back.
  - Shredded XML. Assuming the shredding is automatic, the big disadvantage of this is that performance can be lousy for complex XML that doesn’t fit naturally into tabular formats. But a lot of XML was generated from relational databases for data interchange purposes, and in those cases shredding it back into tables may make perfect sense. (Of course, if the shredding has to be manual, there’s a nasty initial DBA burden. Also, anything that challenges performance in an RDBMS is likely to create ongoing DBA challenges.)
- The majors also claim to have more or less native XML implementations. However, to the best of my knowledge these are not oozing with customer success.
  - IBM has a very nice architecture, with a separate optimized XML engine integrated into DB2. However, performance is for some reason ghastly.
  - Microsoft has a BLOb/CLOb approach, but without the standard drawback of same, in that you can retrieve part of a document. But I’m not aware of happy users.
  - Oracle, later to the party than IBM or Microsoft, now claims a “binary” option that sounds fairly native. (By this I mean “native” in the industry-standard sense, not in the bogus sense that Oracle used to claim.) But I’m not aware of user experiences.
- Mark Logic is the acknowledged star of XML database management.
- Intersystems Cache’, despite not being focused on XML, is one of the better-performing alternatives.
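To make the difference between the two "easy" relational integrations concrete, here is a minimal sketch in Python, using SQLite purely as a stand-in relational store. Everything in it is an assumption made for illustration: the sample document, the table names, and the helper functions are hypothetical, and the "extra indexing" that real products layer on top of the BLOb/CLOb approach is not modeled. It only shows where the work happens: with a CLOb the whole document comes back and is parsed by the client, while shredded data can be queried piecemeal in SQL.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are made up.
ORDER_XML = """<order id="42">
  <customer>Acme</customer>
  <item sku="A1" qty="2"/>
  <item sku="B7" qty="5"/>
</order>"""


def store_as_clob(conn, doc_id, xml_text):
    """CLOb style: keep the document as one opaque text value."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_docs (id INTEGER PRIMARY KEY, body TEXT)"
    )
    conn.execute("INSERT INTO order_docs VALUES (?, ?)", (doc_id, xml_text))


def customer_from_clob(conn, doc_id):
    """Reading one field means fetching and parsing the whole document."""
    (body,) = conn.execute(
        "SELECT body FROM order_docs WHERE id = ?", (doc_id,)
    ).fetchone()
    return ET.fromstring(body).findtext("customer")


def shred(conn, xml_text):
    """Shredded style: split the document into ordinary relational rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, customer TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_items (order_id INTEGER, sku TEXT, qty INTEGER)"
    )
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    conn.execute(
        "INSERT INTO orders VALUES (?, ?)", (order_id, root.findtext("customer"))
    )
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?)",
        [
            (order_id, item.get("sku"), int(item.get("qty")))
            for item in root.findall("item")
        ],
    )
    return order_id


def total_qty_shredded(conn, order_id):
    """After shredding, plain SQL answers the question without any XML parsing."""
    (total,) = conn.execute(
        "SELECT SUM(qty) FROM order_items WHERE order_id = ?", (order_id,)
    ).fetchone()
    return total
```

A flat, order-like document such as this one shreds cleanly into two tables. Deeply nested or irregular XML would need many more tables and joins, which is where the performance concern about shredding comes from.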
Categories: IBM and DB2, Intersystems and Cache', MarkLogic, Microsoft and SQL*Server, Oracle, Structured documents
Comments
7 Responses to “Who is doing what in XML data management these days?”
> IBM has a very nice architecture, with a separate optimized XML engine integrated into DB2. However, performance is for some reason ghastly.
Curt,
Care to elaborate? “Ghastly” is certainly not an attribute we tend to attach to our XML performance.
So I am curious as to where you got that impression from.
Cheers
Serge Rielau
DB2 Benchmark & Solution Development
IBM Toronto Lab
I heard “an order of magnitude” for the performance difference from prospects who evaluated Viper vs. Marklogic, and they were surely using the phrase correctly. Cache’ was competitive; Viper wasn’t.
That was for a somewhat unusual application (tracking paths through a tree), but I’ve heard similar things elsewhere, and haven’t heard anybody speak favorably of IBM’s XML performance except IBM itself.
CAM
NY State Tax office seems to be doing fine and they made it through this season…
http://www.ibm.com/developerworks/wikis/download/attachments/3204/NYS_Tax_DB2_pureXML.pdf
I admit I have never looked at Cache’ and I didn’t even know of Mark Logic.
Has either run any public benchmarks (such as TPoX), so that I can take this claim as more than hearsay?
When we compare, we typically do so against Oracle and MS SQL Server, and we sure don’t have to hide.
DB2 9.5, btw, is about 2x above DB2 9 w.r.t. pureXML performance.
It would be interesting to know the usage scenarios you encounter. Perhaps the products simply play in different fields (exclusive XML vs. XML + relational, R/W ratios, concurrent usage, transactional semantics (?))…?
Cheers
Serge
When we talked w/ MarkLogic a few weeks back, they told us that they hadn’t run any of the quasi-“standard” XML database benchmarks (e.g., XMach-1). Their reason was that these benchmarks are not indicative of the scenarios their customers are confronting.
I tend to agree w/ them and, based on my research, feel that the “standard” benchmarks are not a sufficient tool by which one can compare two XML database products to determine which is better suited to solve one’s needs.
The alternative approach that MarkLogic suggested to us, and that I also agree with, is to create a custom benchmark that aptly measures one’s own scenario. Based on the results of that benchmark, one can make a much more informed decision about which database product best suits one’s needs.
DBMS performance is a complex beast. Without rigorous analysis, it is impossible to know ahead of time exactly how a DBMS will perform in your particular environment. An interesting approach is to:
1) Narrow the list of vendors you want to evaluate.
2) Download the free versions of those vendors’ DBMSs and evaluate them for yourself. There is nothing like a hands-on evaluation to determine performance, power, ease-of-use, and maintainability.
3) Make the vendors come in at this stage to help you optimize your prototype systems.
This way you can rest assured that the DBMS meets your needs, get a feel for the system before purchase, get some valuable optimization lessons from the experts, and likely be ideally set up for a quick start on deployment.
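For the hands-on evaluation described above, a custom benchmark does not need to be elaborate. The following is a rough sketch in Python of one way to time your own workload against each shortlisted product. The function names and the `candidates` mapping are hypothetical, and each callable is assumed to wrap whatever client library the product in question actually provides; nothing here is specific to any vendor.

```python
import statistics
import time


def time_workload(run_query, repeats=5):
    """Return the median wall-clock seconds taken by one call to run_query()."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(candidates, repeats=5):
    """Time each candidate and return (label, median_seconds) pairs, fastest first.

    `candidates` maps a product label to a zero-argument callable that runs
    your own representative workload against that product.
    """
    results = {
        label: time_workload(run_query, repeats)
        for label, run_query in candidates.items()
    }
    return sorted(results.items(), key=lambda pair: pair[1])
```

In practice each callable would load your own documents and run your own queries. The median is used rather than the mean so that a single slow run (a cold cache, say) does not dominate the result; whether that is the right statistic depends on the workload.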
[…] named Conor O’Mahony has posted excellent comments about XML databases on a couple of DBMS2 threads. After a look at the blog URL he provided and the job description he posted there, I resolved to […]
In addition to Cache’s built-in XML support, you may want to note our add-on product eXtc which implements the W3C XML DOM directly onto its underlying database, effectively turning Cache (or the Open Source GT.M) into a Native XML Database. See http://www.mgateway.com/extc.htm and http://www.rpbourret.com/xml/ProdsNative.htm#extc
The XML DOM turns out to be tailor-made to be implemented on top of a schemaless, hierarchical database.