Reasons to use native XML
From a DevX article on Microsoft’s SQL Server 2005
Depending on your situation, XML can also be the best choice for storing even highly structured data. Here are a few practical reasons to consider storing data in a field of type XML:
* Repeated shredding or publishing—On-demand transformations carry a performance penalty. If you have to shred or publish the same document over and over again, consider storing it natively as XML. You can always expose it to relational consumers with an XML view.
* Rapidly changing data structures—When modeled correctly, XML lives up to its name: It’s extensible. Developers can add new pieces of data—even new hierarchies—to a schema without compromising existing software. Extensibility is an extra advantage when prototyping, or when working with rapidly changing problem domains such as bioinformatics.
* Atomic data—Sometimes, you’ll have XML data that’s never consumed except as a whole. Think of this as logical atomicity—if you never access the parts individually, you might as well store it in one big chunk.
* Debugging—Especially for new releases, it can be a good idea to tuck away a copy of your XML imports. The data may be redundant, but keeping the original makes tracking down problems a whole lot easier.
Nothing there to disagree with too heavily, although I can think of some other reasons that might rank higher yet.
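To make the first of those bullets concrete: in SQL Server 2005 you can store the document in a column of type xml and still hand relational consumers ordinary rows through a plain SQL view. The sketch below is only illustrative (the table, column, and element names are invented), but the xml type's nodes() and value() methods are the actual mechanisms involved:

    -- Store the documents natively, untouched (names invented for illustration).
    CREATE TABLE PurchaseOrders (
        OrderID  int IDENTITY PRIMARY KEY,
        OrderDoc xml NOT NULL
    );
    GO

    -- One reading of "expose it to relational consumers": a view that shreds
    -- each stored document into rows on demand, so no second, pre-shredded
    -- copy has to be maintained.
    CREATE VIEW OrderLines AS
    SELECT  o.OrderID,
            line.value('(@sku)[1]', 'varchar(40)') AS Sku,
            line.value('(@qty)[1]', 'int')         AS Qty
    FROM    PurchaseOrders AS o
    CROSS APPLY o.OrderDoc.nodes('/order/line') AS t(line);
    GO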
Comments
Good topic – I’d like to hear your additional reasons, but here are my responses to the list you presented:
1. This is odd to me. If you’re repeatedly shredding it, doesn’t that imply that you’re repeatedly extracting data from XML into another form, and therefore that you’d want to just keep it in that other form? For publishing, yes… but then again, this sounds like something that fits into a generic caching strategy, whether the output is HTML or XML or something else.
2. Agreed to some extent, but how is this different from relations, or from adding SQL tables? This talks about schema modification, not just adding new data to an XML document.
3. Agreed – this is just a data type in its own right, although with XML, the only operators you can really use on it (despite its “atomicity”) are generic XML ones (which violate its atomicity). Other type-definition languages have much stronger support for associating operations with types.
4. Agreed, although this is a good principle regardless of the form of input (XML, HTML forms, text files, URIs, etc.)
– Eric
Eric,
1. Please be careful about giving a handwave and saying “Oh, general caching should take care of that.” Caching (like cost-based optimization, for that matter) is generally very simple and stupid in commercial DBMS products.
Some of my articles and posts on “memory-centric data management” address that point a little more, and I have an extensive white paper in the works on the memory-centric subject.
2. As for the repeated shredding point — if the database is taking in a lot of information that it almost never uses, I can see where storing it natively could make a lot of sense. Otherwise, your criticism seems spot on.
3. The schema variability point is hard to address in a quick note like this one. That’s because it relies on an imprecise, empirical claim along the lines of “there is or will be a significantly large set of applications in which the cost of keeping schemas updated in the conventional manner is unacceptably large.” I imagine there’s no way you’ll ever accept that claim without a persuasive set of examples (one or two examples wouldn’t suffice). We should agree to disagree for now.
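(For what it’s worth, the mechanical half of that claim is easy to show even where the economic half is debatable. Continuing the invented PurchaseOrders sketch from the post above: an untyped xml column accepts documents whose shape has grown, with no ALTER TABLE and no disturbance to existing code that ignores the new parts.)

    -- An early document.
    INSERT INTO PurchaseOrders (OrderDoc)
    VALUES ('<order><line sku="A100" qty="2"/></order>');

    -- A later document adds a whole new hierarchy. No DDL change is needed,
    -- and the OrderLines view sketched earlier keeps working for both rows.
    INSERT INTO PurchaseOrders (OrderDoc)
    VALUES ('<order>
               <line sku="A100" qty="2"/>
               <shipping carrier="UPS"><address country="US"/></shipping>
             </order>');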
1. I agree, but it doesn’t have to be. And I’m not saying caching is simple. But the caching needs for XML aren’t (as far as I’ve ever seen) different from those for other requests, like HTML pages.
2. Yes, in that case the XML value is just that – a value, something atomic to be referred to in toto. Certainly, as time goes on, if “blah” comes to acquire critical information used in queries, it can be extracted and queried directly, without the eccentricities of repeated XPath (and the attendant parsing).
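(Roughly what that extraction might look like, again with the invented names from above: once a value inside the documents becomes query-critical, copy it out into an ordinary column and index it, so routine queries never touch XPath at all.)

    -- Promote the now-critical value into a plain relational column.
    -- The /order/@customer path is invented for illustration.
    ALTER TABLE PurchaseOrders ADD CustomerID int NULL;

    UPDATE PurchaseOrders
    SET    CustomerID = OrderDoc.value('(/order/@customer)[1]', 'int');

    -- Index it so routine queries never parse the XML again.
    CREATE INDEX IX_PurchaseOrders_CustomerID ON PurchaseOrders (CustomerID);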
Consider the appropriate implementations for short, tabular records to be at one end of a spectrum. Consider the appropriate implementations for BLOBs/CLOBs, with some indexing looking inside them, to be at the other end of this spectrum. Then native XML could be said to sit around the midpoint of the spectrum.

The question is whether there is a substantial body of apps for which neither endpoint of the spectrum is good enough. Early users of the native XML products provide persuasive evidence that there are some such apps at this time. Given the relative maturity of native XML vs. older technologies, I think it is likely that the range of apps for which native XML implementations are a good idea will grow.

Please note, however, that in most cases you can close your eyes, ignore the implementation, and access the data in SQL. So the discussions about physical and logical implementations can be at least partially separated. I know that you and I have a sharp disagreement on the logical implementations, one that’s not going to get resolved in my favor unless you someday become convinced of the usefulness of variable schemas and/or the importance of sophisticated document processing. But you might want to reconsider your views on the physical side.