Schema flexibility and XML data management
Conor O’Mahony, marketing manager for IBM’s DB2 pureXML, talks a lot about schema flexibility, one of my favorite hobbyhorses, as a reason to use an XML data model. In a number of industries, he sees use cases built around ongoing change in the information being managed:
- Tax authorities change their rules and forms every year, but don’t want to do total rewrites of their electronic submission and processing software.
- The financial services industry keeps inventing new products, which don’t just have different terms and conditions, but may also have different kinds of terms and conditions.
- The same, to some extent, goes for the travel industry, which also keeps adding different kinds of offers and destinations.
- The energy industry keeps adding new kinds of highly complex equipment it has to manage.
Conor also thinks market evidence shows that XML’s schema flexibility is important for data interchange. For example, hospitals (especially in the US) have disparate medical records and billing systems, which can make information interchange a chore.
The second suggestion is probably the less controversial of the two. After all, everybody knows that data is very commonly exchanged in XML formats. So if it gets persisted in XML format somewhere along the way, even relational purists shouldn’t much mind, as long as it eventually gets into what they regard as a more properly structured database. (Besides — if the data is going on long, challenging, multi-stage journeys, then nobody should much blame it if it indeed wants to stop along the way somewhere and rest. 🙂 )
In the first group of examples, there’s usually also a kind of cooperation between native XML and other kinds of database managers. Before those users had access to XML, they were getting by just fine using other database technology. So XML can be used in conjunction with other systems, not as a complete replacement. Even so, it’s reasonable to consider scenarios in which XML is the primary data model of record, and relational/tabular copies of the information are secondary.
For example, an income tax authority wants to store your tax form in its entirety, so that it can check both your truthfulness and your arithmetic. This is most naturally done in XML, although for many years it’s been done in relational or pre-relational technologies. The authority also wants to extract a limited amount of information from each taxpayer’s form for aggregation and other administrative purposes; that’s best done in a relational database. But the part that belongs in XML is the most fundamental.
As another example, the core information of a derivatives transaction is:
- The derivatives contract (naturally stored in XML)
- The actual purchase/sale information (traditionally stored in relational systems)
- Account balances of various kinds altered by the transaction (a classic case where relational databases guarantee much-needed data integrity)
Here the majority of the basic record fits best in XML. The minority that fits best in a relational system is small enough that a good XML DBMS can probably handle it as well. Neither the superior OLTP performance nor the data integrity safeguards of a relational DBMS are needed for the purchase/sale information. They are needed for the general account management, but again, that’s a relatively secondary or (no pun intended!) derivative part of the overall database.
So what we’re coming up with here is a strategy along the lines of:
- Use XML for your system of record.
- Spawn transactions in your relational/tabular data stores right away.
And by the way, while I haven’t dwelled on this – those relational/tabular data stores could be data warehouses instead of or in addition to transactional systems.
Obviously, there are two major classes of objections to this strategy (when it is contrasted with a traditional relational approach):
- Assertions that the extra programming effort needed to assure data integrity is so significant as to outweigh all other considerations.
- Assertions that the need for schema flexibility isn’t really that high, or at least wouldn’t be if the enterprises’ database designers were sufficiently competent.
Well, we’ll see. So far the customer uptake for the native XML approach is small but non-zero. And thus the issue is far from being decided.
Comments
Hi Curt,
Since we talked, I compiled a set of thoughts on the topic at Flexible Schemas: When to Persist Data in XML Instead of Relational. There is a class of situations where managing data using traditional SQL types is cumbersome, and where taking advantage of flexible XML schemas for all or part of the data makes life easier.
Hi Conor,
Good link!
Which of those examples are hybrid-relational? You say explicitly that the energy one is. But IIRC from our talk, the tax and/or telecom ones are too.
Best,
CAM