Microsoft and SQL Server

Microsoft’s efforts in the database management, analytics, and data connectivity markets.

October 3, 2006

Vendor segmentation for data warehouse DBMS

February 2011 edit: I’ve now commented on Gartner’s 2010 Data Warehouse Database Management System Magic Quadrant as well.

Several vendors are offering links to Gartner’s new Magic Quadrant report on data warehouse DBMS. (Edit: This is now a much better link to the 2006 MQ.) Somewhat atypically for Gartner, there’s a strict hierarchy among most of the vendors, with Teradata > IBM > Oracle > Microsoft > Sybase > Kognitio > MySQL > Sand, in each case on both axes of the matrix. The only two exceptions are Netezza and DATallegro, which are depicted as outvisioning Microsoft somewhat even as they trail both Microsoft and Sybase in execution.

Gartner Magic Quadrants tend to annoy me, and I’m not going to critique the rankings in detail. But I do think this particular MQ is helpful in framing a vendor segmentation, namely:

  1. Big full-spectrum MPP/shared-nothing vendors: Teradata and IBM.
  2. MPP/shared-nothing appliance upstarts: Netezza and DATallegro.
  3. Big SMP/shared-everything vendors who also are apt to be your OLTP incumbent, and who want to integrate your software stack soup-to-nuts: Oracle and Microsoft.
  4. Niche vendors: pretty much everybody else.


September 27, 2006

Oracle and Microsoft in data warehousing

Most of my recent data warehouse engine research has been with the specialists. But over the past couple of days I caught up with Oracle and Microsoft (IBM is scheduled for Friday). In at least three ways, it makes sense to lump those vendors together, and contrast them with the newer data warehouse appliance startups:

  1. Shared-everything architecture
  2. End-to-end solution story
  3. OLTP industrial-strengthness carried over to data warehousing

In other ways, of course, their positions are greatly different. Oracle may have a full order-of-magnitude lead on Microsoft in warehouse sizes, for example, and has a broad range of advanced features that Microsoft either hasn’t matched yet, or else just released in SQL Server 2005. Microsoft was earlier in pushing DBA ease as a major product design emphasis, although Oracle has played vigorous catch-up in Oracle10g.


July 9, 2006

OS-DBMS integration

A Slashdot thread tonight on the possibility of Oracle directly supporting Linux got me thinking – integration of DBMS and OS is much more common than one might at first realize, especially in high-end data warehousing.

Think about it.

This trend isn’t quite universal, of course. Open-systems DB2, Sybase, Progress, MySQL, and so on are quite OS-independent, and of course you could dispute my characterization of Oracle as being “integrated” with the underlying OS. But in performance-critical environments, DBMS are often intensely OS-aware.

And of course this dovetails with a point I noted in another thread – DBMS are (or need to become) increasingly aware of chip architecture details as well.

April 10, 2006

IBM’s definition of native XML

IBM’s recent press release on Viper says:

Viper is expected to be the only database product able to seamlessly manage both conventional relational data and pure XML data without requiring the XML data to be reformatted or placed into a large object within the database.

That, so far as I know, is true, at least among major products.

I’m willing to apply the “native” label to Microsoft’s implementation anyway, because conceptually there’s little or no necessary performance difference between their approach and IBM’s. (Dang. I thought I posted more details on that months ago. I need to remedy the lack soon.)

As for Oracle — well, right now Oracle has a bit of a competitive problem.

January 26, 2006

More on the inventory database example

In my recent column on XML storage, I referenced a Microsoft-provided example of an inventory database. A retailer (I think an online one) wanted to manage books and DVDs and so on, and search across attributes that are common to the different entity kinds, such as title.

Obviously, there are relational alternatives. Items have unique SKU numbers, and they have one of a limited number of kinds, and a set of integrity constraints could mandate that an item was listed in the appropriate table for its kind and no other, and then common attributes could be searched on via views that amounted to unions (or derived tables kept synchronized via their own integrity constraints).
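For concreteness, here’s a minimal sketch of that relational approach, using SQLite from Python’s standard library. The tables, columns, and data are invented for illustration, and the integrity constraints that would police kind-membership are omitted for brevity.

```python
# A minimal sketch of per-kind tables plus a union view over the common
# attributes. Table and column names are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per kind of item, each keyed by a unique SKU.
    CREATE TABLE books (sku TEXT PRIMARY KEY, title TEXT, author TEXT);
    CREATE TABLE dvds  (sku TEXT PRIMARY KEY, title TEXT, director TEXT);

    -- A view unioning the attributes common to every kind, so a search
    -- on a shared field spans all the item tables at once.
    CREATE VIEW items AS
        SELECT sku, title, 'book' AS kind FROM books
        UNION ALL
        SELECT sku, title, 'dvd' AS kind FROM dvds;
""")

conn.execute("INSERT INTO books VALUES ('B1', 'Dune', 'Frank Herbert')")
conn.execute("INSERT INTO dvds VALUES ('D1', 'Dune', 'David Lynch')")

# One query finds every 'Dune' regardless of what kind of item it is.
for sku, kind in conn.execute(
        "SELECT sku, kind FROM items WHERE title = 'Dune'"):
    print(sku, kind)
```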

I pushed back at Microsoft — which is, you may recall, not just an XML advocate but also one of the largest RDBMS vendors — with this kind of reasoning, and they responded with the following, which I just decided to (with permission) post verbatim.

“If all you ever do is manage books and DVDs, then managing them relationally works well, especially if their properties do not change. However, you may want to add CDs and MP3s on memory cards and many other items that all have different properties. Then you quickly run into administration overhead and may not be able to keep up with your schema evolution (and you need an additional DBA for managing the complex relational schema). Even if you use a relational approach that stores common properties in joint tables, the recomposition costs of the information for one item may become too expensive to bear.”
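Their point is easy to demonstrate. Here’s a quick sketch, with invented documents, of how self-describing XML sidesteps the schema-evolution problem while still allowing search on common attributes:

```python
# Each item is a self-describing XML document, so a new kind of item with
# new properties needs no schema change. All documents are invented.
import xml.etree.ElementTree as ET

items = [
    '<item sku="B1" kind="book"><title>Dune</title><author>Frank Herbert</author></item>',
    '<item sku="D1" kind="dvd"><title>Dune</title><region>1</region></item>',
    # A kind added later, with properties no original schema anticipated:
    '<item sku="M1" kind="mp3-card"><title>Dune</title><capacityGB>4</capacityGB></item>',
]

# A search on a common attribute spans every kind, old or new alike.
for doc in items:
    elem = ET.fromstring(doc)
    if elem.findtext("title") == "Dune":
        print(elem.get("sku"), elem.get("kind"))
```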

December 15, 2005

Application logic in the database

I’m highly in favor of modularity in application development, but suspicious of folks who promote it to extremes as a panacea. (Perhaps another legacy of my exaggerated infatuation with LISP in the 1980s?) Thus, I was one of the chief drumbeaters for OO programming before Java made it de rigueur, but I also was one of the chief mockers of Philippe Kahn’s claims that Borland would outdevelop Microsoft in office productivity tools just because it used OO tools. (Analyst Michelle Preston bought that pitch lock, stock, and barrel, and basically was never heard from again.)

I’ve held similar views on stored procedures. A transactional DBMS without stored procedures is for many purposes not a serious product. CASE tools that use stored procedures to declaratively implement integrity constraints have been highly valuable for a decade. But more general use of stored procedures has been very problematic, due to the lack of development support for writing and maintaining them in any comprehensive way. Basically, stored procedures have been database-resident spaghetti.

Microsoft claims to have changed all this with the relationship between the new releases of SQL Server and Visual Studio, and has touted this as one of the few “game changers” in SQL Server 2005. I haven’t actually looked at their offering, but I’m inclined to give them the benefit of the doubt — i.e., absent verification I tentatively believe they are making it almost as practical from a team development standpoint to implement code in the database as it is on the middle tier.

Between the Microsoft announcement and the ongoing rumblings of the business rules folks, there’s considerable discussion of putting application logic in the database, including by the usual suspects over on Alf Pedersen’s blog. (Eric’s response in that thread is particularly good.) Here are some of my thoughts:

1. As noted above, putting logic in the database, to the extent the tools are good, has been a good thing. If the tools are indeed better now, it may become a better thing.

2. The myth that an application is just database-logic-plus-the-obvious-UI has been with us for a LONG time. It’s indeed a myth, for several reasons. There’s business process, for one thing. For another, UIs aren’t as trivial as that story would make them sound. (I keep promising to write on the UI point and never get around to it. I will. Stay tuned. For one thing, I have a white paper in the works on portals. For another, I’m not writing enough about analytics, and UI is one of the most interesting things going in analytics these days.) Plus there are many apps for which a straightforward relational/tabular database design doesn’t make sense anyway. (That’s a primary theme of this blog.)

3. It’s really regrettable that the term “business rules” is used so carelessly. It conflates integrity constraints and general application logic. Within application logic, it conflates those which are well served by a development and/or implementation paradigm along the lines of a rules engine, and those for which a rules engine would make little sense. It’s just bad semantics.

4. Besides everything else, I mainly agree with SAP’s belief that the DBMS is the wrong place to look for module interfaces.

December 14, 2005

Reasons to use native XML

From a DevX article on Microsoft’s SQL Server 2005:

Depending on your situation, XML can also be the best choice for storing even highly structured data. Here are a few practical reasons to consider storing data in a field of type XML:

* Repeated shredding or publishing—On-demand transformations carry a performance penalty. If you have to shred or publish the same document over and over again, consider storing it natively as XML. You can always expose it to relational consumers with an XML view.
* Rapidly changing data structures—When modeled correctly, XML lives up to its name: It’s extensible. Developers can add new pieces of data—even new hierarchies—to a schema without compromising existing software. Extensibility is an extra advantage when prototyping, or when working with rapidly changing problem domains such as bioinformatics.
* Atomic data—Sometimes, you’ll have XML data that’s never consumed except as a whole. Think of this as logical atomicity—if you never access the parts individually, you might as well store it in one big chunk.
* Debugging—Especially for new releases, it can be a good idea to tuck away a copy of your XML imports. The data may be redundant, but keeping the original makes tracking down problems a whole lot easier.

Nothing there to disagree with too heavily, although I can think of some other reasons that might rank higher yet.

December 12, 2005

Two kinds of DBMS extensibility

Microsoft took slight exception to my claim that they lack fully general DBMS extensibility. The claim is actually correct, but perhaps it could lead to confusion. And anyhow there’s a distinction here worth drawing, namely:

There are two different kinds of DBMS extensibility.

The first one, which Microsoft introduced in SQL Server 2005 (but which other vendors have had for many years), is UDTs (User-Defined Types), sometimes called user-defined functions in other systems. These are in essence datatypes that are calculated functions of existing datatypes. You could use a UDT, for example, to make the NULLs in SQL go away, if you hate them. Or you can calculate bond interest according to the industry-standard “360 day year.” Columns of these datatypes can be treated just like other columns — one can use them in joins, one can index on them, the optimizer can be aware of them, etc.
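For a rough feel of the idea, here’s a loose analogue using a SQLite scalar user-defined function from Python. Real SQL Server UDTs are richer than this (they are indexable, optimizer-visible types), but the core notion, a computed function of existing data that participates directly in queries, is the same. The bond table and figures are invented.

```python
# Bond interest on the industry-standard 360-day year, registered as a
# function that SQL queries can call like any built-in expression.
import sqlite3

def interest_360(principal, rate, days):
    """Simple interest computed on a 360-day year."""
    return principal * rate * days / 360.0

conn = sqlite3.connect(":memory:")
conn.create_function("interest_360", 3, interest_360)
conn.execute("CREATE TABLE bonds (id TEXT, principal REAL, rate REAL, days_held INTEGER)")
conn.execute("INSERT INTO bonds VALUES ('XYZ', 10000.0, 0.05, 90)")

# The computed value behaves like a column in the result set.
for row in conn.execute(
        "SELECT id, interest_360(principal, rate, days_held) FROM bonds"):
    print(row)  # ('XYZ', 125.0)
```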

The second one, commonly known by the horrible name of abstract datatypes (ADTs), is found mainly in Oracle, DB2, and previously the Informix/Illustra products. Also, if my memory is accurate, Ingres has a very partial capability along those lines, and PostgreSQL is said to be implementing them too. ADTs offer a way to add totally new datatypes into a relational system, with their own data access methods (e.g., index structures). That’s how a DBMS can incorporate a full-text index, or a geospatial datatype. It can also be a way to more efficiently implement something that would also work as a UDT.

In theory, Oracle et al. expose the capability to create ADTs to users. In practice, you need to be a professional DBMS developer to write them, and they are written either by the DBMS vendors themselves, or by specialist DBMS companies. E.g., much geospatial data today is stored in ESRI add-ons to Oracle; ESRI of course offered a specialty geospatial DBMS before ADTs were on the market.
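To make the modularity point concrete, here’s a toy sketch of what it means for a datatype to bring its own access method, in this case a crude grid index for 2-D points. Everything in it is invented; real ADT frameworks (Oracle data cartridges, Informix DataBlades, DB2 extenders) are vastly more involved.

```python
# A toy "abstract datatype": a geospatial-flavored point type that carries
# its own index structure, which the host DBMS would treat as a black box.
from collections import defaultdict

class PointType:
    def __init__(self, cell_size=10.0):
        self.cell_size = cell_size
        self.grid = defaultdict(list)  # (cell_x, cell_y) -> [(x, y, rowid)]

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def index_insert(self, rowid, x, y):
        """The type's own access method for maintaining its index."""
        self.grid[self._cell(x, y)].append((x, y, rowid))

    def index_lookup(self, x, y):
        """Coarse proximity search: rowids sharing the same grid cell."""
        return [rid for (_, _, rid) in self.grid[self._cell(x, y)]]

# The DBMS core needs only the interface, not the index internals.
points = PointType()
points.index_insert(rowid=1, x=3.0, y=4.0)
points.index_insert(rowid=2, x=5.0, y=9.0)
points.index_insert(rowid=3, x=55.0, y=40.0)
print(points.index_lookup(4.0, 5.0))  # [1, 2]; rowid 3 is in another cell
```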

Basically, implementing a general ADT capability is a form of modularity that lets new datatypes be added more easily than if you don’t have it. But it’s not a total requirement for new datatypes. E.g., I was wrong about Microsoft’s native XML implementation; XML is actually managed in the relational system. (More on that in a subsequent post.)

November 17, 2005

Native XML storage, Part 1 (technology)

IBM’s “Viper” version of DB2 is in open beta test, whatever that means, and Microsoft’s SQL Server 2005, née Yukon, is in general release. Both have native XML capabilities surpassing Oracle’s – which is interesting in its own right, because it’s rare for either of those vendors to pull ahead of Oracle in an OLTP feature, and almost unprecedented for both to do so at once.

So let’s talk about native XML support, what it is, and who might or should care about it. (Well, the apps part is actually in a separate Part 2 post.) Most of this is based on research that’s several months old, but except for a scarcity of actual user interviews, that shouldn’t matter much.

There are two main non-native ways to put XML into a SQL database such as Oracle – shredding and LOBs (BLOBs or CLOBs – i.e., Binary or Character Large OBjects). Both can perform poorly, for different reasons. Shredding takes XML documents and distributes them among a bunch of tables. So one update in XML can become many updates when shredded, and one lookup in XML can become a complex join from shredded storage. LOB storage obviates those problems, but creates another – even when you’re only looking for part of a document, you have to retrieve and handle the whole thing, and the same goes for updates.
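Here’s a sketch of those two non-native approaches side by side, using SQLite from Python. The order document and schema are invented; the shape of the tradeoff is the point.

```python
# Shredding distributes one document across tables (joins to reassemble);
# LOB storage keeps it whole (whole-document fetch even for one field).
import sqlite3
import xml.etree.ElementTree as ET

doc = ('<order id="O1"><customer>Acme</customer>'
       '<line sku="B1" qty="2"/><line sku="D1" qty="1"/></order>')

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders      (order_id TEXT PRIMARY KEY, customer TEXT);
    CREATE TABLE order_lines (order_id TEXT, sku TEXT, qty INTEGER);
    CREATE TABLE orders_lob  (order_id TEXT PRIMARY KEY, doc TEXT);
""")

# Shredded: one XML document becomes rows in several tables...
root = ET.fromstring(doc)
conn.execute("INSERT INTO orders VALUES (?, ?)",
             (root.get("id"), root.findtext("customer")))
for line in root.findall("line"):
    conn.execute("INSERT INTO order_lines VALUES (?, ?, ?)",
                 (root.get("id"), line.get("sku"), int(line.get("qty"))))

# ...so reassembling the document requires a join.
rows = conn.execute("""
    SELECT o.customer, l.sku, l.qty
    FROM orders o JOIN order_lines l USING (order_id)
    WHERE o.order_id = 'O1'""").fetchall()

# LOB: one insert and one fetch, but finding a single field still means
# retrieving and parsing the entire document.
conn.execute("INSERT INTO orders_lob VALUES ('O1', ?)", (doc,))
whole = conn.execute(
    "SELECT doc FROM orders_lob WHERE order_id = 'O1'").fetchone()[0]
customer = ET.fromstring(whole).findtext("customer")
print(rows, customer)
```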

So native storage can be a good thing when you can’t afford the performance hit of shredding, of LOB storage, or of any available hybrid. It also could be good if getting good performance from non-native storage, while possible, would create undue burdens on application development, or if there’s some other reason one or both of the shredding and LOB approaches isn’t viable.

One nice feature is that native-XML storage has almost no downside, at least if you get it from the high-end DBMS vendors. IBM, Oracle, and Microsoft have all worked out ways to have integrated query parsing and query optimization, while letting storage be more or less separate. More precisely, Oracle actually still sticks everything into one data store (hence the lack of native XML support), but allows near-infinite flexibility in how it is accessed. Microsoft has long had separate servers for tabular data, text, and MOLAP, although like Sybase, it doesn’t have general datatype extensibility that it can expose to customers, or exploit itself to provide a great variety of datatypes. IBM has had Oracle-like extensibility all along, although it hasn’t been quite as aggressive at exploiting it; now it’s introduced a separate-server option for XML. Both Microsoft and IBM claim that their administrative tools are slick enough that the DBA has little more work with their offerings than would be present with a true single-server solution.

So how does the storage actually work? The basic idea is exactly what you’d think. Data is stored in name-value pairs, with pointers connecting parents to children. The secret sauce (and here I have less detail than I’d like) is the extra information that’s stored, either at the nodes directly, or in an overarching index. Obviously, there’s a tradeoff between update and retrieval speed. And equally obviously, I need to learn more of the particulars.
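In the meantime, here’s a toy rendition of that basic scheme, with a path index standing in for the “extra information.” It’s purely illustrative, but it shows the tradeoff: every insert pays to maintain the index so that lookups can avoid walking the tree.

```python
# Nodes as name-value records with parent pointers, plus an overarching
# path index. Invented for illustration; no real product works this simply.
class XmlNodeStore:
    def __init__(self):
        self.nodes = {}        # node_id -> (parent_id, name, value)
        self.path_index = {}   # '/a/b' -> [node_ids]
        self._next_id = 0

    def insert(self, parent_id, name, value=None):
        node_id = self._next_id
        self._next_id += 1
        self.nodes[node_id] = (parent_id, name, value)
        # Index maintenance makes updates dearer and retrievals cheaper.
        self.path_index.setdefault(self._path_of(node_id), []).append(node_id)
        return node_id

    def _path_of(self, node_id):
        parts = []
        while node_id is not None:
            node_id, name, _ = self.nodes[node_id]  # follow parent pointer
            parts.append(name)
        return "/" + "/".join(reversed(parts))

    def lookup(self, path):
        """Path lookup via the index, without walking the whole tree."""
        return [self.nodes[i][2] for i in self.path_index.get(path, [])]

store = XmlNodeStore()
order = store.insert(None, "order")
store.insert(order, "customer", "Acme")
store.insert(order, "customer", "Apex")
print(store.lookup("/order/customer"))  # ['Acme', 'Apex']
```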

And on that somewhat lame note, let me point you at Part 2 of this post, which discusses whether and how this stuff will actually be used. (Preview: It will, big time – I think.)

August 13, 2005

The end of the single-server DBMS vendor

For all practical purposes, there are no DBMS vendors left advocating single-server strategies. Oracle was the last one, but it just acquired in-memory data management vendor TimesTen, which will be used as a cache in front of high-performance Oracle databases. (It will also continue to be sold for stand-alone uses, especially in the financial trading and defense/intelligence markets.)

IBM’s Viper is a server-and-a-half story, with lots of integration over a dual-server (one relational, one native XML) base. IBM also is moving aggressively in data integration/federation, with Ascential and many other acquisitions. It also sells a broad range of database products itself, including two DB2s, several Informix products, and so on.

Microsoft also has a multi-server strategy. In its case, relational, text, and MOLAP storage are more separate than in Oracle’s or even IBM’s products; again, there’s a thick layer of technology on top integrating them. An eventual move to native XML storage will, one must imagine, be handled in the same way.

Smaller vendors Sybase and Progress each offer multiple DBMS as well.

Teradata is a pretty big player with only one DBMS — but it’s specialized for data warehousing. Teradata is the first to tell you you should use something else for your classical transaction processing.

The Grand Unified Integrated Database theory is, so far as I can tell, quite dead. Some people just refuse to admit that fact.
