IBM and DB2
Analysis of IBM and various of its product lines in database management, analytics, and data integration.
- Cognos
- solidDB
- (in The Monash Report) Operational and strategic issues for IBM
- (in Text Technologies) IBM in the text analytics market
- (in Software Memories) Historical notes on IBM
- (in Software Memories) Historical notes on Informix
Big scientific databases need to be stored somehow
A year ago, Mike Stonebraker observed that conventional DBMS don’t necessarily do a great job on scientific data, and further pointed out that different kinds of science might call for different data access methods. Even so, some of the largest databases around are scientific ones, and they have to be managed somehow. For example:
- Microsoft just put out an overwrought press release. The substance seems to be that Pan-STARRS — a Jim Gray legacy also discussed in an August, 2008 Computerworld article — is adding 1.4 terabytes of image data per night, and one not-so-new database adds 15 terabytes per year of some kind of computer-simulation output used to analyze protein folding. Both run on SQL Server, of course.
- Kognitio has an astronomical database too, at Cambridge University, adding half a terabyte of data per night.
- Oracle is used for a McGill University proteomics database called CellMapBase. A figure of 50 terabytes of “mass storage” is cited, which doesn’t count tape backup and the like.
- The Large Hadron Collider, once it actually starts functioning, is projected to generate 15 petabytes of data annually, which will be initially stored on tape and then distributed to various computing centers around the world.
- Netezza is proud of its ability to serve images and the like quickly, although off the top of my head I’m not thinking of a major customer it has in that area. (But then, if you just sell software, your academic discount can approach 100%; if, like Netezza, you have an actual cost of goods sold, that’s not as appealing an option.)
Long-term, I imagine that the most suitable DBMS for these purposes will be MPP systems with strong datatype extensibility — e.g., DB2, PostgreSQL-based Greenplum, PostgreSQL-based Aster nCluster, or maybe Oracle.
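To illustrate what I mean by datatype extensibility, here's a minimal, purely hypothetical sketch in PostgreSQL (the type, table, and function names are all made up): teach the DBMS a scientific datatype and a function over it, then use both in an ordinary query.

```sql
-- Hypothetical sketch of datatype extensibility (PostgreSQL syntax; names invented).
-- A composite type for a sky coordinate, plus a function computing angular separation.
CREATE TYPE sky_point AS (ra_deg double precision, dec_deg double precision);

CREATE TABLE observations (
    obs_id   bigint PRIMARY KEY,
    position sky_point,
    image    bytea            -- raw pixel data; a real system might store this externally
);

-- Angular separation in degrees (spherical law of cosines; ignores edge cases).
CREATE FUNCTION angular_sep(a sky_point, b sky_point) RETURNS double precision AS $$
    SELECT degrees(acos(
        sin(radians((a).dec_deg)) * sin(radians((b).dec_deg)) +
        cos(radians((a).dec_deg)) * cos(radians((b).dec_deg)) *
        cos(radians((a).ra_deg - (b).ra_deg))
    ));
$$ LANGUAGE SQL IMMUTABLE;

-- Find observations within half a degree of a target position.
SELECT obs_id
FROM observations
WHERE angular_sep(position, ROW(150.1, 2.2)::sky_point) < 0.5;
```

The point is simply that the engine itself understands the new type and function, so an MPP system with this kind of extensibility can evaluate such predicates next to the data rather than shipping raw images around.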
Categories: Aster Data, Data types, Greenplum, IBM and DB2, Kognitio, Microsoft and SQL*Server, Netezza, Oracle, Parallelization, PostgreSQL, Scientific research | 1 Comment |
Schema flexibility and XML data management
Conor O’Mahony, marketing manager for IBM’s DB2 pureXML, talks a lot about one of my favorite hobbyhorses — schema flexibility — as a reason to use an XML data model. In a number of industries he sees use cases based around ongoing change in the information being managed:
- Tax authorities change their rules and forms every year, but don’t want to do total rewrites of their electronic submission and processing software.
- The financial services industry keeps inventing new products, which don’t just have different terms and conditions, but may also have different kinds of terms and conditions.
- The same, to some extent, goes for the travel industry, which also keeps adding different kinds of offers and destinations.
- The energy industry keeps adding new kinds of highly complex equipment it has to manage.
Conor also thinks market evidence shows that XML’s schema flexibility is important for data interchange. Read more
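As a concrete, though hypothetical, illustration of that schema flexibility, here is a small sketch in DB2 pureXML-style SQL/XML (the table, elements, and values are invented): two products whose terms and conditions have different structures sit in the same XML column, and a query can still reach into either.

```sql
-- Hypothetical sketch of schema flexibility with an XML column (DB2 pureXML-style SQL/XML).
CREATE TABLE products (
    id    INTEGER NOT NULL PRIMARY KEY,
    terms XML
);

-- Two products with differently structured terms, stored side by side with no schema change.
INSERT INTO products VALUES
  (1, XMLPARSE(DOCUMENT '<product><rate>3.5</rate></product>'));
INSERT INTO products VALUES
  (2, XMLPARSE(DOCUMENT '<product><rate>2.1</rate><cap>10000</cap><region>EU</region></product>'));

-- Query on an element that only some of the documents contain.
SELECT id
FROM products
WHERE XMLEXISTS('$t/product[cap > 5000]' PASSING terms AS "t");
```

Adding a new kind of term is just a matter of inserting documents that contain it; nothing about the table definition has to change.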
Categories: Data models and architecture, EAI, EII, ETL, ELT, ETLT, IBM and DB2, pureXML, Structured documents | 3 Comments |
Vertical market XML standards
Tracking the alphabet soup of vertical market XML standards is hard. So as a starting point, I’m splitting a list I got from IBM into a standalone post.
Among the most important or successful IBM pureXML–supported standards, in terms of downloads and other evidence of customer interest, are: Read more
Categories: Application areas, EAI, EII, ETL, ELT, ETLT, IBM and DB2, pureXML, Structured documents | 2 Comments |
Overview of IBM DB2 pureXML
On August 29, I had a great call with IBM about DB2 pureXML (most of the IBM side of the talking was done by Conor O’Mahony and Qi Jin). I’m finally getting around to writing it up now. (The world of tabular data warehousing has kept me just a wee bit busy …)
As I write it, I see there are a considerable number of holes, but that’s the way it seems to go when researching XML storage. I’m also writing up a September call from which I finally figured out (I think) the essence of how MarkLogic Server works – but only after five months of trying. It turns out that MarkLogic works rather differently from DB2 pureXML. Not coincidentally, IBM and Mark Logic focus on rather different use cases for native XML storage.
What I understand so far about the basic DB2 pureXML architecture goes like this: Read more
Categories: EAI, EII, ETL, ELT, ETLT, IBM and DB2, pureXML, Structured documents | 7 Comments |
Top DBMS on Linux
I was looking up George Crump’s blogs in connection with his recent post on SSDs, and I stumbled upon one that outlines at great length what features Linux backup systems should have. I won’t claim to have read it word for word, but what did catch my eye were a couple of comments on DBMS market share, which boiled down to:
- Oracle
- MySQL
- PostgreSQL
Categories: IBM and DB2, Market share and customer counts, MySQL, Oracle, PostgreSQL | Leave a Comment |
More data on data warehouse sizes and issues
I spoke today with Paul Barth and Randy Bean of consultancy NewVantage Partners. The core of NewVantage’s business seems to be helping large enterprises (especially financial services) with their data warehouse strategies. Takeaways — none of which should shock regular readers of DBMS2 — included:
- Administrative cost and difficulty are often the single biggest issue in selecting analytic DBMS products.
- Oracle hits a wall around 10 terabytes of user data. The one customer NewVantage can think of with an Oracle data warehouse over 10 terabytes is fleeing Oracle for Netezza.
- NewVantage says that very specialized data warehouses on Oracle could conceivably be larger than that.
- NewVantage does have a customer on DB2/UDB in the 30-40 terabyte range. That customer does a lot of careful tuning to make it work.
- About 15% of NewVantage’s customers use Netezza. Few if any use newer analytic DBMS (but I got the sense more will soon). The rest rely on “traditional” DBMS, a group that includes Teradata.
Categories: Data warehousing, IBM and DB2, Netezza, Oracle | 1 Comment |
Microsoft is buying DATAllegro
I’ve long argued that:
- Oracle and Microsoft are doomed in the data warehouse market unless they acquire MPP/shared-nothing data warehouse DBMS and/or data warehouse appliances.
- DATAllegro is the ideal acquisition for either of them.
Microsoft has now validated my claim by agreeing to buy DATAllegro. As you probably know, we’ve been covering DATAllegro extensively, as per the links listed below.
Basic deal highlights include: Read more
Who is doing what in XML data management these days?
A comment thread to a post on a different subject has opened up a discussion of XML storage. Frankly, I haven’t kept up with my briefings on the subject, in part because XML support hasn’t proved to be very important yet to the big DBMS vendors, somewhat to my surprise. When last I looked, the situation wasn’t much different from what it was back in November, 2005. Unless I’ve missed something (and please tell me if I have!), here’s what’s going on: Read more
Categories: IBM and DB2, Intersystems and Cache', MarkLogic, Microsoft and SQL*Server, Oracle, Structured documents | 7 Comments |
ANTs bails out of the DBMS market
ANTs Data Server — i.e., the ANTs DBMS — has been sold off to a company called 4Js. It is now to be called Genero DB. Actually, 4Js has been selling or working on a version of the product called Genero DB since 2006, specifically an Informix-compatible one.
I’m not totally clear on why an Informix-compatible DBMS is needed in a world that already has Informix SE, but maybe IBM is overcharging for maintenance even on the low-end version of the product.
Meanwhile, ANTs, which had originally tried to get enterprises to migrate away from Oracle, is now focused on middleware called the ANTs Compatibility Server to help them migrate to Oracle, specifically/initially from Sybase.
Categories: ANTs Software, Emulation, transparency, portability, IBM and DB2, Oracle, Sybase | 2 Comments |
Truviso and EnterpriseDB blend event processing with ordinary database management
Truviso and EnterpriseDB announced today that there’s a Truviso “blade” for Postgres Plus. By email, EnterpriseDB’s Bob Zurek endorsed my tentative summary of what this means technically, namely:
- There’s data being managed transactionally by EnterpriseDB.
- Truviso’s DML has all along included ways to talk to a persistent Postgres data store.
- If, in addition, one wants to do stream processing on the same data, that’s now possible, using Truviso’s usual DML.
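For illustration only, here is roughly the shape such a combined query might take. The DML below is a hypothetical streaming-SQL dialect with made-up stream and table names; I have not verified actual Truviso syntax.

```sql
-- Hypothetical streaming-SQL sketch; not verified Truviso DML. All names are invented.
-- 'trades' is a stream of incoming events; 'accounts' is an ordinary table managed
-- transactionally in the Postgres Plus database.

CREATE STREAM trades (
    account_id integer,
    symbol     text,
    qty        integer,
    price      numeric,
    ts         timestamp
);

-- Continuous query: each arriving trade is enriched with persistent relational data.
SELECT t.ts,
       a.account_name,
       t.symbol,
       t.qty * t.price AS notional
FROM   trades t                      -- streaming input
JOIN   accounts a                    -- persistent Postgres table
       ON a.account_id = t.account_id;
```

The point, as I understand it, is that the stream processing and the transactional management apply to the same data, queried through Truviso's usual DML.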