IBM and DB2
Analysis of IBM and various of its product lines in database management, analytics, and data integration.
- Cognos
- solidDB
- (in The Monash Report) Operational and strategic issues for IBM
- (in Text Technologies) IBM in the text analytics market
- (in Software Memories) Historical notes on IBM
- (in Software Memories) Historical notes on Informix
Some big-vendor execution questions, and why they matter
When I drafted a list of key analytics-sector issues in honor of look-ahead season, the first item was “execution of various big vendors’ ambitious initiatives”. By “execute” I mean mainly:
- “Deliver products that really meet customers’ desires and needs.”
- “Successfully convince them that you’re doing so …”
- “… at an attractive overall cost.”
Vendors mentioned here are Oracle, SAP, HP, and IBM. Anybody smaller got left out due to the length of this post. Among the bigger omissions were:
- salesforce.com (multiple subjects).
- SAS HPA.
- The evolution of Hadoop.
Very brief CEP/streaming catchup
When I agreed to launch the StreamBase LiveView product via DBMS 2, I planned to catch up on the whole CEP/streaming area first. Due to the power and internet outages last week, that didn’t entirely happen. So I’ll do a bit of that now, albeit more cryptically than I hoped and intended.
- The upshot of my what to call CEP thread in August was that “streaming” and “event processing” are not the same concept, but it so happens that they have the most traction where they intersect. That said, I both observe and endorse an apparent shift from “event” to “stream” as the core of the terminology, in a reversal of my opinion of several years ago.
- IBM continues to throw a lot of resources at its System S/ InfoSphere Streams product, but I haven’t heard yet of much marketplace success. That said, I believe IBM is still pretty serious about Streams, as one would expect from an effort whose code name so cheekily references System R. In particular, Streams shows up prominently on IBM’s top-level analytic architecture slide.
- Sybase recently released its ESP (Event Stream Processor) 5.0, which it says is the full merger of the Aleri and Coral8 predecessors. You can still get Sybase ESP without buying into the full Sybase RAP stack, and Sybase has no plans to change that.
- Sybase has discontinued all the business intelligence types of products Aleri and Coral8 were developing. Rather, Sybase is OEMing Panopticon, which it reports has been well received. Other than the discontinuation of the BI efforts, there seem to be few Aleri or Coral8 features missing from the merged Sybase ESP product.
- Truviso continues to be out of the picture.
- I have more to say about StreamBase separately.
- I have more to say about Sybase and IBM, which I’ll get to when I can.
- I have nothing new on Progress Apama. I also know little about any of the open source efforts.
Meanwhile, if you want to see technically nitty-gritty posts about the CEP/streaming area, you may want to look at my CEP/streaming coverage circa 2007-9, based on conversations with (among others) Mike Stonebraker, John Bates, and Mark Tsimelzon.
Categories: Business intelligence, IBM and DB2, StreamBase, Streaming and complex event processing (CEP), Sybase, Truviso | 4 Comments |
Transparent relational OLTP scale-out
There’s a perception that, if you want (relatively) worry-free database scale-out, you need a non-relational/NoSQL strategy. That perception is false. In the analytic case it’s completely ridiculous, as has been demonstrated by Teradata, Vertica, Netezza, and various other MPP (Massively Parallel Processing) analytic DBMS vendors. And now it’s false for short-request/OLTP (OnLine Transaction Processing) use cases as well.
My favorite relational OLTP scale-out choice these days is the SchoonerSQL/dbShards partnership. Schooner Information Technology (SchoonerSQL) and Code Futures (dbShards) are young, small companies, but I’m not too concerned about that, because the APIs they want you to write to are just MySQL’s. The main scenarios in which I can see them failing are ones in which they are competitively leapfrogged, either by other small competitors – e.g. ScaleBase, Akiban, TokuDB, or ScaleDB — or by Oracle/MySQL itself. While that could suck for my clients Schooner and Code Futures, it would still provide users relying on MySQL scale-out with one or more good product alternatives.
Relying on non-MySQL NewSQL startups, by way of contrast, would leave me somewhat more concerned. (However, if their code is open sourced. you have at least some vendor-failure protection.) And big-vendor scale-out offerings, such as Oracle RAC or DB2 pureScale, may be more complex to deploy and administer than the MySQL and NewSQL alternatives.
Categories: Clustering, dbShards and CodeFutures, IBM and DB2, MySQL, NewSQL, NoSQL, OLTP, Open source, Oracle, Parallelization, Schooner Information Technology, Transparent sharding | 2 Comments |
IBM is buying parallelization expert Platform Computing
IBM is acquiring Platform Computing, a company with which I had one briefing, last August. Quick background includes: Read more
Categories: Hadoop, IBM and DB2, Investment research and trading, MapReduce, Parallelization, Scientific research | 5 Comments |
The Ted Codd guarantee
I write a lot about whether or not to use relational DBMS. For example:
- In May I surveyed relational vs. non-relational pros and cons at some length.
- Last November I mused about when it might be OK to do without joins.
- The question is implicit in a variety of posts about, say, document-oriented or object-oriented DBMS.
Before going further in that vein, I’d like to do a quick review of what E. F. “Ted” Codd was getting at with the relational model in the first place. Read more
Categories: Data models and architecture, IBM and DB2, MOLAP, NoSQL | 3 Comments |
Cloudera and Hortonworks
My clients at Cloudera have been around for a while, in effect positioned as “the Hadoop company.” Their business, in a nutshell, consists of:
- Packaging up a Cloudera distribution of Apache Hadoop. This distribution doesn’t have proprietary code; it’s just packaged by Cloudera from Apache projects (with a decent minority of the code happening to have been contributed by Cloudera engineers).
- Paid subscription support for Apache Hadoop and, in connection with that …
- … proprietary software that all support customers automatically get. There are two points to this proprietary software:
- It adds value for the customer.
- It makes Cloudera’s support job easier.
- Professional services around Hadoop.
- Training and conferences around Hadoop, which probably don’t generate all that much money, but are great marketing in terms of visibility, thought leadership, and lead generation.
Hortonworks spun out of Yahoo last week, with parts of the Cloudera business model, namely Hadoop support, training, and I guess conferences. Hortonworks emphatically rules out professional services, and says that it will contribute all code back to Apache Hadoop. Hortonworks does grudgingly admit that it might get into the proprietary software business at some point — but evidently hopes that day will never actually come.
Categories: Cloudera, Hadoop, Hortonworks, IBM and DB2, MapReduce, Open source, Yahoo | 9 Comments |
Eight kinds of analytic database (Part 1)
Analytic data management technology has blossomed, leading to many questions along the lines of “So which products should I use for which category of problem?” The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for “big data” is little help.
Let’s try eight categories instead. While no categorization is ever perfect, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need — and in most cases you’ll need several — is a great early step in your analytic technology planning. Read more
It’s official — the grand central EDW will never happen
I pointed out last year that the grand central enterprise data warehouse couldn’t happen; the post started:
An enterprise data warehouse should:
- Manage data to high standards of accuracy, consistency, cleanliness, clarity, and security.
- Manage all the data in your organization.
Pick ONE.
IBM’s main theme at the Enzee Universe conference has been to say the same thing.
Merv Adrian’s talk at the same conference made it clear that Gartner feels the same way, as does he personally. Indeed, like me, he’s racked up multiple decades of industry experience without ever finding a single theoretically ideal grand central EDW.
Forrester Research has been a little less clear on the point, but generally seems to be on the correct side of the issue as well.
If somebody is still saying that one central enterprise data warehouse can hold all the information or data you need on which to base your business decisions, they’re probably not somebody you should be listening to very hard.
Is that clear, or should I hammer home the point even harder? 😀
Categories: Data warehousing, IBM and DB2, Netezza | 8 Comments |
What to do about “unstructured data”
We hear much these days about unstructured or semi-structured (as opposed to) structured data. Those are misnomers, however, for at least two reasons. First, it’s not really the data that people think is un-, semi-, or fully structured; it’s databases.* Relational databases are highly structured, but the data within them is unstructured — just lists of numbers or character strings, whose only significance derives from the structure that the database imposes.
*Here I’m using the term “database” literally, rather than as a concise synonym for “database management system”. But see below.
Second, a more accurate distinction is not whether a database has one structure or none — it’s whether a database has one structure or many. The easiest way to see this is for databases that have clearly-defined schemas. A relational database has one schema (even if it is just the union of various unrelated sub-schemas); an XML database, however, can have as many schemas as it contains documents.
One small terminological problem is easily handled, namely that people don’t talk about true databases very often, at least when they’re discussing generalities; rather, they talk about data and DBMS.* So let’s talk of DBMS being “structured” singly or multiply or whatever, just as the databases they’re designed to manage are.
*And they refer to the DBMS as “databases,” because they don’t have much other use for the word.
All that said — I think that single vs. multiple database structures isn’t a bright-line binary distinction; rather, it’s a spectrum. For example: Read more
Categories: Cassandra, Couchbase, Data models and architecture, HBase, IBM and DB2, MarkLogic, MongoDB, NoSQL, Splunk, Theory and architecture | 19 Comments |
Alternatives for Hadoop/MapReduce data storage and management
There’s been a flurry of announcements recently in the Hadoop world. Much of it has been concentrated on Hadoop data storage and management. This is understandable, since HDFS (Hadoop Distributed File System) is quite a young (i.e. immature) system, with much strengthening and Bottleneck Whack-A-Mole remaining in its future.
Known HDFS and Hadoop data storage and management issues include but are not limited to:
- Hadoop is run by a master node, and specifically a namenode, that’s a single point of failure.
- HDFS compression could be better.
- HDFS likes to store three copies of everything, whereas many DBMS and file systems are satisfied with two.
- Hive (the canonical way to do SQL joins and so on in Hadoop) is slow.
Different entities have different ideas about how such deficiencies should be addressed. Read more
Categories: Aster Data, Cassandra, Cloudera, Data warehouse appliances, DataStax, EMC, Greenplum, Hadapt, Hadoop, IBM and DB2, MapReduce, MongoDB, Netezza, Parallelization | 22 Comments |