November 15, 2008

High-performance analytics

For the past few months, I’ve collected a lot of data points to the effect that high-performance analytics (i.e., analytics beyond straightforward query) is becoming increasingly important. And I’ve written about some of them at length. For example:

MapReduce – controversial or in some cases even disappointing though it may be – has a lot of use cases.
It’s early days, but Netezza and Teradata (and others) are beefing up their geospatial analytic capabilities.
Memory-centric analytics is in the spotlight.

Ack. I can’t decide whether “analytics” should be a singular or plural noun. Thoughts?

Another area that’s come up which I haven’t blogged about so much is data mining in the database. Data mining accounts for a large part of data warehouse use. The traditional way to do data mining is to extract data from the database and dump it into SAS. But there are problems with this scenario, including: Read more

Categories: Aster Data, Data warehousing, EAI, EII, ETL, ELT, ETLT, Greenplum, MapReduce, Netezza, Oracle, Parallelization, SAS Institute, Teradata

6 Comments

October 17, 2008

Introduction to Talend

I didn’t spend much time on the show floor at Teradata Partners, but I did connect with Yves de Montcheuil of Talend for a couple of little chats. Highlights of the Talend story include: Read more

Categories: Data integration and middleware, EAI, EII, ETL, ELT, ETLT, Talend

5 Comments

October 10, 2008

Multitenancy hype is getting out of control

I posted recently on SaaS-data-integration-in-the-cloud, and a couple of vendors stopped by the comment thread to share what they do. One was Boomi, which has a blog that does a good job of spelling out its opinions. What the Boomi blog is not so good at, however, is giving any good reasons why one should share those opinions.

I refer specifically to a couple of posts claiming that multitenancy is somehow crucial for SaaS data integration to work. To this I can only say — huh? A decent data integration system should be able to handle many parallel threads at once, connecting many pairs of databases at once. So the hard part of multitenancy is pretty much “free.” If, even so, the integration provider chooses not to go fully multitenant, whose business is it but theirs? Read more

Categories: Data integration and middleware, EAI, EII, ETL, ELT, ETLT, Software as a Service (SaaS)

7 Comments

October 9, 2008

Everybody’s putting integration services in the cloud

Both Pervasive Software and Cast Iron Systems told me recently of fairly pure cloud offerings. In this, they’re joining Informatica, which started offering Salesforce.com integration-as-a-service back in 2006. So far as I can tell, the three vendors are doing somewhat different things. Read more

Categories: Cast Iron Systems, Cloud computing, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, Informatica, Pervasive Software, Software as a Service (SaaS)

8 Comments

October 5, 2008

Schema flexibility and XML data management

Conor O’Mahony, marketing manager for IBM’s DB2 pureXML, talks a lot about one of my favorite hobbyhorses — schema flexibility — as a reason to use an XML data model. In a number of industries he sees use cases based around ongoing change in the information being managed:

Tax authorities change their rules and forms every year, but don’t want to do total rewrites of their electronic submission and processing software.
The financial services industry keeps inventing new products, which don’t just have different terms and conditions, but may also have different kinds of terms and conditions.
The same, to some extent, goes for the travel industry, which also keeps adding different kinds of offers and destinations.
The energy industry keeps adding new kinds of highly complex equipment it has to manage.
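
To make the "different kinds of terms and conditions" point concrete, here is a minimal sketch in Python. It is not DB2 pureXML code. The product names, element names, and the term_kinds helper are all invented for illustration. It only shows that two documents with entirely different sets of terms can be read by the same code, with no schema change in between.

```python
import xml.etree.ElementTree as ET

# Two hypothetical financial products. The second carries kinds of terms
# (index, strike, city) that the first never mentions.
docs = [
    "<product><name>Fixed bond</name>"
    "<terms><rate>4.5</rate><maturity>2018</maturity></terms></product>",
    "<product><name>Weather swap</name>"
    "<terms><index>HDD</index><strike>1200</strike><city>Chicago</city></terms></product>",
]

def term_kinds(xml_text):
    """Return whatever terms a product document happens to carry, as a dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root.find("terms")}

# Build a small catalog keyed by product name; no fixed column list is needed.
catalog = {ET.fromstring(d).findtext("name"): term_kinds(d) for d in docs}

for name, terms in catalog.items():
    print(name, sorted(terms))
```

In a tabular design, the second product would typically force either new columns or a new table. Here it is just another document.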

Conor also thinks market evidence shows that XML’s schema flexibility is important for data interchange. Read more

Categories: Data models and architecture, EAI, EII, ETL, ELT, ETLT, IBM and DB2, pureXML, Structured documents

3 Comments

October 5, 2008

Vertical market XML standards

Tracking the alphabet soup of vertical market XML standards is hard. So as a starting point, I’m splitting a list I got from IBM into a standalone post.

Among the most important or successful IBM pureXML–supported standards, in terms of downloads and other evidence of customer interest, are: Read more

Categories: Application areas, EAI, EII, ETL, ELT, ETLT, IBM and DB2, pureXML, Structured documents

2 Comments

October 5, 2008

Overview of IBM DB2 pureXML

On August 29, I had a great call with IBM about DB2 pureXML (most of the IBM side of the talking was done by Conor O’Mahony and Qi Jin). I’m finally getting around to writing it up now. (The world of tabular data warehousing has kept me just a wee bit busy …)

As I write it, I see there are a considerable number of holes, but that’s the way it seems to go when researching XML storage. I’m also writing up a September call from which I finally figured out (I think) the essence of how MarkLogic Server works – but only after five months of trying. It turns out that MarkLogic works rather differently from DB2 pureXML. Not coincidentally, IBM and Mark Logic focus on rather different use cases for native XML storage.

What I understand so far about the basic DB2 pureXML architecture goes like this: Read more

Categories: EAI, EII, ETL, ELT, ETLT, IBM and DB2, pureXML, Structured documents

7 Comments

August 26, 2008

Three approaches to parallelizing data transformation

Many MPP data warehousing vendors have told me their products are used for ELT (Extract/Load/Transform) instead of ETL (Extract/Transform/Load). I.e., needed data transformations are done on the MPP system, rather than on the — probably SMP — system the data comes from.* If the data transformation is being applied on a record-by-record basis, then it’s automatically fully parallelized. Even if the transforms are more complex, considerable parallel processing may still be going on.

*Or it’s some of each, at which point it’s called ETLT — I bet you can work out what that stands for.
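
The record-by-record case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's implementation. The transform function, the field names, and the round-robin partitioning are all hypothetical, and a thread pool stands in for the nodes of an MPP system. The point is only that when each output row depends on exactly one input row, every partition can be transformed independently.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(record):
    """Hypothetical per-record transform: tidy the name, convert cents to dollars."""
    return {
        "name": record["name"].strip().title(),
        "amount": record["amount_cents"] / 100,
    }

def partition(records, n):
    """Split records round-robin into n partitions (a stand-in for MPP data distribution)."""
    return [records[i::n] for i in range(n)]

def transform_partition(part):
    """Transform one partition; it needs nothing from any other partition."""
    return [transform(r) for r in part]

def parallel_elt(records, workers=4):
    """Apply the transform to every partition concurrently, then gather the rows."""
    parts = partition(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(transform_partition, parts))
    return [row for part in results for row in part]

# Already-loaded raw rows (the "E" and "L" of ELT are assumed to have happened).
raw = [
    {"name": "  alice ", "amount_cents": 1250},
    {"name": "BOB", "amount_cents": 300},
    {"name": "carol", "amount_cents": 99},
]
loaded = parallel_elt(raw, workers=2)
```

Row order is not preserved across partitions, which is normally fine for a set-oriented load. Transforms that need to see more than one row at a time (deduplication, say) do not split up this cleanly.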

Categories: Aster Data, Data integration and middleware, Data warehousing, EAI, EII, ETL, ELT, ETLT, MapReduce, Parallelization, Pervasive Software

8 Comments

July 8, 2008

Google has thousands of internal data formats, mostly simple ones

In connection with the release of Protocol Buffers, Kenton Varda of Google wrote: Read more

Categories: Data integration and middleware, Google

2 Comments

March 26, 2008

Pervasive is also pursuing simplicity and SaaS integration

I blogged recently about Cast Iron Systems, a simplicity-oriented data integration appliance vendor that is increasingly focusing on the SaaS market. Well, Pervasive Software is doing something similar.

Via Data Integrator, Pervasive is a leader in the low-cost integration market, with revenue split about 50/25/25 between direct sales, ISVs, and SaaS. Pervasive fondly believes that its products cost half as much as Cast Iron’s, and wind up taking no more installation effort when you factor in Pervasive’s broader capabilities in areas such as workflow. However, there’s some doubt as to whether this is apples-to-apples. Cast Iron does include hardware, after all, and as Pervasive itself points out, Cast Iron will bundle some professional services into a sale if you ask nicely.

Two things are new. Read more

Categories: Cloud computing, EAI, EII, ETL, ELT, ETLT, Pervasive Software, Software as a Service (SaaS)

5 Comments

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.
