Third-party analytics
This is one of a series of posts on business intelligence and related analytic technology subjects, keying off the 2011/2012 version of the Gartner Magic Quadrant for Business Intelligence Platforms. The four posts in the series cover:
- Overview comments about the 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms, as well as a link to the actual document.
- Business intelligence industry trends — some of Gartner’s thoughts but mainly my own.
- Company-by-company comments based on the 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms.
- (This post) Third-party analytics, pulling together and expanding on some points I made in the first three posts.
I’ve written a lot this weekend about various areas of business intelligence and related analytics. A recurring theme has been what we might call third-party analytics — i.e., anything other than buying analytic technology and deploying it in your own enterprise. The four main areas are:
- Business intelligence software OEMed to packaged operational application vendors.
- Business intelligence software OEMed to SaaS (Software as a Service) application vendors.
- Business intelligence software bundled into information-selling businesses.
- Stakeholder-facing analytics, which usually is just BI allowing customers (or suppliers, investors, citizens, etc.) to look into one of your databases.
The 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms — company-by-company comments
This is one of a series of posts on business intelligence and related analytic technology subjects, keying off the 2011/2012 version of the Gartner Magic Quadrant for Business Intelligence Platforms. The four posts in the series cover:
- Overview comments about the 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms, as well as a link to the actual document.
- Business intelligence industry trends — some of Gartner’s thoughts but mainly my own.
- (This post) Company-by-company comments based on the 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms.
- Third-party analytics, pulling together and expanding on some points I made in the first three posts.
The heart of Gartner Group’s 2011/2012 Magic Quadrant for Business Intelligence Platforms was the company comments. I shall expound upon some, roughly in declining order of Gartner’s “Completeness of Vision” scores, dubious though those rankings may be.
Comments on the 2012 Forrester Wave: Enterprise Hadoop Solutions
Forrester has released its Q1 2012 Forrester Wave: Enterprise Hadoop Solutions. (Googling turns up a direct link, but in case that doesn’t prove stable, here also is a registration-required link from IBM’s Conor O’Mahony.) My comments include:
- The Forrester Wave’s relative vendor rankings are meaningless, in that the document compares apples, peaches, almonds, and peanuts. Apparently, it covers any vendor that incorporates a distribution of Apache Hadoop MapReduce into something it offers, and that provided at least two (not necessarily full-production) references for same.
- The Forrester Wave for “enterprise Hadoop” contradicts itself on the subject of Hortonworks.
- The Forrester Wave for “enterprise Hadoop” is correct when it says “Hortonworks … has Hadoop training and professional services offerings that are still embryonic.”
- Peculiarly, the Forrester Wave for “enterprise Hadoop” also says “Hortonworks offers an impressive Hadoop professional services portfolio”. Hortonworks will likely win one or more nice partnership deals with vendors in adjacent fields, but even so its professional services capabilities are … well, a good word might be “embryonic”.
- Forrester Waves always seem to have weird implicit definitions of “data warehousing”. This one is no exception.
- Forrester gave top marks in “Functionality” to 11 of 13 “enterprise Hadoop” vendors. This seems odd.
- I don’t know why MapR, which doesn’t like HDFS (Hadoop Distributed File System), got top marks in “Subproject integration”.
- Forrester gave top marks in “Storage” to Datameer. It also gave higher marks to MapR than to EMC Greenplum, even though EMC Greenplum’s technology is a superset of MapR’s. Very strange. (Edit: Actually, as per a comment below, there is some uncertainty about the EMC/MapR relationship.)
- Forrester gave higher marks in “Acceleration and optimization” to Hortonworks than to Cloudera and IBM, and higher marks yet to Pentaho. Very odd.
- I’m not sure what Forrester is calling a “Distributed EDW file store connector”, but it sounds like something that Cloudera has provided via partnership to a number of analytic DBMS vendors.
- Forrester’s “Strategy” rankings seem to correlate to a metric of “We’re a large enough vendor to go in N directions at once”, for various values of N.
- Forrester is correct to rank Cloudera’s “Adoption” as being stronger than EMC/Greenplum’s or MapR’s. But Hortonworks’ strong mark for “Adoption” baffles me.
Data integration vendors and Hadoop
There have been many recent announcements about how data integration/ETL (Extract/Transform/Load) vendors are going to work with MapReduce. Most of what they say boils down to one or more of a few things:
- Hadoop generally stores data in HDFS (Hadoop Distributed File System). ETL vendors want to be able to extract data from or load it into HDFS.
- ETL vendors have development environments that let you specify/script/whatever ETL jobs. They want those same tools to generate ETL processes that execute via MapReduce/Hadoop (a minimal sketch of such a job appears just after this list).
- In particular, this allows ETL vendors to exploit the parallel-processing capabilities of MapReduce.
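To make that concrete, here is a minimal, hypothetical sketch (not drawn from any particular vendor's product) of the kind of map-only job such development tools generate behind the scenes: read delimited records out of HDFS, apply a field-level transform in parallel across the cluster, and write the results back to HDFS.

```java
// Hypothetical ETL-style MapReduce job: read comma-delimited records from an
// HDFS input directory, normalize one field, and write the result back to HDFS.
// Field positions and the transform itself are made up for illustration.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SimpleEtlJob {

  // Map-only transform: trim and uppercase the third field of each record.
  public static class TransformMapper extends Mapper<Object, Text, Text, NullWritable> {
    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split(",");
      if (fields.length > 2) {
        fields[2] = fields[2].trim().toUpperCase();
      }
      context.write(new Text(String.join(",", fields)), NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "simple-etl");
    job.setJarByClass(SimpleEtlJob.class);
    job.setMapperClass(TransformMapper.class);
    job.setNumReduceTasks(0);                     // map-only; each mapper writes straight to HDFS
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The parallelism comes for free: Hadoop splits the input files across the cluster and runs one mapper per split, which is exactly the heavy lifting the ETL vendors want to delegate.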
Some additional twists include:
- Pentaho announced business intelligence and ETL for Hadoop last year.
- Syncsort thinks different sort algorithms should be usable with Hadoop. Consequently, it plans to contribute technology to the community to make sort pluggable into Hadoop. (However, Syncsort is keeping its own sort technology proprietary.)
- Syncsort is considering replicating some Hive functionality, starting with joins, hopefully running much faster. (However, Syncsort’s basic Hadoop support is a quarter or three away, so any more advanced functionality would probably come out in 2012 or beyond.)
- SnapLogic fondly thinks that its generation of MapReduce jobs is particularly intelligent.
Finally, my former clients at Pervasive, who haven’t briefed me for a while, seem to have told Doug Henschen that they have pointed DataRush at MapReduce.* However, I couldn’t find evidence of same on the Pervasive DataRush website beyond some help in using all the cores on any one Hadoop node.
*Also see that article because it names a bunch of ETL vendors doing Hadoop-related things.
The substance of Pentaho’s Hadoop strategy
Pentaho has been talking about a Hadoop-related strategy. Unfortunately, in support of its Hadoop efforts, Pentaho has been — quite insistently — saying things that don’t make a lot of sense to people who know anything about Hadoop.
That said, I think I found four sensible points in Pentaho’s Hadoop strategy, namely:
- If you use an ETL tool like Pentaho’s to move things in and out of HDFS, you may be able to orchestrate two more steps in the ETL process than if you used Hadoop’s native orchestration tools.
- A lot of what you want to do in MapReduce is things that can be graphically specified in an ETL tool like Pentaho’s. (That would include tokenization or regex.)
- If you have some really lightweight BI requirements (ad hoc, reporting, or whatever) against HDFS data, you might be content to do it straight against HDFS, rather than moving the data into a real DBMS. If so, BI tools like Pentaho’s might be useful.
- Somebody might want to use a screwy version of MapReduce, where by “screwy” I mean anything that isn’t Cloudera Enterprise, Aster Data SQL/MapReduce, or some other implementation/distribution with a lot of supporting tools. In that case, they might need all the tools they can get.
The first of those points is, in the grand scheme of things, pretty trivial.
The third one makes sense. While Hadoop’s Hive client means you could roll your own integration with your favorite BI tool in any case, having a vendor certify that integration for you could be nice. So if Pentaho ships something that works before other vendors do, good on them. (Target date seems to be October.)
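For what "rolling your own" amounts to, here is a hypothetical sketch of a lightweight report run through Hive's JDBC interface against data sitting in HDFS. The host, port, table, and column names are invented, and the driver class and connection URL vary by Hive version, so treat the specifics as assumptions rather than a recipe.

```java
// Hypothetical ad hoc report against HDFS data via Hive's JDBC driver.
// Driver class and jdbc:hive:// URL are those of Hive's original server;
// host "hadoop-gw", port 10000, and the web_logs table are made up.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveAdHocReport {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection conn =
        DriverManager.getConnection("jdbc:hive://hadoop-gw:10000/default", "", "");
    Statement stmt = conn.createStatement();
    // Row counts by day; Hive compiles this into MapReduce jobs over HDFS files.
    ResultSet rs = stmt.executeQuery(
        "SELECT event_date, COUNT(*) FROM web_logs "
            + "GROUP BY event_date ORDER BY event_date");
    while (rs.next()) {
      System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
    }
    rs.close();
    stmt.close();
    conn.close();
  }
}
```

A BI tool's certified connector would be doing essentially this, plus metadata handling, caching, and whatever else makes it pleasant to use.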
The fourth one is kind of sad.
But if there’s any shovel-meet-pony aspect to all this — or indeed a reason for writing this blog post — it would be the second point. If one understands data management, but is in the “Oh no! Hadoop wants me to PROGRAM!” crowd, then being able to specify one’s MapReduce jobs graphically might be a really nice alternative to actually coding them.
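To illustrate what the graphical alternative saves you from, here is a hypothetical sketch of hand-coded tokenization-plus-counting in MapReduce: extract a field from each log line with a regex and total up occurrences. The regex and log format are invented; an ETL designer like Pentaho's would let you draw roughly this pipeline instead of writing it.

```java
// Hypothetical hand-coded MapReduce: regex-extract the request path from each
// log line and count occurrences per path. Log format and regex are made up.
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RegexExtractCount {

  public static class ExtractMapper extends Mapper<Object, Text, Text, LongWritable> {
    private static final Pattern REQUEST = Pattern.compile("\"(?:GET|POST) (\\S+)");
    private final LongWritable one = new LongWritable(1);

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      Matcher m = REQUEST.matcher(value.toString());
      if (m.find()) {
        context.write(new Text(m.group(1)), one);   // emit (request path, 1)
      }
    }
  }

  public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      long total = 0;
      for (LongWritable v : values) {
        total += v.get();
      }
      context.write(key, new LongWritable(total));  // emit (request path, count)
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "regex-extract-count");
    job.setJarByClass(RegexExtractCount.class);
    job.setMapperClass(ExtractMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

None of this is conceptually hard, but for the “Oh no! Hadoop wants me to PROGRAM!” crowd, dragging a regex step and an aggregation step onto a canvas is a lot more approachable.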
Intelligent Enterprise’s Editors’/Editor’s Choice list for 2010
As he has before, Intelligent Enterprise Editor Doug Henschen
- Personally selected annual lists of 12 “Most influential” companies and 36 “Companies to watch” in analytics- and database-related sectors.
- Made it clear that these are his personal selections.
- Nonetheless has called it an Editors’ Choice list, rather than Editor’s Choice. 🙂
(Actually, he’s really called it an “award.”)
Introduction to Pentaho
I finally caught up with Pentaho, which along with Jaspersoft is one of the two most visible open source business intelligence companies, Actuate perhaps excepted. Highlights included:
- Much like Jaspersoft, Pentaho’s initial focus was mainly on embedded, operational BI.
- However, Pentaho now feels it has a decent end-user GUI as well, and traditional BI is a bigger part of sales.
- Also, some sales are focused on data integration, perhaps in support of more traditional BI products. Pentaho has even had an Ab Initio replacement in data integration. (Can there be any change more extreme than going from Ab Initio to open source?)
- As an example of technical breadth, Pentaho says that its Mondrian OLAP engine is used by Jaspersoft.
- Pentaho has Excel output, but not in the form of live formulas.
- Pentaho does XQuery.
- Industries with more Pentaho adoption than average include:
  - Financial services (traditionally open-source-friendly, according to Pentaho)
  - Government (ditto)
  - Web 2.0 (obviously ditto)
  - Travel/transportation (cash-strapped)
- Frontier Airlines is a Pentaho/Greenplum customer.
- TradeDoubler is a Pentaho/Infobright customer. (Pentaho thinks that TradeDoubler reloads its warehouse every day, which if true frankly casts some doubt on Infobright’s architecture.)
- Data mining is something of a Pentaho sideline. A university in New Zealand built data mining capabilities into Pentaho, and some data mining research is done with them. Separately, Pentaho has been integrated with R.
- Community contributions are concentrated in the areas you’d expect — features some user or system integrator needs for a specific project, connectors, bug reports, and the like.