Introduction to Platfora
When I wrote last week that I have at least 5 clients claiming they’re uniquely positioned to support BI over Hadoop (most of whom partner with a 6th client, Tableau), the non-partnering exception I had in mind was Platfora, Ben Werther’s oh-so-stealthy startup that is finally de-stealthing today. Platfora combines:
- An interesting approach to analytic data management.
- Business intelligence tools integrated with that.
The whole thing sounds like a perhaps more general and certainly non-SaaS version of what Metamarkets has been offering for a while.
The Platfora technical story starts:
- You have data in Hadoop.
- You extract data into a memory-centric data store.
- You can do drilldowns back into Hadoop.
- The BI part tries to imitate Tableau/QlikView kinds of functionality, but through a browser, thanks to HTML5.*
*More precisely, via HTML5 canvas.
Platfora’s core concept is probably the “lens”, which is a snowflake-schema mini data mart materialized onto Platfora’s servers via a Hadoop job. A lens is meant to be used in-memory but can certainly be large enough to spill onto disk, which is why I call Platfora’s data store “memory-centric” rather than “in-memory”. A lens is a lot like a materialized view, including in that it’s incrementally maintained.
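To make the materialized-view comparison concrete, here is a minimal sketch of incremental maintenance in general: new rows are folded into stored aggregates rather than recomputing everything from the raw data. It is purely illustrative. The class name, the (key, value) row shape, and the choice of SUM/COUNT aggregates are my own assumptions, not a description of how a Platfora lens is actually maintained.

```python
from collections import defaultdict


class IncrementalView:
    """Toy materialized view: running SUM and COUNT per group key (hypothetical)."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def apply_batch(self, rows):
        """Fold a batch of new (key, value) rows into the stored aggregates."""
        for key, value in rows:
            self.totals[key] += value
            self.counts[key] += 1

    def average(self, key):
        """Derive an average from the maintained aggregates."""
        return self.totals[key] / self.counts[key]


view = IncrementalView()
view.apply_batch([("east", 10.0), ("west", 4.0)])  # initial build
view.apply_batch([("east", 20.0)])                 # later increment: only new rows are read
```

The point of the design is that the second call touches one new row, not the full history, which is what makes frequent refreshes affordable.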
Time didn’t permit drilling down into detail about how Platfora’s data store works, but it is certainly compressed and columnar.* Compression is dictionary-style, with each table having its own dictionary. It is not unreasonable to think of Platfora’s data store as a tiered cache over Hadoop. Similarly, I also didn’t drill down — as it were — into how a Platfora lens is incrementally maintained. Interestingly, Platfora doesn’t seem to have QlikView’s strict requirement that:
there should be no more … than one possible join path between any pair of tables.
I’d guess there’s some specific bit of QlikView interface functionality that depends on that assumption, but which isn’t present in Platfora.
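For readers who want the join-path restriction spelled out, the sketch below counts the distinct join paths between two tables, treating each join as an undirected edge in a schema graph. The table names and the function are hypothetical, and this is not how QlikView or Platfora checks anything; it only illustrates what "more than one possible join path" means.

```python
def count_join_paths(joins, start, goal):
    """Count simple paths between two tables; each join is an undirected edge."""
    neighbors = {}
    for a, b in joins:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    def walk(table, visited):
        if table == goal:
            return 1
        # Sum the paths through every neighbor not yet on the current path.
        return sum(
            walk(nxt, visited | {nxt})
            for nxt in neighbors.get(table, [])
            if nxt not in visited
        )

    return walk(start, {start})


# Hypothetical snowflake schema: exactly one path between any pair of tables.
snowflake = [("orders", "customers"), ("customers", "regions"), ("orders", "dates")]

# Adding a second key from orders to dates (say, order date and ship date)
# creates a second join path between the same two tables.
two_date_keys = snowflake + [("orders", "dates")]

single = count_join_paths(snowflake, "orders", "regions")
double = count_join_paths(two_date_keys, "orders", "dates")
```

Here `single` is 1 and `double` is 2; the second schema is the kind a one-join-path rule would reject.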
*Ironically, when Ben was at Greenplum, he tried to convince me that block-level compression was just as good as columnar.
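As a reminder of what dictionary-style compression means for a column, here is a minimal generic sketch: each distinct value is stored once, and the column itself becomes a list of small integer codes. The function names and sample data are hypothetical. I did not get details of Platfora’s actual encoding, so treat this as the textbook technique rather than their implementation; in particular, it encodes a single column and says nothing about how a per-table dictionary would be shared.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): distinct values in first-seen order, plus one code per row."""
    dictionary = []
    index_of = {}
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


column = ["US", "DE", "US", "US", "FR", "DE"]
dictionary, codes = dictionary_encode(column)
restored = dictionary_decode(dictionary, codes)
```

The savings come from low-cardinality columns, where a few distinct strings are replaced by many repeated small integers.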
For collaboration and so on, you can take a snapshot of a lens that captures its data, metadata, and actual UI image, the point of metadata capture being that the lens can later be refreshed. Platfora doesn’t directly support rebuilding a lens as of a past point-in-time, but in principle you can do that, so if there’s actually demand for that feature they surely could add it in.
Comments
6 Responses to “Introduction to Platfora”
Seems awkward. Hadoop really is poor at fast interaction (figure, say, 30 seconds of overhead for map+shuffle before any work gets done, if there is load). This leads either to moving to a dedicated Hadoop cluster for this purpose (including the data!) or to suffering with slow interaction.
There is still space for Hadoop analytics. Why not recast this as an analytic tool loosely coupled to Hadoop, or create an in-Hadoop, in-memory store (just steal resources and bypass the scheduler) for use after data is extracted from HDFS?
It’s possible to have join paths between two tables based on multiple fields – QlikView just generates a synthetic key.
But yes, QlikView does NOT support role-playing keys (as SQL does). The reason for this goes back to the “associative” model whereby users can easily distinguish between possible values and excluded values. This is IMO an incredibly powerful feature since you’ve always got context.
If you had role playing keys, and hence multiple join paths between tables, this would lead to ambiguities which would prevent users from seeing this context at all times.
You really need to use QlikView to properly grasp the significance of this.
But I should point out that MS Excel PowerPivot has essentially copied this feature (MS refers to QlikView’s list boxes as “slicers”, but they both exhibit this associative behaviour). As a result, PowerPivot also has the same join path limitation you speak of.
[…] via Platfora shows a whole new way to do business intelligence on big data. More coverage by TechCrunch and Mr. Monash. […]
[…] 7. In a comment on my Platfora post, Neil Hepburn made a good point about associative UIs and acyclic join paths. […]
[…] BI. QlikView, SAP HANA, Oracle Exalytics, and Platfora are just four examples of many. But few enterprises will want to confine their analytics to such […]
[…] latter point is that new in-memory analytic data stores tend to be columnar — think HANA or Platfora; compression is commonly cited as a big reason for the choice. (Another reason is that I/O […]