Platfora at the time of first GA
Well-resourced Silicon Valley start-ups typically announce their existence multiple times. Company formation, angel funding, Series A funding, Series B funding, company launch, product beta, and product general availability may not be 7 different “news events”, but they’re apt to be at least 3-4. Platfora, no exception to this rule, is hitting general availability today, and in connection with that I learned a bit more about what they are up to.
In simplest terms, Platfora offers exploratory business intelligence against Hadoop-based data. As per last weekend’s post about exploratory BI, a key requirement is speed; and so far as I can tell, any technological innovation Platfora offers relates to the need for speed. Specifically, I drilled into Platfora’s performance architecture on the query processing side (and associated data movement); Platfora also brags of rendering 100s of 1000s of “marks” quickly in HTML5 visualizations, but I haven’t a clue as to whether that’s much of an accomplishment in itself.
Platfora’s marketing suggests it obviates the need for a data warehouse at all; for most enterprises, of course, that is a great exaggeration. But another dubious aspect of Platfora marketing actually serves to understate the product’s merits — Platfora claims to have an “in-memory” product, when what’s really the case is that Platfora’s memory-centric technology uses both RAM and disk to manage larger data marts than could reasonably be fit into RAM alone. Expanding on what I wrote about Platfora when it de-stealthed:
- Platfora incrementally batch-loads data from Hadoop into its own bare-bones SQL data store, and does BI against that. That data store:
  - Of course wants to run in-memory whenever possible …
  - … but also has a significant disk-based aspect.
  - Is true-columnar on disk and in memory alike.
  - Stores all columns from a given row on the same nodes.
- Specifically, Platfora builds star-schema data marts, called “lenses”. To avoid data bloat on the Platfora servers:
  - Two lenses with the same data often only store it once.
  - The data for a given lens can be “evicted” if it won’t be needed for a while. (But the specifications for the lens are of course kept in case you want to rebuild it later.)
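To make the lens bookkeeping concrete, here is a toy Python sketch of shared storage plus eviction. It is not Platfora's implementation. The names `LensSpec` and `LensStore`, and the choice to share data at the granularity of whole columns via reference counts, are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LensSpec:
    """Hypothetical lens definition: just a name and the columns it uses."""
    name: str
    columns: tuple


@dataclass
class LensStore:
    """Toy store in which each column is held once and shared by lenses."""
    specs: dict = field(default_factory=dict)     # lens name -> LensSpec, always kept
    built: set = field(default_factory=set)       # names of lenses whose data is loaded
    columns: dict = field(default_factory=dict)   # column name -> list of values
    refcount: dict = field(default_factory=dict)  # column name -> number of built lenses using it

    def build(self, spec, source):
        """Load a lens, reusing any column that another lens already loaded."""
        self.specs[spec.name] = spec
        if spec.name in self.built:
            return
        for col in spec.columns:
            if col not in self.columns:
                self.columns[col] = list(source[col])
            self.refcount[col] = self.refcount.get(col, 0) + 1
        self.built.add(spec.name)

    def evict(self, name):
        """Drop a lens's data but keep its spec so it can be rebuilt later."""
        if name not in self.built:
            return
        for col in self.specs[name].columns:
            self.refcount[col] -= 1
            if self.refcount[col] == 0:
                # No built lens uses this column any more, so free it.
                del self.columns[col]
                del self.refcount[col]
        self.built.remove(name)
```

If a hypothetical "sales" lens and a "geo" lens both use `user_id`, that column is held once. Evicting "sales" frees only the columns nothing else uses, and the retained spec is enough to call `build` again later.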
Notes on Platfora’s Hadoop ETL (Extract/Transform/Load) include:
- The basic idea is that you periodically re-run a job to pick up incremental changes since the last load.
  - Right now that’s just a cron job or something. Platfora plans to add scheduling features imminently.*
- Platfora is sensitive to Hive partitioning.
- Platfora can run filters and so on to extract non-Hive data (the more common case).
*But in a sad comment on Hadoop’s workload management capabilities, Platfora doesn’t expect these features to be much used, at least at first.
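The periodic incremental load described above can be sketched as a simple watermark pattern. This is only an illustration under stated assumptions: that the source data is partitioned by date (as a Hive table might be), that partitions are append-only, and that the hypothetical `load_partition` callback stands in for whatever actually moves the data. None of these names come from Platfora.

```python
from datetime import date


def partitions_to_load(available, last_loaded):
    """Return the partitions newer than the last one loaded, oldest first."""
    return sorted(p for p in available if last_loaded is None or p > last_loaded)


def incremental_load(available, last_loaded, load_partition):
    """Load each new partition in order and return the new watermark."""
    watermark = last_loaded
    for part in partitions_to_load(available, last_loaded):
        load_partition(part)  # hypothetical callback that moves one partition
        watermark = part      # advance only after the partition has loaded
    return watermark


if __name__ == "__main__":
    available = [date(2013, 3, 17), date(2013, 3, 18), date(2013, 3, 19)]
    new_mark = incremental_load(available, date(2013, 3, 17), print)
    print("watermark is now", new_mark)
```

A cron job would call something like `incremental_load` on each run and persist the returned watermark. Knowing the partitioning is what lets the job skip everything at or before that watermark instead of rescanning it.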
Platfora’s aggregation story goes something like this:
- If an aggregate can be updated incrementally — for example a count or sum — Platfora probably will maintain it for you and update it on load.
- Ditto if it can be maintained almost incrementally — for example an average.
- Platfora also does Distinct calculations, even though those have to be worked through on its own servers.
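The distinction among those three cases is easy to see in a few lines of Python. This is a generic sketch of incremental aggregation, not Platfora's code, and `RunningAggregates` is a made-up name. Count and sum are updated by simple addition on each load. Average is "almost incremental" because it can be derived from the maintained sum and count. A distinct count cannot be updated from the previous count alone, so the sketch keeps the full set of values seen.

```python
from dataclasses import dataclass, field


@dataclass
class RunningAggregates:
    """Aggregates maintained across batch loads (illustrative only)."""
    count: int = 0
    total: float = 0.0
    distinct: set = field(default_factory=set)

    def load(self, batch):
        """Fold one newly loaded batch of values into the running state."""
        for value in batch:
            self.count += 1           # incremental: add one per row
            self.total += value       # incremental: add the new value
            self.distinct.add(value)  # needs the set of seen values, not just a number

    @property
    def average(self):
        """Derived from sum and count, so no rescan of old data is needed."""
        return self.total / self.count if self.count else None

    @property
    def distinct_count(self):
        return len(self.distinct)
```

The extra state that distinct counts require is one plausible reason they have to be handled differently from sums and counts.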
As you would expect, Version 1 of the Platfora data store has various limitations, such as:
- Platfora Version 1 can’t do much with arrays or (other) nested data structures — it just transforms them into JSON strings.
- Platfora’s SQL support is limited.
- The Platfora data store has a “fat head” master (but at least that head is multi-node).
Naturally, Platfora hopes to fix these issues down the road.
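As an illustration of the first limitation, here is roughly what "transforms them into JSON strings" means for a record with nested fields. The function name `flatten_record` and the sample field names are assumptions for the example. Platfora's actual transformation may differ in detail.

```python
import json


def flatten_record(record):
    """Keep scalar fields as-is; serialize lists and dicts to JSON strings."""
    return {
        key: json.dumps(value, sort_keys=True) if isinstance(value, (list, dict)) else value
        for key, value in record.items()
    }


if __name__ == "__main__":
    row = {"id": 1, "tags": ["a", "b"], "geo": {"lat": 1}}
    print(flatten_record(row))
```

After such a transformation the nested values are opaque strings to the query layer, so you can display or filter on the whole string but cannot easily aggregate on an element inside it.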
Finally, a few company notes:
- Platfora has had 20 beta users, mainly but not entirely among online businesses.
- Platfora has close to 50 people.
- Platfora is currently focused on US direct sales, relying on inbound leads.
Comments
13 Responses to “Platfora at the time of first GA”
I see a lot of activity from start-up BI and analytics companies like Platfora, Karmasphere, Datameer, etc. But what are the more established “next-gen” BI providers such as Tableau and QlikView doing to address new Hadoop-based use cases?
There may be some assumptions in your question that I don’t share. But anyhow:
In cases where BI tool/Hive combos don’t get the job done, you can combine a BI tool, an analytic RDBMS, Hadoop/Hive, and suitable ETL (e.g. via the DBMS’ Hadoop connector). See, for example, my posts about Teradata SQL-H (which now is not restricted to Aster).
So how is this different from the ROLAP vs. MOLAP wars of the mid-1990s? Slow but flexible RDBMS queries against a DWH equate to today’s slow but infinitely flexible MapReduce jobs against HDFS. New in-memory (sorta) caches like Platfora’s equate to the old MOLAP cubes.
Haven’t we seen this movie before?
Justin,
You can use Platfora’s UI to populate it. At a guess, I’d say that was true of client-side MOLAP (e.g. Cognos), but not server-side (e.g. Essbase). Platfora’s incremental refresh sounds pretty smooth as well.
Thanks Curt. Curious, what assumptions do you not share?
The question seemed to draw distinctions that I don’t recognize, as if certain BI tools could extract from Hadoop into a table and others could not.
Let’s not push this too far. I’m just trying to guess at what was meant.
Sounds like Platfora is building what Metamarkets already runs on: a cube-based in-memory structure with visualization. Not sure how it makes Hadoop more efficient for large data sets. It is going back to the 1990s…
The general idea of Platfora is indeed a lot like Metamarkets. But you control it (vs. Metamarkets’ SaaS), and the style of BI is different: Tableau-lite for Platfora, vs. more traditional in the case of Metamarkets.