Interana
Interana has an interesting story, in technology and business model alike. For starters:
- Interana does ad-hoc event series analytics, which they call “interactive behavioral analytics solutions”.
- Interana has a full-stack analytic offering, including:
- Its own columnar DBMS …
- … which has a non-SQL DML (Data Manipulation Language) meant to handle event series a lot more fluently than SQL does, but which the user is never expected to learn because …
- … there also are BI-like visual analytics tools that support plenty of drilldown.
- Interana sells all this to “product” departments rather than marketing, because marketing doesn’t sufficiently value Interana’s ad-hoc query flexibility.
- Interana boasts >40 customers, with annual subscription fees ranging from high 5 figures to low 7 figures.
And to be clear — if we leave aside any questions of marketing-name sizzle, this really is business intelligence. The closest Interana comes to helping with predictive modeling is giving its ad-hoc users inspiration as to where they should focus their modeling attention.
Interana also has an interesting twist in its business model, which I hope can be used successfully by other enterprise software startups as well.
- For now, at no extra charge, Interana will operate its software for you as a managed service. (A majority of Interana’s clients run the software on Amazon or Azure, where that kind of offering makes sense.)
- However, presumably in connection with greater confidence in its software’s ease of administration, Interana will move this year toward unbundling the service as an extra-charge offering on top of the software itself.
The key to understanding Interana is its DML. Notes on that include:
- Interana’s DML is focused on path analytics …
- … but Interana doesn’t like to use that phrase because it sounds too math-y and difficult.
- Interana may be the first company that’s ever told me it’s focused on providing a better nPath. 🙂
- Primitives in Interana’s language — notwithstanding the company’s claim that it never ever intended to sell to marketing departments — include familiar web analytics concepts such as “session”, “funnel” and so on. (However, these are being renamed to more neutral terms such as “flow” in an upcoming version of the product.)
- As typical example questions or analytic subjects, Interana offered:
- “Which are the most common products in shopping carts where time-to-checkout was greater than 30 minutes?”
- “Exactly which steps in the onboarding process result in the greatest user frustration?”
- The Interana folks and I agree that Splunk is the most recent example of a new DML kicking off a significant company.
- The most recent example I can think of in which a vendor hung its hat on a new DML that was a “visual programming language” is StreamBase, with EventFlow. That didn’t go all that well.
- To use Founder/CTO Bobby Johnson’s summary term, the real goal of the Interana language is to describe a state machine, specifically one that produces (sets of) sequences of events (and the elapsed time between them).
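To make the state-machine point concrete: a funnel question such as the shopping-cart example above amounts to running a small state machine over each user’s time-ordered event stream. Here is a toy sketch in Python. The event fields and the 30-minute threshold mirror the example question, but all the names and structure are my own illustration, not Interana’s actual DML.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical event records; the field names are illustrative.
events = [
    {"user": "u1", "type": "add_to_cart", "product": "widget", "ts": datetime(2017, 3, 1, 12, 0)},
    {"user": "u1", "type": "add_to_cart", "product": "gadget", "ts": datetime(2017, 3, 1, 12, 5)},
    {"user": "u1", "type": "checkout", "ts": datetime(2017, 3, 1, 12, 45)},
    {"user": "u2", "type": "add_to_cart", "product": "widget", "ts": datetime(2017, 3, 1, 9, 0)},
    {"user": "u2", "type": "checkout", "ts": datetime(2017, 3, 1, 9, 10)},
]

THRESHOLD = timedelta(minutes=30)

def slow_cart_products(events):
    """Per-user state machine: an add_to_cart event starts a cart; a
    checkout event ends it. Emit the cart's products when the elapsed
    time from first add to checkout exceeds THRESHOLD."""
    carts = {}  # user -> (first_add_ts, [products]); the machine's state
    counts = Counter()
    for e in sorted(events, key=lambda e: e["ts"]):  # event series are time-ordered
        u = e["user"]
        if e["type"] == "add_to_cart":
            carts.setdefault(u, (e["ts"], []))[1].append(e["product"])
        elif e["type"] == "checkout" and u in carts:
            first_ts, products = carts.pop(u)
            if e["ts"] - first_ts > THRESHOLD:  # time-to-checkout > 30 minutes
                counts.update(products)
    return counts

print(slow_cart_products(events).most_common())
# [('widget', 1), ('gadget', 1)] -- only u1's slow cart qualifies
```

The per-user bookkeeping above (state, time ordering, elapsed-time tests) is presumably what primitives like “session”, “funnel” and “flow” abstract away in Interana’s language.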
Notes on Interana speeds & feeds include:
- Interana only promises data freshness up to micro-batch latencies — i.e., a few minutes. (Obviously, this shuts them out of most network monitoring and devops use cases.)
- Interana thinks it’s very important for query response time to max out at a low number of seconds. If necessary, the software will return approximate results rather than exact ones so as to meet this standard.
- Interana installations and workloads to date have gotten as large as:
- 1-200 nodes.
- Trillions of rows, equating to 100s of TBs of data after compression / >1 PB uncompressed.
- Billions of rows/events received per day.
- 100s of 1000s of (very sparse) columns.
- 1000s of named users.
Although Interana’s original design point was spinning disk, most customers store their Interana data on flash.
Interana architecture choices include:
- They’re serious about micro-batching.
- If the user’s data is naturally micro-batched — e.g. a new S3 object every few minutes — Interana works with that.
- Even if the customer’s data is streamed — e.g. via Kafka — Interana insists on micro-batching it.
- They’re casual about schemas.
- Interana assumes data arrives with some kind of recognizable structure, via JSON, CSV or whatever.
- Interana observes, correctly, that log data often is decently structured.
- For example, if you’re receiving “phone home” pings from products you originally manufactured, you know what data structures to expect.
- Interana calls this “logging with intent”.
- Interana is fine with a certain amount of schema change over time in (for example) JSON data.
- If your arriving data truly is a mess, then you need to calm it down via a pass through Splunk or whatever before sending it to Interana.
- JSON hierarchies turn into multi-part column names in the usual way.
- Interana supports one level of true nesting, and one level only; column values can be “lists”, but list values can’t be lists themselves.
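To illustrate the schema handling (and, in passing, the micro-batch framing): here’s a minimal sketch of turning a micro-batch of JSON events into flat columns, with hierarchy becoming multi-part names and exactly one level of list nesting allowed. The dotted-name convention and everything else here is my guess at the general idea, not Interana’s actual code.

```python
import json

def flatten(record, prefix=""):
    """Flatten one JSON object into {multi-part column name: value}.

    Nested objects recurse into dotted names. Lists are allowed as
    column values, but their elements must be scalars: one level of
    true nesting, and one level only."""
    columns = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            columns.update(flatten(value, prefix=name + "."))
        elif isinstance(value, list):
            if any(isinstance(v, (dict, list)) for v in value):
                raise ValueError(name + ": list values can't be lists themselves")
            columns[name] = value
        else:
            columns[name] = value
    return columns

# One micro-batch, e.g. the contents of a newly arrived S3 object.
batch = ['{"user": "u1", "device": {"os": "ios", "version": "10.2"}, "tags": ["beta", "emea"]}']
for line in batch:
    print(flatten(json.loads(line)))
# {'user': 'u1', 'device.os': 'ios', 'device.version': '10.2', 'tags': ['beta', 'emea']}
```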
Finally, other Interana tech notes include:
- Compression is a central design consideration …
- … especially but not only compression algorithms designed to deal with great sparseness, such as run-length encoding (RLE).
- Dictionary compression uses a global rather than shard-by-shard dictionary, a strategy that is rarer than I once expected it to be. The data Interana expects is of low-enough cardinality for this to be the better choice.
- Column data is sorted. A big part of the reason is of course to aid compression.
- Compression strategies are chosen automatically for each segment. Wholly automatically, I gather; you can’t tune the choice manually.
- As you would think, Interana’s technology includes multiple data stores.
- Data first hits a write-optimized store (WOS). Unlike the case of Vertica, this WOS is never involved in answering queries.
- Asynchronously, the data is broken into columns, and banged to “disk”.
- Asynchronously again, the data is sorted.
- Queries run against sorted data, sorting recent blocks on-the-fly if necessary.
- Interana lets you shard different replicas of the data according to different shard keys.
- Interana is proud of the random sampling it does when serving approximate query results.
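Two toy sketches to make the compression and sampling points concrete. First, dictionary compression against a single global dictionary, followed by run-length encoding; since the column data is kept sorted, equal values sit in long runs, which is exactly where RLE shines. All the names here are mine, not Interana’s.

```python
def dictionary_encode(values, dictionary):
    """Dictionary compression with one global dictionary shared across
    all shards, rather than a dictionary per shard."""
    for v in values:
        dictionary.setdefault(v, len(dictionary))  # low cardinality keeps this small
    return [dictionary[v] for v in values]

def run_length_encode(codes):
    """Collapse a sequence into [code, run length] pairs; sorted and
    very sparse columns yield long runs."""
    runs = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1][1] += 1
        else:
            runs.append([c, 1])
    return runs

global_dictionary = {}  # one dictionary for the whole dataset
column = sorted(["ios", "ios", "ios", "android", "android", "ios"])
print(run_length_encode(dictionary_encode(column, global_dictionary)))
# [[0, 2], [1, 4]] -- "android" (code 0) twice, then "ios" (code 1) four times
```

And second, the approximate-query idea in miniature: scan a uniform random sample instead of the full table, then scale the answer back up, trading exactness for bounded response time.

```python
import random

def approximate_count(rows, predicate, sample_rate=0.01):
    """Estimate how many rows satisfy predicate from a random sample."""
    k = max(1, int(len(rows) * sample_rate))
    sample = random.sample(rows, k)
    hits = sum(1 for row in sample if predicate(row))
    return round(hits * len(rows) / k)  # scale the sample count back up
```

Uniform sampling is what makes that scale-up step statistically honest; that, presumably, is what Interana is proud of.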