Imanis Data
I talked recently with the folks at Imanis Data. For starters:
- The point of Imanis is to make copies of your databases, for purposes such as backup/restore, test/analysis, or compliance-driven archiving. (That’s in declining order of current customer activity.) Another use is migration via restoring to a different cluster than the one that created the data in the first place.
- The data can come from NoSQL database managers, from Hadoop, or from Vertica. (Again, that’s in declining order.)
- As you might imagine, Imanis makes incremental backups; the only full backup is the first one you do for that database.
- “Imanis” is a new name; the previous name was “Talena”.
Also:
- Imanis has ~35 subscription customers, a significant majority of which are in the Fortune 1000.
- Customer industries, in roughly declining order, include:
- Financial services other than insurance.
- Insurance.
- Retail.
- “Technology”.
- ~40% of Imanis customers are in the public cloud.
- Imanis is focused on the North American market at this time.
- Imanis has ~45 employees.
- The Imanis product just hit Version 3.
Imanis correctly observes that there are multiple reasons you might want to recover from backup, including:
- General disaster/system failure.
- Bug in an application that writes data.
- Malicious acts, including encryption-by-ransomware.
Imanis uses the phrase “point-in-time backup” to emphasize its flexibility in letting you choose your favorite time-version of your rolling backup.
Imanis also correctly draws the inference that the right backup strategy is some version of:
- Make backups very frequently. This boils down to “Do a great job of making incremental backups (and restoring from them when necessary).” This is where Imanis has spent the bulk of its technical effort to date.
- In case recovery is needed, identify the last clean (or provably/confidently clean) version of the database and restore from that. The identification part boils down to letting the backup databases be queried directly. That’s largely a roadmap item.
- Imanis has recently added the capability to build its own functionality querying the backup database.
- JDBC/whatever general access is still in the future.
Note: When Imanis backups offer direct query access, the possibility will of course exist to use the backup data for general query processing. But while that kind of capability sounds great in theory, I’m not aware of it being a big deal (on technology stacks that already offer it) in practice.
The most technically notable other use cases Imanis mentioned are probably:
- Data science dataset generation. Imanis lets you generate a partial copy of the database for analytic or test purposes.
- You can project, select or sample your data, which suggests use of the current query capabilities.
- There’s an API to let you mask Personally Identifiable Information by writing your own data transformations.
- Archiving/tiering/ILM (Information Lifecycle Management). Imanis lets you divide data according to its hotness.
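To make the dataset-generation bullets above a little more concrete, here is a minimal sketch of what "project, select, sample, and mask" could look like over rows pulled from a backup. Nothing here is Imanis's actual API; `make_subset`, `mask_email`, and every parameter name are hypothetical, and rows are modeled as plain dictionaries.

```python
import hashlib
import random


def mask_email(value):
    """Hypothetical masking transform: swap an email for a stable pseudonym."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return f"user_{digest}@example.invalid"


def make_subset(rows, columns, predicate, sample_rate, transforms, seed=0):
    """Build a partial copy: select rows, sample them, project columns, mask PII."""
    rng = random.Random(seed)  # seeded so the same subset can be regenerated
    subset = []
    for row in rows:
        if not predicate(row):               # select
            continue
        if rng.random() >= sample_rate:      # sample
            continue
        out = {c: row[c] for c in columns}   # project
        for column, transform in transforms.items():
            if column in out:                # mask via user-supplied transform
                out[column] = transform(out[column])
        subset.append(out)
    return subset


rows = [
    {"id": 1, "email": "ann@example.com", "country": "US", "ssn": "000-00-0001"},
    {"id": 2, "email": "bob@example.com", "country": "DE", "ssn": "000-00-0002"},
    {"id": 3, "email": "cy@example.com", "country": "US", "ssn": "000-00-0003"},
]

subset = make_subset(
    rows,
    columns=["id", "email"],                   # the ssn column is dropped entirely
    predicate=lambda r: r["country"] == "US",
    sample_rate=1.0,                           # keep every selected row
    transforms={"email": mask_email},
)
```

Hashing is used here only because it keeps the masked value consistent across copies; whether that is adequate protection for a given compliance regime is a separate question.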
Imanis views its competition as:
- Native utilities of the data stores.
- Hand-coded scripts.
- Datos.io, principally in the Cassandra market (so far).
Beyond those, the obvious comparison to Imanis is Delphix. I haven’t spoken with Delphix for a few years, but I believe that key differences between Delphix and Imanis start with:
- Delphix is focused on widely-installed RDBMS such as Oracle.
- Delphix actually tries to have different production logical copies of your database run off of the same physical copy. Imanis, in contrast, offers technology to help you copy your databases quickly and effectively, but the copies you actually use will indeed be separate from each other.
Imanis software runs on its own cluster, based on hacked Hadoop. A lot of the hacking seems to relate to a metadata store, which supports things like:
- Understanding which (incrementally backed up) blocks need to be pulled together to make a specific copy of the database.
- Putting data in different places for ILM/tiering.
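As a rough illustration of the first bullet, a metadata store for incremental backups has to answer "which version of each block was current at time T?" The sketch below is a toy model under my own assumptions, not a description of how Imanis implements it; `BackupCatalog`, its methods, and the block IDs are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


class BackupCatalog:
    """Toy metadata store: for each block ID, the versions captured over time."""

    def __init__(self):
        # block_id -> parallel lists of backup timestamps and block payloads
        self._times = defaultdict(list)
        self._payloads = defaultdict(list)

    def record(self, timestamp, changed_blocks):
        """Register one backup run. Timestamps must arrive in increasing order.

        Only changed blocks are passed in; a payload of None marks a deletion.
        """
        for block_id, payload in changed_blocks.items():
            self._times[block_id].append(timestamp)
            self._payloads[block_id].append(payload)

    def restore(self, as_of):
        """Assemble the database image as it stood at time `as_of`."""
        image = {}
        for block_id, times in self._times.items():
            idx = bisect_right(times, as_of) - 1  # latest version at or before as_of
            if idx < 0:
                continue                          # block did not exist yet
            payload = self._payloads[block_id][idx]
            if payload is not None:               # skip blocks deleted by then
                image[block_id] = payload
        return image


catalog = BackupCatalog()
catalog.record(1, {"a": "a-v1", "b": "b-v1"})  # the one full backup
catalog.record(2, {"b": "b-v2"})               # incremental: b changed
catalog.record(3, {"a": None, "c": "c-v1"})    # incremental: a deleted, c added

image_at_2 = catalog.restore(as_of=2)
```

The point of the toy is that a "point-in-time" copy never needs a second full backup: any time-version is just a per-block lookup into the incremental history.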
Another piece of Imanis tech is machine-learning-based anomaly detection.
- As incrementally backed-up blocks arrive, Imanis flags anomalous ones.
- Each flag comes with a stated reason.
- You can denounce the flag as a false alert, and hopefully similar flags won’t be raised in the future.
The technology for this seems rather basic:
- Random forests for the flagging.
- No drilldown within the Imanis system for follow-up.
But in general concept this is something a lot more systems should be doing.
Most of the rest of Imanis’ tech story is straightforward — support various alternatives for computing platforms, offer the usual security choices, etc. One exception that was new to me was the use of erasure codes, which seem to be a generalization of the concept of parity bits. Allegedly, when used in a storage context these have the near-magical property of offering 4X replication safety with only a 1.5X expansion of data volume. I won’t claim to have understood the subject well enough to see how that could make sense, or what tradeoffs it would entail.
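For what it's worth, the arithmetic behind a claim like that can be sketched in a few lines. The assumption here is a Reed-Solomon-style layout with 6 data blocks and 3 parity blocks; the conversation didn't establish which code Imanis actually uses, so treat the specific numbers as illustrative. The second half of the sketch shows the simplest erasure code of all, a single XOR parity block, which is the sense in which erasure codes generalize parity bits. All function names are made up for this example.

```python
from fractions import Fraction
from functools import reduce


def replication(copies):
    """N full copies: N times the storage, survives the loss of N-1 copies."""
    return {"storage": Fraction(copies), "losses_tolerated": copies - 1}


def erasure_code(data_blocks, parity_blocks):
    """k data + m parity blocks: (k+m)/k storage, survives any m lost blocks.

    This holds for codes such as Reed-Solomon, where any k of the k+m
    blocks are enough to rebuild the original data.
    """
    return {
        "storage": Fraction(data_blocks + parity_blocks, data_blocks),
        "losses_tolerated": parity_blocks,
    }


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (the one-parity-block case)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


four_copies = replication(4)   # 4X storage, tolerates 3 losses
rs_6_3 = erasure_code(6, 3)    # 1.5X storage, also tolerates 3 losses

# Single-parity demo: lose any one block and rebuild it from the rest.
data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_blocks(data)
recovered = xor_blocks([data[0], data[2], parity])  # data[1] was "lost"
```

The tradeoff is that rebuilding a lost block means reading several surviving blocks and doing some computation, whereas with plain replication you just read another copy.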
Comments
One Response to “Imanis Data”
Erasure coding will be a new Hadoop 3 feature, cf. https://www.slideshare.net/alaleiwang/native-erasure-coding-support-inside-hdfs-presentation for the motivation (as of 2015) and https://www.slideshare.net/HadoopSummit/hdfs-erasure-coding-in-action for the results (as of 2016)