Sumo Logic and UIs for text-oriented data
I talked with the Sumo Logic folks for an hour on Thursday. Highlights included:
- Sumo Logic does SaaS (Software as a Service) log management.
- Sumo Logic is text indexing/Lucene-based. Thus, it is reasonable to think of Sumo Logic as “Splunk-like”. (However, Sumo Logic seems to have a stricter security/troubleshooting orientation than Splunk, which is trying to branch out.)
- Sumo Logic has hacked Lucene for faster indexing, and says 10-30 second latencies are typical.
- Sumo Logic’s main differentiation is automated classification of events.
- There’s some kind of streaming engine in the mix, to update counters and drive alerts.
- Sumo Logic has around 30 “customers,” free (mainly) or paying (around 5) as the case may be.
- A truly typical Sumo Logic customer has single to low double digits of gigabytes of log data per day. However, Sumo Logic seems highly confident in its ability to handle a terabyte per customer per day, give or take a factor of 2.
- When I asked about the implications of shipping that much data to a remote data center, Sumo Logic observed that log data compresses really well.
- Sumo Logic recently raised a bunch of venture capital.
- Sumo Logic’s founders are out of ArcSight, a log management company HP paid a bunch of money for.
- Sumo Logic coined a marketing term “LogReduce”, but it has nothing to do with “MapReduce”. Sumo Logic seems to find this amusing.
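On the point above about shipping up to a terabyte a day to a remote data center, a rough back-of-the-envelope check is easy to run. The sketch below is only an illustration: the log lines are synthetic and made up, zlib stands in for whatever compression Sumo Logic actually uses (which I don't know), and the helper names are hypothetical. Real logs will compress differently.

```python
import zlib

def compression_ratio(lines):
    """Return raw_bytes / compressed_bytes for a list of log lines."""
    raw = "\n".join(lines).encode("utf-8")
    return len(raw) / len(zlib.compress(raw, 6))

def sustained_mbit_per_s(bytes_per_day, ratio=1.0):
    """Average link rate (Mbit/s) needed to ship bytes_per_day after compressing by ratio."""
    return bytes_per_day * 8 / ratio / 86_400 / 1e6

# Synthetic, made-up log lines: identical structure, varying fields.
sample = [
    f"2012-01-01 00:{i // 60 % 60:02d}:{i % 60:02d} host{i % 7} sshd[{1000 + i % 50}]: "
    f"Failed password for user{i % 13} from 10.0.{i % 4}.{i % 200} port {20000 + i % 997}"
    for i in range(5000)
]

ratio = compression_ratio(sample)
one_tb = 10**12  # decimal terabyte, an assumption about what "a terabyte" means here

print(f"compression ratio on synthetic sample: {ratio:.1f}x")
print(f"1 TB/day uncompressed: {sustained_mbit_per_s(one_tb):.1f} Mbit/s")
print(f"1 TB/day at that ratio: {sustained_mbit_per_s(one_tb, ratio):.1f} Mbit/s")
```

Uncompressed, a terabyte per day works out to a bit under 93 Mbit/s sustained, so the compression ratio is what decides whether an ordinary uplink can carry it.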
What interests me about Sumo Logic is that automated classification story. I thought I heard Sumo Logic say:
- It’s largely unsupervised machine learning.
- It’s specific to a particular user/data set.
- It can be up and running and classifying things effectively almost instantly (i.e., on seconds’ or minutes’ worth of data).
- It’s informed by what different users tag as false positives. (Or maybe that is planned for future versions.)
I have a little trouble seeing how all those points fit exactly together, so perhaps I got some details wrong.
The payoff is that machine learning directly informs the Sumo Logic user interface. In particular, large numbers of events are bundled into a small number of categories, hopefully making it much easier for network operations types to scan the UI and pick out what’s important.
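To make the bundling idea concrete, here is a minimal sketch of how many events can collapse into a few categories. It is not Sumo Logic's method: they described unsupervised machine learning, whereas this is a much cruder rule-based stand-in that masks the fields that vary (IP addresses, numbers) and groups events by what remains. The masking rules, function names, and sample events are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical masking rules: replace fields that vary between
# otherwise identical events. Order matters (IPs before bare numbers).
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\d+"), "<NUM>"),
]

def signature(line):
    """Reduce a log line to a template by masking its variable fields."""
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    return line

def bundle(lines):
    """Count how many events fall under each template."""
    return Counter(signature(line) for line in lines)

# Made-up sample events.
events = [
    "Failed password for root from 10.0.0.5 port 51234",
    "Failed password for root from 10.0.0.9 port 40022",
    "Connection closed by 192.168.1.20",
    "Failed password for root from 172.16.4.1 port 2201",
    "Disk usage at 91 percent on volume 3",
    "Connection closed by 192.168.1.77",
]

# Show the categories, largest first, the way a summary UI might.
for sig, count in bundle(events).most_common():
    print(count, sig)
```

Six events become three categories here. A learned approach would presumably discover which fields vary on its own rather than relying on hand-written patterns, which is where the user-specific, data-set-specific aspect would come in.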
In general, the idea of machine learning informing analytic UIs via some sort of classification is common in text-oriented technologies, notably in:
- Good ol’ text search.
- Text mining vendors’ approaches to clustering hits on words or phrases that say substantially the same thing.
But otherwise it seems kind of rare, if we stipulate that ad-serving/general internet personalization isn’t really an analytic UI. That said, I’d love to hear of any interesting examples I’ve overlooked.
Comments
7 Responses to “Sumo Logic and UIs for text-oriented data”
Curt,
What is the unit for “single to low double digits of log data per day”? Is it GB?
Jim,
Ack! Thanks! Yes! Fixed.
[…] analyzer Sumo Logic probably doesn’t rely on an off-the-shelf machine learning […]
“Sumo Logic’s main differentiation is automated classification of events.”
– Is this a comparison to Splunk?
How else does it differ? More dependence on machine learning techniques?
I haven’t talked with Sumo Logic for a while. Their last PR pitch was a generic “Golly gee whiz big data SaaS cloud” piece of nonsense; if they actually enhanced the offering in interesting ways, they did a good job of covering it up.
If you’re interested in Sumo you can always contact them directly 🙂 Sumo has more differentiators on their backend: “elastic log processing” for scaling without performance implications, machine learning and native anomaly detection technology, dashboards which run off of continuous queries for auto-updating, etc. The cloud marketing “nonsense” has room for improvement 🙂
[…] are apt to backfire instead. Splunk seems to actually have had some limited success intimidating Sumo Logic. But it tried something similar against Rocana, and I was set up to potentially be collateral […]