An idealized log management and analysis system — from whom?
I’ve talked with many companies recently that believe they are:
- Focused on building a great data management and analytic stack for log management …
- … unlike all the other companies that might be saying the same thing 🙂 …
- … and certainly unlike expensive, poorly-scalable Splunk …
- … and also unlike less-focused vendors of analytic RDBMS (which are also expensive) and/or Hadoop distributions.
At best, I think such competitive claims are overwrought. Still, it’s a genuinely important subject and opportunity, so let’s consider what a great log management and analysis system might look like.
Much of this discussion could apply to machine-generated data in general. But right now I think more players are doing product management with an explicit conception either of log management or event-series analytics, so for this post I’ll share that focus too.
A short answer might be “Splunk, but with more analytic functionality and more scalable performance, at lower cost, plus numerous coupons for free pizza.” A more constructive and bottom-up approach might start with:
- Agents for any kind of machine that emits streams of data.
- Parsers that:
- Immediately identify explicit name-value pairs in popular formats such as JSON or XML.
- Also immediately extract a significant fraction of all implicit fields in text strings — timestamps for sure, but also a lot else. (Splunk is the current gold standard for such capabilities.)
- Allow you to easily write rules for more such extractions. (A parsing sketch follows this list.)
- Immediate indexing in line with everything the parsers do.
- Easy import of log files, relational tables, and other relevant data structures.
- Queries that can exploit all the indexes, at least up to the functionality level of SQL 2003 analytics (including windowing; a query example also follows this list) and StreamSQL, of course with …
- … blazing scalable performance.
- Strong workload management and concurrent performance support. (Teradata is the gold standard for such capabilities in the analytic sphere.)
- Various other mature-DBMS features, e.g. in backup, manageability, and uptime.
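To make the parser bullets above concrete, here’s a minimal Python sketch of all three behaviors: explicit name-value pairs from JSON, built-in extraction of implicit fields such as timestamps, and easily-written user rules. The field names and regexes are my illustrative assumptions, not any vendor’s actual logic.

```python
import json
import re

# Built-in extraction rule: ISO-8601-ish timestamps embedded in free text.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")

# User-supplied rules for further implicit fields, keyed by field name.
# These two rules are made-up examples.
USER_RULES = {
    "status": re.compile(r"\bstatus=(\d{3})\b"),
    "client_ip": re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b"),
}

def parse_line(line: str) -> dict:
    """Return explicit and implicit name-value pairs from one log line."""
    fields = {}
    # 1. Explicit name-value pairs, if the line is a popular format like JSON.
    try:
        obj = json.loads(line)
        if isinstance(obj, dict):
            fields.update(obj)
    except ValueError:
        pass
    # 2. Implicit fields pulled out of free text: timestamps for sure.
    m = TIMESTAMP_RE.search(line)
    if m:
        fields.setdefault("timestamp", m.group(0))
    # 3. Easily written user rules for more such extractions.
    for name, rule in USER_RULES.items():
        m = rule.search(line)
        if m:
            fields.setdefault(name, m.group(1))
    return fields

print(parse_line("2014-09-07 12:00:01 GET /index status=200 from 10.0.0.5"))
```

A real parser would of course handle XML, many timestamp formats, and much more, and would feed its output straight into the indexing layer.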
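And here is the flavor of SQL 2003 windowed query such a system should run fast, sketched in Python with SQLite purely as a stand-in engine (window functions require SQLite 3.25+); the log_events table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_events (host TEXT, ts TEXT, latency_ms REAL)")
conn.executemany(
    "INSERT INTO log_events VALUES (?, ?, ?)",
    [("web-1", "2014-09-07T12:00:0%d" % i, 100.0 + i) for i in range(5)],
)

# Rolling average latency per host over the last 10 events -- the kind of
# windowed analytic query a log store should answer against its indexes.
query = """
    SELECT host, ts, latency_ms,
           AVG(latency_ms) OVER (
               PARTITION BY host ORDER BY ts
               ROWS BETWEEN 9 PRECEDING AND CURRENT ROW
           ) AS rolling_avg_latency
    FROM log_events
"""
for row in conn.execute(query):
    print(row)
```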
Further, there would be numerous styles of business intelligence interface, at least including:
- Generic BI like we generally see for tabular data.
- Constantly-changing displays of streaming data.
- BI with an event-series orientation. (A small sessionization sketch follows this list.)
- Strong alerting.
- Mobile versions of everything.
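To illustrate what I mean by an event-series orientation, here’s a toy Python sessionization sketch, splitting a pre-sorted event stream into sessions on inactivity gaps; the event shape and the 30-minute threshold are assumptions for illustration only.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity threshold

def sessionize(events):
    """Split (user, timestamp) events, pre-sorted by user then time,
    into sessions wherever the inactivity gap exceeds SESSION_GAP."""
    sessions = []
    last_user, last_ts = None, None
    for user, ts in events:
        if user != last_user or ts - last_ts > SESSION_GAP:
            sessions.append([])  # new user or long gap: start a session
        sessions[-1].append((user, ts))
        last_user, last_ts = user, ts
    return sessions

events = [
    ("alice", datetime(2014, 9, 7, 12, 0)),
    ("alice", datetime(2014, 9, 7, 12, 10)),
    ("alice", datetime(2014, 9, 7, 14, 0)),  # > 30 min later: new session
]
print(len(sessionize(events)))  # 2
```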
And there would be good support for quick-turnaround, easily-operationalized predictive analytics, of the sort that’s fairly central to the visions for Kiji and Spark.
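As one hedged illustration of what “quick-turnaround, easily-operationalized” might mean in practice, here’s a minimal Spark MLlib scoring sketch; the HDFS path, feature layout, and model choice are made-up assumptions, not anything Kiji or Spark specifically prescribes.

```python
from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithSGD
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext(appName="log-scoring-sketch")

# Pretend features were already extracted from parsed logs into CSV rows:
# label first, then numeric features. The path is hypothetical.
data = sc.textFile("hdfs:///logs/features.csv").map(
    lambda line: LabeledPoint(
        float(line.split(",")[0]),
        [float(x) for x in line.split(",")[1:]],
    )
)

# Train quickly, then operationalize by scoring incoming feature vectors.
model = LogisticRegressionWithSGD.train(data, iterations=100)
preds = model.predict(data.map(lambda p: p.features))
print(preds.take(5))
```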
The data management part of that is particularly hard, in that:
- Different architectures seem naturally well-suited for different parts of the problem.
- Maturing a new data management product is always difficult, costly and slow.
My thoughts on strengths and weaknesses of some obvious log data management contenders start:
- Oracle, IBM, and Microsoft have a lot of heft in all things database. But while each of those vendors has great resources and occasionally impressive pieces of new database engineering, none shows much evidence of framing, let alone solving, the problem in the right way(s).
- SAP owns Sybase, HANA, several old CEP companies, and Business Objects. Add them to the Oracle/IBM/Microsoft list.
- Teradata has a lot going for them. Their core analytic data management strengths are obvious. They’ve owned Aster for a while, and Aster innovated nPath quite some time ago. They recently added Hadapt, a leader in schema-on-need, as well as Revelytix, which has some good ideas in dataset management. Like most other DBMS vendors, however, Teradata doesn’t yet have much of a story for streaming data, and anyhow the most optimistic case for Teradata involves the difficult task of stitching together disparate data management technologies.
- HP Vertica has a decent position as well. Probably more proven in general concurrent, scalable performance than others in their peer group (Netezza, Greenplum, et al.), Vertica also was relatively early in innovations relevant to log analysis, including a range of time series/event series features and its own schema-on-need effort. Vertica’s founders were also streaming pioneers (there were heavily overlapping groups of academics behind StreamBase, Vertica and VoltDB), but it’s not clear how that background is reflected in the present Vertica product.
- Splunk, of course, has a complete stack. At the data acquisition and parsing layers, it’s second to none, and it has a considerable set of log-appropriate BI capabilities as well. And for data management, it is in effect stitching together two different inverted-list data stores, plus Hadoop.
- Hadoop distribution vendors such as Cloudera, MapR or Hortonworks typically bundle a range of relevant capabilities. HDFS (Hadoop Distributed File System) is the default place to dump entire logs. In most distros, Spark offers a new approach to streaming. Impala, Drill and so on offer query. Flume gathers the log data in the first place. But a lot of the cooler capabilities are immature or unproven, and in some cases that’s putting it mildly.
In the interest of length, I’ll omit discussion of smaller vendors, except to say that Platfora’s integrated-stack event series analytics story deserves attention, and I’m disappointed that I never hear about Sumo Logic. And I don’t know a lot about companies positioned as SIEM (Security Information and Event Management), especially now that SenSage has left the scene.
Comments
Also TIBCO LogLogic: http://www.tibco.com/products/event-processing/loglogic-for-machine-data
I think it is based on Splunk, though (I haven’t looked at it).
What do you think about the ELK stack?
http://www.elasticsearch.org/overview/
-Raj
You may want to consider Nexthink among the smaller vendors. http://www.nexthink.com/
SenSage is now HawkEye AP owned by Hexis Cyber Solutions, a KEYW company.
HawkEye AP is perfectly positioned to lead the up-and-coming Security Analytics market. HawkEye AP continues to extend its core Log Management capability, furthering its lead as the world’s most efficient way to collect, store, and analyze mass quantities of Event Data.
HawkEye AP is designed as a complete solution for security analytics with a large scale data warehouse, collection routines to bring in everything from your IT infrastructure, and a built-in reporting module. No single construct means HawkEye AP has virtually unlimited scalability.
Your description very faithfully describes Treasure Data, except for a couple of bullet points that are not well developed yet.
There is one open-source centralized log management tool out there that provides scalable performance: NXLog (https://nxlog.co/products/nxlog-community-edition). It scales well even to thousands or tens of thousands of servers while still providing high performance. And it is a multi-platform tool, so it can collect logs from Windows, Linux, Android, etc. It definitely should be added to the list above.