Log analysis

Discussion of how data warehousing and analytic technologies are applied to logfile analysis. Related subjects include:

The use of analytic technologies to study web and network event data

April 8, 2010

Examples of machine-generated data

Not long ago I pointed out that much future Big Data growth will be in the area of machine-generated data, examples of which include: Read more

Categories: Analytic technologies, Data warehousing, Games and virtual worlds, Investment research and trading, Log analysis, Oracle, Telecommunications, Web analytics

29 Comments

March 19, 2010

Infobright blog update

I often offer that, if a company puts up a sufficiently good blog post, I’ll link to it. Well, I just noticed that Infobright CEO Mark Burton (somewhere along the way he seems to have dropped the “interim”) put up an excellent post last month.

Highlights on the market share/sector side include: Read more

Categories: Columnar database management, Data mart outsourcing, Data warehousing, Infobright, Log analysis, Market share and customer counts, Open source, Web analytics

1 Comment

January 17, 2010

Three broad categories of data

People often try to draw a distinction between:

Traditional data of the sort that’s stored in relational databases, aka “structured.”
Everything else, aka “unstructured” or “semi-structured” or “complex.”

There are plenty of problems with these formulations, not the least of which is that the supposedly “unstructured” data is the kind that actually tends to have interesting internal structures. But of the many reasons why these distinctions don’t tend to work very well, I think the most important one is that:

Databases shouldn’t be divided into just two categories. Even as a rough-cut approximation, they should be divided into three, namely:

Human/Tabular data –i.e., human-generated data that fits well into relational tables or arrays
Human/Nontabular data — i.e., all other data generated by humans
Machine-Generated data

Even that trichotomy is grossly oversimplified, for reasons such as:

These categories overlap.
There are kinds of data that get into fuzzy border zones.
Not all data in each category has all the same properties.

But at least as a starting point, I think this basic categorization has some value. Read more

Categories: Database diversity, Investment research and trading, Log analysis, Telecommunications, Web analytics

19 Comments

December 7, 2009

A framework for thinking about data warehouse growth

There are only three ways that the amount of data stored in data warehouses can grow:

The same kinds of data are stored as before, with more being added over time.
The same kinds of data are stored as before, but in more detail.
New kinds of data are stored.

Categories: Analytic technologies, Application areas, Data warehousing, Investment research and trading, Log analysis, Solid-state memory, Storage, Telecommunications, Text, Web analytics

9 Comments

November 23, 2009

Boston Big Data Summit keynote outline

Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I’m posting them below.

Categories: Analytic technologies, Archiving and information preservation, Business intelligence, Cloud computing, Clustering, Columnar database management, Data warehouse appliances, Data warehousing, DBMS product categories, Humor, Investment research and trading, Log analysis, MapReduce, Market share and customer counts, NoSQL, OLTP, Open source, Parallelization, Presentations, Pricing, Solid-state memory, Storage, Telecommunications, Theory and architecture, Web analytics

6 Comments

October 18, 2009

Three big myths about MapReduce

Once again, I find myself writing and talking a lot about MapReduce. But I suspect that MapReduce-related conversations would go better if we overcame three fairly common MapReduce myths:

MapReduce is something very new
MapReduce involves strict adherence to the Map-Reduce programming paradigm
MapReduce is a single technology

Categories: Analytic technologies, Aster Data, Cloudera, Data warehousing, Google, Greenplum, Hadoop, Log analysis, MapReduce, Michael Stonebraker, Parallelization, Web analytics

11 Comments

October 18, 2009

Introduction to SenSage

I visited with SenSage on my two most recent trips to San Francisco. Both visits were, through no fault of SenSage’s, hasty. Still, I think I have enough of a handle on SenSage basics to be worth writing up.

General SenSage highlights include:

Categories: Analytic technologies, Columnar database management, Data warehousing, Database compression, Log analysis, MapReduce, SenSage, Streaming and complex event processing (CEP), Telecommunications

3 Comments

October 18, 2009

Technical introduction to Splunk

As noted in my other introductory post, Splunk sells software called Splunk, which is used for log analysis. These can be logs of various kinds, but for the purpose of understanding Splunk technology, it’s probably OK to assume they’re clickstream/network event logs. In addition, Splunk seems to have some aspirations of having its software used for general schema-free analytics, but that’s in early days at best.

Splunk’s core technology indexes text and XML files or streams, especially log files. Technical highlights of that part include: Read more

Categories: Analytic technologies, Log analysis, MapReduce, Splunk, Structured documents, Text, Web analytics

12 Comments

October 18, 2009

General introduction to Splunk

I dropped by log analysis software vendor Splunk a few weeks ago for a chat with Marketing VP Steve Sommer (who some you may know from Cognos and/or Informix), Product Management VP Christina Noren, and above all co-founder/CTO Erik Swan. Splunk turns out to be a pretty interesting company, from both business and technical standpoints. For one thing, Splunk seems highly regarded by most people I mention it to.

Splunk’s technical stories include:

Text search over log files.
Business intelligence over text search. (That part sounds a lot like Attivio.)
MapReduce with schema flexibility and smart multi-stage execution plans. (That part sounds a lot like Aster Data.)

Infobright notes

I had lunch w/ Bob Zurek and Susan Davis of Infobright today. This wasn’t primarily a briefing, but a few takeaways are:

Infobright now has >100 paying customers.
Typical database size is from the low 100s of gigabytes to the low single-digit number of terabytes.
Agile development is at or approaching two-week release cycles.
Like Kickfire, Infobright has a multi-year deal with MySQL that insulates it against many potential Oracle/MySQL shenanigans.
From an industry perspective, Infobright’s customer base sounds a lot like other vendors’:
- Data mart outsourcing/online analytics
- Log files for websites
- Telecommunications
- Financial services
- OEM, especially in the markets cited above
- “Hey, we’re beginning to see the occasional energy deal”
- A few random others
Infobright is seeing some household-name customers, who surely have big-name analytic DBMS products, but who also have a policy that open source is the default choice, and if open source can get the job done then the favorite closed-source choices aren’t used.
Infobright has the usual open-source community story — lots of involvement and engagement in the forums, but contributions are limited mainly to connectivity, utility scripts, etc. (Maybe some national language translation too; I’m not sure.)

Categories: Analytic technologies, Data mart outsourcing, Data warehousing, Infobright, Investment research and trading, Kickfire, Log analysis, Market share and customer counts, MySQL, Open source, Telecommunications, Web analytics

7 Comments

← Previous Page — Next Page →

Search our blogs and white papers

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in

Log analysis

Examples of machine-generated data

Infobright blog update

Three broad categories of data

A framework for thinking about data warehouse growth

Boston Big Data Summit keynote outline

Three big myths about MapReduce

Introduction to SenSage

Technical introduction to Splunk

General introduction to Splunk

Infobright notes

Search our blogs and white papers

Monash Research blogs

User consulting

Vendor advisory

Monash Research highlights

Recent posts

Categories

Date archives

Admin