Web analytics

Discussion of how data warehousing and analytic technologies are applied to clickstream analysis and other web analytics challenges. Related subjects include:

October 18, 2009

Technical introduction to Splunk

As noted in my other introductory post, Splunk sells software called Splunk, which is used for log analysis. These can be logs of various kinds, but for the purpose of understanding Splunk technology, it’s probably OK to assume they’re clickstream/network event logs. In addition, Splunk seems to have some aspirations of having its software used for general schema-free analytics, but that’s in early days at best.

Splunk’s core technology indexes text and XML files or streams, especially log files. Technical highlights of that part include: Read more

October 18, 2009

General introduction to Splunk

I dropped by log analysis software vendor Splunk a few weeks ago for a chat with Marketing VP Steve Sommer (who some of you may know from Cognos and/or Informix), Product Management VP Christina Noren, and above all co-founder/CTO Erik Swan. Splunk turns out to be a pretty interesting company, from both business and technical standpoints. For one thing, Splunk seems highly regarded by most people I mention it to.

Splunk’s technical stories include:

More on those in a separate post.

Less technical Splunk highlights include: Read more

October 14, 2009

Infobright notes

I had lunch w/ Bob Zurek and Susan Davis of Infobright today. This wasn’t primarily a briefing, but a few takeaways are:

October 10, 2009

How 30+ enterprises are using Hadoop

MapReduce is definitely gaining traction, especially but by no means only in the form of Hadoop. In the aftermath of Hadoop World, Jeff Hammerbacher of Cloudera walked me quickly through 25 customers he pulled from Cloudera’s files. Facts and metrics ranged widely, of course:

Read more

October 3, 2009

Issues in scientific data management

In the opinion of the leaders of the XLDB and SciDB efforts, key requirements for scientific data management include:

However: Read more

October 1, 2009

Yahoo wants to do decapetabyte-scale data warehousing in Hadoop

My old client Mark Tsimelzon moved over to Yahoo after Coral8 was acquired, and I caught up with him last month. He turns out to be running development for a significant portion of Yahoo’s Hadoop effort — everything other than HDFS (Hadoop Distributed File System). Yahoo evidently plans to, within a year or so, get Hadoop to the point that it is managing 10s of petabytes of data for Yahoo, with reasonable data warehousing functionality.

Highlights of our visit included:

Read more

September 19, 2009

Oracle gives a few customer database size examples

In its recent quarterly conference call, Oracle said (as per the Seeking Alpha transcript):

AC Neilsen, for instance, we deployed a 45-terabyte data [mart], they called it; Adidas, 13 terabytes; Australian Bureau of Statistics, 250 terabytes; and of course, some of our high-end ones that you have probably heard of in the past, AT&T, 250 terabytes; Yahoo!, 700 terabytes — just gives you an idea of the size of the databases that are out there and how they are growing, and that’s driving the need for greater throughput.

I don’t know what’s being counted there, but I wouldn’t be surprised if those were legit user-data figures.

Some other notes:

August 4, 2009

Vertica’s version of MapReduce integration

I talked with Omer Trajman of Vertica Monday night about Vertica’s MapReduce integration, part of its Vertica 3.5 release. Highlights included:

Apparently, the use cases for Vertica/Hadoop integration to date lie in algorithmic trading and two kinds of web analytics. Specifically: Read more

July 16, 2009

Vertica customer notes

Dave Menninger of Vertica called to discuss NDA product futures, as vendors tend to do in the weeks before a TDWI conference. So we also talked a bit about the Vertica customer base. That’s listed as 86 at the end of Q2, up from 74 in Q1. That’s pretty small growth compared with Q1, which Dave didn’t fully explain. But then, off the top of his head, he was recalling Q1 numbers as being lower than that 74, so maybe there’s a reporting glitch in the loop somewhere.

Vertica’s two biggest customer segments are telecommunications and financial services, and Dave drew an interesting distinction between what the two groups care about. Telecom companies care about data warehouses that are big and 24/7 reliable, but don’t do particularly complex analytics. Financial services — by which he presumably means mainly proprietary traders — are most focused on complex and competitively innovative analytics.

Also mentioned in various contexts were web-based outfits such as data mart outsourcers, social networkers, and open-source software providers.

Vertica also offers customer win stories in other segments, but most actual discussion about what Vertica does revolves around the application areas mentioned above, just as it has been in the past.

Similar (not necessarily identical) generalizations would be true of many other analytic DBMS vendors.

July 6, 2009

Yahoo is up to 10 petabytes now?

According to somebody (I forget who) who attended Yahoo’s SIGMOD presentation last week, the big Yahoo database is now up to 10 petabytes in size, in line with Yahoo’s predictions last year. Apparently, Yahoo also gave more details of how the technology works.
