Web analytics

Discussion of how data warehousing and analytic technologies are applied to clickstream analysis and other web analytics challenges. Related subjects include:

October 19, 2010

Introduction to Kaminario

At its core, the Kaminario story is simple:

In other words, Kaminario pitches a value proposition something like (my words, not theirs) “A shortcut around your performance bottlenecks.”

*1 million or so on the smallest Kaminario K2 appliance.

Kaminario asserts that both analytics and OLTP (OnLine Transaction Processing) are represented in its user base. Even so, the use cases Kaminario mentioned seemed to be concentrated on the analytic side. I suspect there are two main reasons:

*Somebody can think up a new analytic query overnight that takes 10 times the processing of anything they’ve ever run before. Or they can get the urge to run the same queries 10 times as often as before. Both those kinds of thing happen less often in the OLTP world.

Accordingly, Kaminario likes to sell against the alternative of getting a better analytic DBMS, stressing that you can get a Kaminario K2 appliance into production a lot faster than you can move your processing to even the simplest data warehouse appliance.  Kaminario is probably technically correct in saying that; even so, I suspect it would often make more sense to view Kaminario K2 appliances as a transition technology, by which I mean:

On that basis, I could see Kaminario-like devices eventually getting to the point that every sufficiently large enterprise should have some of them, whether or not that enterprise has an application it believes should run permanently against DRAM block storage.  Read more

October 18, 2010

More notes on Membase and memcached

As a companion to my post about Membase last week, the company has graciously allowed me to post a rather detailed Membase slide deck. (It even has pricing.) Also, I left one point out.

Membase announced a Cloudera partnership. I couldn’t detect anything technically exciting about that, but it serves to highlight what I do find to be an interesting usage trend. A couple of big Web players (AOL and ShareThis) are using Hadoop to crunch data and derive customer profile data, then feed that back into Membase. Why Membase? Because it can serve up the profile in a millisecond, as part of a bigger 40-millisecond-latency request.

And why Hadoop, rather than Aster Data nCluster, which ShareThis also uses? Umm, I didn’t ask.

When I mentioned this to Colin Mahony, he said Vertica had similar stories. However, I don’t recall whether they were about Membase or just memcached, and he hasn’t had a chance to get back to me with clarification.  (Edit: As per Colin’s comment below, it’s both.)
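The pattern described above (crunch data in batch, then serve the derived profile from a fast key-value store) can be sketched roughly as follows. This is only an illustration under loud assumptions: `build_profiles`, `ProfileStore`, and `handle_request` are hypothetical names, a plain Python dict stands in for Membase, and an in-process aggregation stands in for the Hadoop job. None of it reflects how AOL or ShareThis actually implement this.

```python
import time
from collections import Counter, defaultdict


def build_profiles(click_events):
    """Batch step (hypothetical stand-in for a Hadoop job).

    Aggregates (user_id, category) click events into a small
    per-user profile: the user's most-clicked categories.
    """
    counts = defaultdict(Counter)
    for user_id, category in click_events:
        counts[user_id][category] += 1
    # Keep up to three top categories per user as the "profile".
    return {
        user: [cat for cat, _ in counter.most_common(3)]
        for user, counter in counts.items()
    }


class ProfileStore:
    """In-memory key-value store (hypothetical stand-in for Membase)."""

    def __init__(self):
        self._data = {}

    def load(self, profiles):
        # Bulk-load the output of the batch step.
        self._data.update(profiles)

    def get(self, user_id):
        # Unknown users get an empty profile rather than an error.
        return self._data.get(user_id, [])


def handle_request(store, user_id, budget_ms=40.0):
    """Serving step: fetch the profile as one part of a larger request.

    budget_ms mirrors the 40-millisecond overall latency mentioned in
    the post; the lookup itself is timed so it can be compared to it.
    """
    start = time.perf_counter()
    profile = store.get(user_id)
    lookup_ms = (time.perf_counter() - start) * 1000.0
    return {
        "user": user_id,
        "profile": profile,
        "lookup_ms": lookup_ms,
        "within_budget": lookup_ms < budget_ms,
    }


if __name__ == "__main__":
    events = [("u1", "sports"), ("u1", "sports"), ("u1", "news"), ("u2", "music")]
    store = ProfileStore()
    store.load(build_profiles(events))
    print(handle_request(store, "u1"))
```

The point of the split is that the expensive work happens offline, so the online path is a single key lookup that consumes only a small slice of the request's latency budget.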

October 3, 2010

Notes and links October 3 2010

Some notes, follow-up, and links before I head out to California:  Read more

September 21, 2010

How to tell whether you need ACID-compliant transaction integrity

In a post about the recent JPMorgan Chase database outage, I suggested that JPMorgan Chase’s user profile database was over-engineered, in that various web surfing data was stored in a fully ACID-compliant manner when it didn’t really need to be. I’ve since gotten private communication expressing vehement agreement, and telling of the opposite choice being made in other major web-facing transactional systems.

What’s going on is this:

Thus, transaction integrity can be more trouble than it’s worth.

In essence, of course, that’s half of the classic NoSQL claim, where the other half of the claim is to assert that the same may be said of joins.

So when should you go for ACID-compliant transaction integrity, and when shouldn’t you bother? Every situation is different, but here’s a set of considerations to start you off.  Read more

August 11, 2010

Big Data is Watching You!

There’s a boom in large-scale analytics. The subjects of this analysis may be categorized as:

The most varied, interesting, and valuable of those four categories is the first one.

Read more

July 1, 2010

Why you should go to XLDB4

Scientific data commonly:

In those respects, it is akin to some of the hottest areas for big data analytics, including:

So when Jacek Becla started the XLDB conferences on the premise that scientific and big data analytic challenges have a lot in common, he had a point. There are several tough database problems that the science-focused folks have taken the lead in thinking about, but which are soon going to matter to the commercial world as well. And that’s one of two big reasons why you should consider participating in XLDB4, October 6-7, at the SLAC facility in Menlo Park, CA, as an attendee, sponsor, or both.

The other big reason is that it is important for the world that XLDB succeed. Read more

June 30, 2010

Cloudera Enterprise and Hadoop evolution

I talked with Cloudera a couple of weeks ago in connection with the impending release of Cloudera Enterprise. I’d say:  Read more

June 8, 2010

The most important part of the “social graph” is neither social nor a graph

“Social graph” is a highly misleading term, and so is “social network analysis.” By this I mean:

There’s something akin to “social graphs” and “social network analysis” that is more or less worthy of all the current hype – but graphs and network analysis are only a minor part of the whole story.

In particular, the most important parts of the Facebook “social graph” are neither social nor a graph. Rather, what’s really important is an aggregate Profile of Revealed Preferences, of which person-to-person connections or other things best modeled by a graph play only a small part.

Read more

May 22, 2010

Notes on SciDB and scientific data management

I firmly believe that, as a community, we should look for ways to support scientific data management and related analytics. That’s why, for example, I went to XLDB3 in Lyon, France at my own expense. Eight months ago, I wrote about issues in scientific data management. Here’s some of what has transpired since then.

The main new activity I know of has been in the open source SciDB project.   Read more

May 4, 2010

Truviso evidently reinvents itself

When Aleri bought Coral8 last year, I wrote that the independent CEP (Complex Event Processing) vendors were floundering. Aleri quickly threw in the towel and sold out to Sybase, which hardly changed my opinion. StreamBase actually is persevering, but not with any kind of breakout success. Big vendors, such as Microsoft and IBM, have at least some aspirations of eventually filling the gap.

Meanwhile, Truviso — which never got much market traction in the first place — was in hiding; Roman Bukary never did keep his promise to brief me on the company’s new and improved strategy. Then Truviso had yet another management change, amidst rumors that it was repositioning away from CEP. As per a press release Truviso emailed today, that’s now official, with Truviso’s main business being something to do with web analytics.

Edit: It seems Truviso was at some point absorbed into Cisco.
