Web analytics
Discussion of how data warehousing and analytic technologies are applied to clickstream analysis and other web analytics challenges. Related subjects include:
- The use of analytic technologies for logfile analysis
- (in Text Technologies) Online marketing
Introduction to Kaminario
At its core, the Kaminario story is simple:
- Throw out your disks and replace them with, not Flash, but actual DRAM.
- Your IOPS (Input/Output Operations Per Second) are so high* that you get the performance you need without any further system changes.
- The whole thing is very fast to set up.
In other words, Kaminario pitches a value proposition something like (my words, not theirs) “A shortcut around your performance bottlenecks.”
*1 million or so on the smallest Kaminario K2 appliance.
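To put that figure in perspective, here's a quick back-of-the-envelope comparison in Python. The per-disk figure is a generic rule of thumb (roughly 180 random IOPS for a 15K RPM drive), not a Kaminario number:

```python
# Back-of-the-envelope: how many disk spindles would it take to match
# 1 million random IOPS? The per-disk figure is a generic rule of thumb,
# not a vendor-published number.
DISK_IOPS = 180            # rough random-I/O rate for one 15K RPM disk
K2_IOPS = 1_000_000        # smallest Kaminario K2, per the footnote above

print(f"Equivalent disk spindles: {K2_IOPS / DISK_IOPS:,.0f}")  # ~5,556
```

That ratio is the essence of the "shortcut around your performance bottlenecks" pitch.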
Kaminario asserts that both analytics and OLTP (OnLine Transaction Processing) are represented in its user base. Even so, the use cases Kaminario mentioned seemed to be concentrated on the analytic side. I suspect there are two main reasons:
- As Kaminario points out, OLTP apps are commonly designed to perform acceptably even in the face of regrettable I/O wait.
- Also, analytic performance problems tend to arise more suddenly than OLTP ones do.*
*Somebody can think up a new analytic query overnight that takes 10 times the processing of anything they’ve ever run before. Or they can get the urge to run the same queries 10 times as often as before. Both of those things happen less often in the OLTP world.
Accordingly, Kaminario likes to sell against the alternative of getting a better analytic DBMS, stressing that you can get a Kaminario K2 appliance into production a lot faster than you can move your processing to even the simplest data warehouse appliance. Kaminario is probably technically correct in saying that; even so, I suspect it would often make more sense to view Kaminario K2 appliances as a transition technology, by which I mean:
- You have an annoying performance problem.
- Kaminario K2 could solve it very quickly.
- That buys you time for a more substantive fix.*
- If you want, you can redeploy your Kaminario K2 storage to solve your next-worst performance bottleneck.
On that basis, I could see Kaminario-like devices eventually getting to the point that every sufficiently large enterprise should have some of them, whether or not that enterprise has an application it believes should run permanently against DRAM block storage. Read more
Categories: Investment research and trading, Kaminario, Solid-state memory, Storage, Telecommunications, Web analytics | 7 Comments |
More notes on Membase and memcached
As a companion to my post about Membase last week, the company has graciously allowed me to post a rather detailed Membase slide deck. (It even has pricing.) Also, I left one point out.
Membase announced a Cloudera partnership. I couldn’t detect anything technically exciting about that, but it serves to highlight what I do find to be an interesting usage trend. A couple of big Web players (AOL and ShareThis) are using Hadoop to crunch data and derive customer profile data, then feed that back into Membase. Why Membase? Because it can serve up the profile in a millisecond, as part of a bigger 40-millisecond-latency request.
And why Hadoop, rather than Aster Data nCluster, which ShareThis also uses? Umm, I didn’t ask.
When I mentioned this to Colin Mahony, he said Vertica had similar stories. However, I don’t recall whether they were about Membase or just memcached, and he hasn’t had a chance to get back to me with clarification. (Edit: As per Colin’s comment below, it’s both.)
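Since Membase speaks the memcached protocol, the serving side of that pattern is straightforward. Here's a minimal sketch using the pymemcache Python client; the server address, key scheme, and profile format are all assumptions of mine, not details from AOL or ShareThis:

```python
# Minimal sketch of the "serve a precomputed profile in ~1 ms" pattern.
# Assumes a Membase/memcached-compatible server on localhost:11211;
# the key scheme ("profile:<user_id>") and JSON payload are hypothetical.
import json
from pymemcache.client.base import Client

client = Client(("localhost", 11211))

def store_profile(user_id: str, profile: dict) -> None:
    """Batch side: Hadoop-derived profiles get loaded in as serialized JSON."""
    client.set(f"profile:{user_id}", json.dumps(profile))

def get_profile(user_id: str):
    """Request side: a single key lookup, typically sub-millisecond."""
    raw = client.get(f"profile:{user_id}")
    return json.loads(raw) if raw else None

store_profile("u42", {"segments": ["sports", "finance"], "score": 0.83})
print(get_profile("u42"))
```

The point of the design is that all the heavy lifting happens offline in Hadoop; at request time there's nothing left to do but one key-value fetch inside the larger 40-millisecond latency budget.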
Categories: Aster Data, Cache, Cloudera, Couchbase, Hadoop, memcached, Memory-centric data management, NoSQL, Pricing, Specific users, Vertica Systems, Web analytics | 7 Comments |
Notes and links October 3 2010
Some notes, follow-up, and links before I head out to California: Read more
Categories: GIS and geospatial, Google, HP and Neoview, Humor, Kickfire, Netezza, Solid-state memory, Teradata, Web analytics | 3 Comments |
How to tell whether you need ACID-compliant transaction integrity
In a post about the recent JPMorgan Chase database outage, I suggested that JPMorgan Chase’s user profile database was over-engineered, in that various web surfing data was stored in a fully ACID-compliant manner when it didn’t really need to be. I’ve since gotten private communication expressing vehement agreement, and telling of the opposite choice being made in other major web-facing transactional systems.
What’s going on is this:
- ACID-compliant transaction integrity commonly costs more in terms of DBMS licenses and many other components of TCO (Total Cost of Ownership) than less rigorous approaches.
- Worse, it can actually hurt application uptime, by forcing your system to pull in its horns and stop functioning in the face of failures that a non-transactional system might smoothly work around.
- Other flavors of “complexity can be a bad thing” apply as well.
Thus, transaction integrity can be more trouble than it’s worth.
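To make the trade-off concrete, here's a minimal sketch using Python's built-in sqlite3 module. The schema is made up; the point is that the money transfer genuinely needs all-or-nothing semantics, while the page-view log can be best-effort:

```python
# Minimal sketch of the ACID trade-off, using Python's built-in sqlite3.
# The schema and failure handling are illustrative, not a recipe.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE page_views (user_id TEXT, url TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(frm: str, to: str, amount: int) -> None:
    """Needs ACID: both updates must commit together or not at all."""
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, frm))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, to))

def log_page_view(user_id: str, url: str) -> None:
    """Doesn't need ACID: losing one row under failure is acceptable."""
    try:
        conn.execute("INSERT INTO page_views VALUES (?, ?)", (user_id, url))
        conn.commit()
    except sqlite3.Error:
        pass  # best effort; availability matters more than this one row

transfer("alice", "bob", 25)
log_page_view("alice", "/statements")
```

A half-done transfer is corruption, so its failure must surface and roll back; a dropped page view is noise, so insisting on full transactional guarantees for it buys cost and fragility with little benefit.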
In essence, of course, that’s half of the classic NoSQL claim, where the other half of the claim is to assert that the same may be said of joins.
So when should you go for ACID-compliant transaction integrity, and when shouldn’t you bother? Every situation is different, but here’s a set of considerations to start you off. Read more
Categories: NoSQL, Web analytics | 12 Comments |
Big Data is Watching You!
There’s a boom in large-scale analytics. The subjects of this analysis may be categorized as:
- People
- Financial trades
- Electronic networks
- Everything else
The most varied, interesting, and valuable of those four categories is the first one.
Why you should go to XLDB4
Scientific data commonly:
- Comes in large volumes
- Is machine-generated
- Is augmented by synthetic and/or derived data
- Has a spatial and/or temporal structure
In those respects, it is akin to some of the hottest areas for big data analytics, including:
- Investment trade data – big, partly machine generated, augmented (often), temporal
- Web/network log data – big, machine-generated, post-processed into derived form, temporal
- Marketing analytic data – big, post-processed into derived form
- Genomic data
So when Jacek Becla started the XLDB conferences on the premise that scientific and big data analytic challenges have a lot in common, he had a point. There are several tough database problems that the science-focused folks have taken the lead in thinking about, but which are soon going to matter to the commercial world as well. And that’s one of two big reasons why you should consider participating in XLDB4, October 6-7, at the SLAC facility in Menlo Park, CA, as an attendee, sponsor, or both.
The other big reason is that it is important for the world that XLDB succeed. Read more
Categories: Investment research and trading, Log analysis, Scientific research, Web analytics | 2 Comments |
Cloudera Enterprise and Hadoop evolution
I talked with Cloudera a couple of weeks ago in connection with the impending release of Cloudera Enterprise. I’d say: Read more
The most important part of the “social graph” is neither social nor a graph
“Social graph” is a highly misleading term, and so is “social network analysis.” By this I mean:
There’s something akin to “social graphs” and “social network analysis” that is more or less worthy of all the current hype – but graphs and network analysis are only a minor part of the whole story.
In particular, the most important parts of the Facebook “social graph” are neither social nor a graph. Rather, what’s really important is an aggregate Profile of Revealed Preferences, in which person-to-person connections, and other things best modeled by a graph, play only a small part.
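To illustrate the distinction with a toy example (the event data and categories below are entirely made up): aggregating a person's revealed preferences is a counting problem, and the graph-shaped part of the data is a small slice of it:

```python
# Toy illustration: an aggregate "profile of revealed preferences"
# versus the person-to-person edges that actually form a graph.
# All data here is made up.
from collections import Counter

people = {"alice", "bob"}  # hypothetical users
events = [                 # (actor, thing acted on)
    ("alice", "camera_review"), ("alice", "camera_ad"),
    ("alice", "travel_deal"), ("alice", "camera_forum"),
    ("alice", "bob"),      # the only person-to-person interaction
]

profile = Counter(obj for _, obj in events if obj not in people)
graph_edges = [(actor, obj) for actor, obj in events if obj in people]

print(profile.most_common(2))  # the aggregate profile: most of the signal
print(graph_edges)             # the graph part: a small minority of it
```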
Categories: Analytic technologies, Facebook, Games and virtual worlds, RDF and graphs, Surveillance and privacy, Web analytics | 13 Comments |
Notes on SciDB and scientific data management
I firmly believe that, as a community, we should look for ways to support scientific data management and related analytics. That’s why, for example, I went to XLDB3 in Lyon, France at my own expense. Eight months ago, I wrote about issues in scientific data management. Here’s some of what has transpired since then.
The main new activity I know of has been in the open source SciDB project. Read more
Categories: Analytic technologies, Data warehousing, eBay, GIS and geospatial, Microsoft and SQL*Server, SciDB, Scientific research, Web analytics | 5 Comments |
Truviso evidently reinvents itself
When Aleri bought Coral8 last year, I wrote that the independent CEP (Complex Event Processing) vendors were floundering. Aleri quickly threw in the towel and sold out to Sybase, which hardly changed my opinion. StreamBase actually is persevering, but not with any kind of breakout success. Big vendors, such as Microsoft and IBM, have at least some aspirations of eventually filling the gap.
Meanwhile, Truviso — which never got much market traction in the first place — was in hiding; Roman Bukary never did keep his promise to brief me on the company’s new and improved strategy. Then Truviso had yet another management change, amidst rumors that it was repositioning away from CEP. As per a press release Truviso emailed today, that’s now official, with Truviso’s main business being something to do with web analytics.
Edit: It seems Truviso was at some point absorbed into Cisco.