Web analytics

Discussion of how data warehousing and analytic technologies are applied to clickstream analysis and other web analytics challenges. Related subjects include:

The use of analytic technologies for logfile analysis
(in Text Technologies) Online marketing

April 29, 2010

Vertica update

Last month, Vertica’s CEO Ralph Breslauer quit, and Vertica made it sound like there would be a new CEO late in April. And indeed, as of April 29, there was. He’s a guy I’ve never heard of before named Chris Lynch, apparently quite the sales machine builder. The most substance I’ve found is a pair of Mass High Tech articles (the latter exceedingly typo-ridden) to the general effect that:

Vertica plans to build a massive, world-conquering sales force.
If Vertica dips back into negative cash flow to do that and has to raise more venture capital, so be it.
“Triple-digit” revenue growth is expected for this year.

April 8, 2010

Examples of machine-generated data

Not long ago I pointed out that much future Big Data growth will be in the area of machine-generated data, examples of which include: Read more

Categories: Analytic technologies, Data warehousing, Games and virtual worlds, Investment research and trading, Log analysis, Oracle, Telecommunications, Web analytics

April 5, 2010

Notes on the evolution of OLTP database management systems

The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part). OLTP (OnLine Transaction Processing) and general purpose DBMS startups, however, have not yet done as well, with such success as there has been (MySQL, InterSystems Caché, solidDB’s exit, etc.) generally accruing to products that originated in the 20th Century.

Nonetheless, OLTP/general-purpose data management startup activity has recently picked up, targeting what I see as some very real opportunities and needs. So as a jumping-off point for further writing, I thought it might be interesting to collect a few observations about the market in one place. These include:

Big-brand OLTP/general-purpose DBMS have more “stickiness” than analytic DBMS.
By number, most of an enterprise’s OLTP/general-purpose databases are low-volume and low-value.
Most interesting new OLTP/general-purpose data management products are either MySQL-based or NoSQL.
It’s not yet clear whether MySQL will prevail over MySQL forks, or vice-versa, or whether they will co-exist.
The era of silicon-centric relational DBMS is coming.
The emphasis on scale-out and reducing the cost of joins spans the NoSQL and SQL-based worlds.
Users’ insistence on “free” could be a major problem for OLTP DBMS innovation.

I shall explain. Read more

Categories: Akiban, Analytic technologies, Business intelligence, Data warehousing, EnterpriseDB and Postgres Plus, Exadata, Market share and customer counts, Memory-centric data management, Mid-range, MySQL, NoSQL, OLTP, Open source, Oracle, PostgreSQL, RDF and graphs, Solid-state memory, VoltDB and H-Store, Web analytics

April 4, 2010

The retention of everything

I’d like to reemphasize a point I’ve been making for a while about data retention: Read more

Categories: Archiving and information preservation, Surveillance and privacy, Web analytics

March 19, 2010

Infobright blog update

I often offer that, if a company puts up a sufficiently good blog post, I’ll link to it. Well, I just noticed that Infobright CEO Mark Burton (somewhere along the way he seems to have dropped the “interim”) put up an excellent post last month.

Highlights on the market share/sector side include: Read more

Categories: Columnar database management, Data mart outsourcing, Data warehousing, Infobright, Log analysis, Market share and customer counts, Open source, Web analytics

January 17, 2010

Three broad categories of data

People often try to draw a distinction between:

Traditional data of the sort that’s stored in relational databases, aka “structured.”
Everything else, aka “unstructured” or “semi-structured” or “complex.”

There are plenty of problems with these formulations, not the least of which is that the supposedly “unstructured” data is the kind that actually tends to have interesting internal structures. But of the many reasons why these distinctions don’t tend to work very well, I think the most important one is that:

Databases shouldn’t be divided into just two categories. Even as a rough-cut approximation, they should be divided into three, namely:

Human/Tabular data, i.e., human-generated data that fits well into relational tables or arrays
Human/Nontabular data, i.e., all other data generated by humans
Machine-Generated data

Even that trichotomy is grossly oversimplified, for reasons such as:

These categories overlap.
There are kinds of data that get into fuzzy border zones.
Not all data in each category has all the same properties.

But at least as a starting point, I think this basic categorization has some value. Read more

Categories: Database diversity, Investment research and trading, Log analysis, Telecommunications, Web analytics

December 7, 2009

A framework for thinking about data warehouse growth

There are only three ways that the amount of data stored in data warehouses can grow:

The same kinds of data are stored as before, with more being added over time.
The same kinds of data are stored as before, but in more detail.
New kinds of data are stored.
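
One way to make the three paths concrete is to treat them as multipliers in a back-of-the-envelope size estimate. The sketch below is only an illustration: the function name, its parameters, the sample numbers, and the assumption that every kind of data takes the same space per year are all invented for the example.

```python
def warehouse_size_tb(kinds, years_retained, detail_factor, tb_per_kind_year):
    """Rough size estimate in terabytes.

    Hypothetical model: every kind of data is assumed to take the same
    space per year, scaled by how much detail is kept.
    """
    return kinds * years_retained * detail_factor * tb_per_kind_year


# Made-up baseline: 4 kinds of data, 2 years kept, 0.5 TB per kind per year.
baseline = warehouse_size_tb(kinds=4, years_retained=2,
                             detail_factor=1.0, tb_per_kind_year=0.5)

# Path 1: same kinds of data, more of it accumulated over time.
more_history = warehouse_size_tb(kinds=4, years_retained=4,
                                 detail_factor=1.0, tb_per_kind_year=0.5)

# Path 2: same kinds of data, stored in more detail.
more_detail = warehouse_size_tb(kinds=4, years_retained=2,
                                detail_factor=3.0, tb_per_kind_year=0.5)

# Path 3: new kinds of data are stored.
new_kinds = warehouse_size_tb(kinds=6, years_retained=2,
                              detail_factor=1.0, tb_per_kind_year=0.5)

print(baseline, more_history, more_detail, new_kinds)
```

Because the factors multiply, growth along two or three paths at once compounds rather than adds, which is one reason a real estimate would need per-kind figures instead of a single average.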

Categories: Analytic technologies, Application areas, Data warehousing, Investment research and trading, Log analysis, Solid-state memory, Storage, Telecommunications, Text, Web analytics

December 2, 2009

Webinar on MapReduce for complex analytics (Thursday, December 3, 10 am and 2 pm Eastern)

The second in my two-webinar series for Aster Data will occur tomorrow, twice (both live), at 10 am and 2 pm Eastern time. The other presenters will be Jonathan Goldman, who was a Principal Scientist at LinkedIn but now has joined Aster himself, and Steve Wooledge of Aster (playing host). Key links are:

Registration for tomorrow’s webinars
Replay of the first webinar
My slides from the first webinar

The main subjects of the webinar will be:

Some review of material from the first webinar (all three presenters)
Discussion of how MapReduce can help with three kinds of analytics:
- Pattern matching (Jonathan will give detail)
- Number-crunching (I’ll cover that, and it will be short)
- Graph analytics (I haven’t written the slides yet, but my starting point will be some of the relationship analytics ideas we discussed in August)

Arguably, aspects of data transformation fit into each of those three categories, which may help explain why data transformation has been so prominent among the early applications of MapReduce.

As you can see from Aster’s title for the webinar (which they picked while I was on vacation), at least their portion will be focused on customer analytics, e.g. web analytics.

Categories: Analytic technologies, Aster Data, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, MapReduce, RDF and graphs, Web analytics

November 23, 2009

Boston Big Data Summit keynote outline

Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I’m posting them below.

Categories: Analytic technologies, Archiving and information preservation, Business intelligence, Cloud computing, Clustering, Columnar database management, Data warehouse appliances, Data warehousing, DBMS product categories, Humor, Investment research and trading, Log analysis, MapReduce, Market share and customer counts, NoSQL, OLTP, Open source, Parallelization, Presentations, Pricing, Solid-state memory, Storage, Telecommunications, Theory and architecture, Web analytics

October 18, 2009

Three big myths about MapReduce

Once again, I find myself writing and talking a lot about MapReduce. But I suspect that MapReduce-related conversations would go better if we overcame three fairly common MapReduce myths:

MapReduce is something very new
MapReduce involves strict adherence to the Map-Reduce programming paradigm
MapReduce is a single technology
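
For readers who want a concrete picture of the Map-Reduce programming paradigm named in the second myth, here is a minimal single-process sketch that counts page views from a toy clickstream. It is not how Hadoop, Aster Data, or Greenplum implement MapReduce; the log format ("user url") and all function names are assumptions made up for the example.

```python
from collections import defaultdict


def map_phase(log_lines):
    """Map: emit a (url, 1) pair for every click in the log."""
    for line in log_lines:
        _user, url = line.split()
        yield url, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into one result."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical clickstream, one "user url" record per line.
log = ["alice /home", "bob /home", "alice /pricing"]
page_views = reduce_phase(shuffle(map_phase(log)))
print(page_views)
```

In a real system the map and reduce steps run in parallel across many nodes and the shuffle moves data between them; the sketch only shows the shape of the computation.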

Categories: Analytic technologies, Aster Data, Cloudera, Data warehousing, Google, Greenplum, Hadoop, Log analysis, MapReduce, Michael Stonebraker, Parallelization, Web analytics
