RDF and graphs

Analysis of data management technology optimized for RDF-formatted and/or graph data.

June 8, 2010

The most important part of the “social graph” is neither social nor a graph

“Social graph” is a highly misleading term, and so is “social network analysis.” By this I mean:

There’s something akin to “social graphs” and “social network analysis” that is more or less worthy of all the current hype – but graphs and network analysis are only a minor part of the whole story.

In particular, the most important parts of the Facebook “social graph” are neither social nor a graph. Rather, what’s really important is an aggregate Profile of Revealed Preferences, in which person-to-person connections, or other things best modeled by a graph, play only a small part.
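To make that contrast concrete, here is a toy sketch, entirely my own illustration and not Facebook’s actual data model, comparing the graph part (an edge list of connections) with a per-person aggregate of revealed preferences; the event records and field names are made up:

```python
# Toy contrast between the "graph" part of a social network and an aggregate
# profile of revealed preferences. This is an illustration only, not Facebook's
# actual data model; the event records below are made up.

from collections import Counter

# The graph part: person-to-person connections.
friend_edges = [("alice", "bob"), ("alice", "carol"), ("bob", "dave")]

# The arguably more valuable part: what each person actually clicks on,
# "likes", joins, and uses.
activity_events = [
    {"user": "alice", "object": "page:cycling",     "action": "like"},
    {"user": "alice", "object": "app:photo_editor", "action": "use"},
    {"user": "alice", "object": "page:cycling",     "action": "click"},
    {"user": "bob",   "object": "page:jazz",        "action": "like"},
]

def revealed_preference_profile(events, user):
    """Aggregate one user's activity into a profile: object -> interaction count."""
    return Counter(e["object"] for e in events if e["user"] == user)

print("Graph edges touching alice:", [e for e in friend_edges if "alice" in e])
print("Alice's revealed preferences:", revealed_preference_profile(activity_events, "alice"))
# Counter({'page:cycling': 2, 'app:photo_editor': 1})
```

The aggregate profile is just counts keyed by person and object; nothing about it requires graph-style traversal.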

Read more

April 8, 2010

Information found in public-facing social networks

Here are some examples illustrating two recent themes of mine, namely:

Pete Warden scraped all of Facebook’s social graph (at least for the United States), and put up a really interesting-looking visualization of same. Facebook’s lawyers came down on him, and he quickly agreed to destroy the data he’d scraped, but he also published ideas on how other people could duplicate his work.

Warden has since given an interview in which he outlines some of the things researchers hoped to do with this data: Read more

April 5, 2010

Notes on the evolution of OLTP database management systems

The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part). OLTP (OnLine Transaction Processing) and general-purpose DBMS startups, however, have not yet done as well, with such success as there has been (MySQL, InterSystems Caché, solidDB’s exit, etc.) generally accruing to products that originated in the 20th Century.

Nonetheless, OLTP/general-purpose data management startup activity has recently picked up, targeting what I see as some very real opportunities and needs. So as a jumping-off point for further writing, I thought it might be interesting to collect a few observations about the market in one place.  These include:

I shall explain. Read more

March 14, 2010

Toward a NoSQL taxonomy

I talked Friday with Dwight Merriman, founder of 10gen (the MongoDB company). He more or less convinced me of his definition of NoSQL systems, which in my adaptation goes:

NoSQL = HVSP (High Volume Simple Processing) without joins or explicit transactions
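For concreteness, here is a minimal sketch of what that HVSP access pattern looks like, assuming a MongoDB-style document store reached through the pymongo driver; the database, collection, and field names are made up:

```python
# Minimal HVSP-style sketch: single-record reads and writes keyed by user id,
# with no joins and no explicit multi-record transactions.
# Assumes a local MongoDB instance and the pymongo driver; names are made up.

from pymongo import MongoClient

client = MongoClient("localhost", 27017)
profiles = client["appdata"]["user_profiles"]

# Write: upsert one denormalized document per user (no separate joined tables).
profiles.update_one(
    {"_id": "user123"},
    {"$set": {"display_name": "Alice", "last_login": "2010-03-14"}},
    upsert=True,
)

# Read: fetch the whole document back by key -- one simple operation,
# rather than a multi-table join wrapped in a transaction.
print(profiles.find_one({"_id": "user123"}))
```

The point is that each operation touches a single record by key; there is no join and no explicit transaction spanning multiple records, which is exactly the trade-off the definition above names.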

Within that realm, Dwight offered a two-part taxonomy of NoSQL systems, according to their data model and replication/sharding strategy. I’d be happier, however, with at least three parts to the taxonomy:

March 12, 2010

Some NoSQL links

I plan to post a few things soon about MongoDB, Cassandra, and NoSQL in general. So I’m poking around a bit reading stuff on the subjects. Here are some links I found. Read more

February 22, 2010

Aster Data nCluster 4.5

Like Vertica, Netezza, and Teradata, Aster is using this week to pre-announce a forthcoming product release, Aster Data nCluster 4.5. Aster is really hanging its identity on “Big Data Analytics” or some variant of that concept, and so the two major named parts of Aster nCluster 4.5 are:

And in other Aster news:

Aster Data Developer Express evidently does some cool stuff, like providing some sort of parallelism testing right on your desktop. It also generates lots of stub code, saving humans from the tedium of doing that. Useful, obviously.

But mainly, I want to write about the analytic packages. Read more

February 1, 2010

Open issues in database and analytic technology

The last part of my New England Database Summit talk was on open issues in database and analytic technology. This was closely intertwined with the previous section, and also relied on a lot that I’ve posted here. So I’ll just put up a few notes on that part, with lots of linkage to prior discussion of the same points. Read more

December 2, 2009

Webinar on MapReduce for complex analytics (Thursday, December 3, 10 am and 2 pm Eastern)

The second in my two-webinar series for Aster Data will occur tomorrow, twice (both live), at 10 am and 2 pm Eastern time. The other presenters will be Jonathan Goldman, who was a Principal Scientist at LinkedIn but has now joined Aster himself, and Steve Wooledge of Aster (playing host). Key links are:

The main subjects of the webinar will be:

Arguably, aspects of data transformation fit into each of those three categories, which may help explain why data transformation has been so prominent among the early applications of MapReduce.
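As a toy illustration of why the paradigm fits data transformation so naturally, here is a single-process sketch of map and reduce phases applied to a made-up click-record format; a real MapReduce framework would run the same two functions in parallel across many nodes:

```python
# Toy, single-process illustration of the MapReduce pattern applied to a
# data-transformation task: turning raw (user, url) click records into
# per-user page-view counts. The record format is made up.

from collections import defaultdict

raw_records = [
    "alice\t/home", "bob\t/pricing", "alice\t/docs", "alice\t/home",
]

def map_phase(record):
    """Emit (key, value) pairs from one input record."""
    user, _url = record.split("\t")
    yield (user, 1)

def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return (key, sum(values))

# Shuffle: group mapped pairs by key, as a MapReduce framework would.
grouped = defaultdict(list)
for record in raw_records:
    for key, value in map_phase(record):
        grouped[key].append(value)

results = [reduce_phase(k, vs) for k, vs in grouped.items()]
print(sorted(results))   # [('alice', 3), ('bob', 1)]
```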

As you can see from Aster’s title for the webinar (which they picked while I was on vacation), at least their portion will be focused on customer analytics, e.g. web analytics.

August 21, 2009

Social network analysis, aka relationship analytics

A number of applications lend themselves to graph-oriented analytics, including:

There are plenty more graph-oriented applications, of course, such as the identification of biochemical pathways. But I want to focus for now on ones like those on my list. My key points are:

Here’s what I mean. Read more
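For flavor, here is a minimal sketch of the simplest sort of relationship analytics, assuming the networkx library and a made-up who-called-whom edge list:

```python
# Minimal sketch of simple relationship analytics over a graph, assuming the
# networkx library; the "who called whom" edge list is made up.

import networkx as nx

calls = [("ann", "bob"), ("bob", "cho"), ("ann", "cho"),
         ("cho", "dee"), ("dee", "eve")]
G = nx.Graph()
G.add_edges_from(calls)

# Who is most "central" in the calling network? (A crude influence proxy.)
centrality = nx.degree_centrality(G)
print(max(centrality, key=centrality.get))      # 'cho'

# How closely are two people connected?
print(nx.shortest_path(G, "ann", "eve"))        # ['ann', 'cho', 'dee', 'eve']
```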

April 15, 2009

Cloudera presents the MapReduce bull case

Monday was fire-drill day regarding MapReduce vs. MPP relational DBMS. The upshot was that I was quoted in Computerworld and paraphrased in GigaOm as being a little more negative on MapReduce than I really am, in line with my comment

Frankly, my views on MapReduce are more balanced than [my] weary negativity would seem to imply.

Tuesday afternoon the dial turned a couple of notches more positive yet, when I talked with Michael Olson and Jeff Hammerbacher of Cloudera. Cloudera is a new company, built around the open source MapReduce implementation Hadoop. So far Cloudera gives away its Hadoop distribution, without charging for any sort of maintenance or subscription, and just gets revenue from professional services. Presumably, Cloudera plans for this business model to change down the road.

Much of our discussion revolved around Facebook, where Jeff directed a huge and diverse Hadoop effort. Apparently, Hadoop played much of the role of an enterprise data warehouse at Facebook — at least for clickstream/network data — including:

Some Facebook data, however, was put into an Oracle RAC cluster for business intelligence. And Jeff does concede that query execution is slower in Hadoop than in a relational DBMS. Hadoop was also used to build the index for Facebook’s custom text search engine.
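To give a sense of the kind of clickstream summarization such a setup handles, here is a minimal sketch in the Hadoop Streaming style, with mapper and reducer written as Python scripts; the log field layout is hypothetical, and this is my own illustration rather than anything Facebook actually ran:

```python
# Minimal sketch of a Hadoop Streaming job summarizing clickstream logs into
# per-page view counts. In practice the mapper and reducer are wired up via
# Hadoop Streaming's -mapper / -reducer options; the log layout is made up.

import sys
from itertools import groupby

def mapper(stdin=sys.stdin):
    """Emit 'page\t1' for each raw log line of the form 'timestamp\tuser\tpage'."""
    for line in stdin:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 3:
            _ts, _user, page = parts
            print(f"{page}\t1")

def reducer(stdin=sys.stdin):
    """Sum the counts for each page; reducer input arrives sorted by key."""
    keyed = (line.rstrip("\n").split("\t") for line in stdin)
    for page, group in groupby(keyed, key=lambda kv: kv[0]):
        print(f"{page}\t{sum(int(count) for _page, count in group)}")

if __name__ == "__main__":
    # Run as `python clickcount.py map` or `python clickcount.py reduce`.
    if len(sys.argv) > 1 and sys.argv[1] == "reduce":
        reducer()
    else:
        mapper()
```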

Jeff’s reasons for liking Hadoop over relational DBMS at Facebook included: Read more
