RDF and graphs

Analysis of data management technology optimized for RDF-formatted and/or graph data.

June 8, 2010

The most important part of the “social graph” is neither social nor a graph

“Social graph” is a highly misleading term, and so is “social network analysis.” By this I mean:

There’s something akin to “social graphs” and “social network analysis” that is more or less worthy of all the current hype – but graphs and network analysis are only a minor part of the whole story.

In particular, the most important parts of the Facebook “social graph” are neither social nor a graph. Rather, what’s really important is an aggregate Profile of Revealed Preferences, of which person-to-person connections or other things best modeled by a graph play only a small part.

Categories: Analytic technologies, Facebook, Games and virtual worlds, RDF and graphs, Surveillance and privacy, Web analytics

13 Comments

April 8, 2010

Information found in public-facing social networks

Here are some examples illustrating two recent themes of mine, namely:

Easily-available information reveals all sorts of things about us.
Graph-based analysis is on the rise.

Pete Warden scraped all of Facebook’s social graph (at least for the United States), and put up a really interesting-looking visualization of same. Facebook’s lawyers came down on him, and he quickly agreed to destroy the data he’d scraped, but also published ideas on how other people could duplicate his work.

Warden has since given an interview in which he outlines some of the things researchers hoped to do with this data: Read more

Categories: Analytic technologies, Facebook, RDF and graphs, Surveillance and privacy

1 Comment

April 5, 2010

Notes on the evolution of OLTP database management systems

The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part). OLTP (OnLine Transaction Processing) and general purpose DBMS startups, however, have not yet done as well, with such success as there has been (MySQL, InterSystems Caché, solidDB’s exit, etc.) generally accruing to products that originated in the 20th Century.

Nonetheless, OLTP/general-purpose data management startup activity has recently picked up, targeting what I see as some very real opportunities and needs. So as a jumping-off point for further writing, I thought it might be interesting to collect a few observations about the market in one place. These include:

Big-brand OLTP/general-purpose DBMS have more “stickiness” than analytic DBMS.
By number, most of an enterprise’s OLTP/general-purpose databases are low-volume and low-value.
Most interesting new OLTP/general-purpose data management products are either MySQL-based or NoSQL.
It’s not yet clear whether MySQL will prevail over MySQL forks, or vice-versa, or whether they will co-exist.
The era of silicon-centric relational DBMS is coming.
The emphasis on scale-out and reducing the cost of joins spans the NoSQL and SQL-based worlds.
Users’ insistence on “free” could be a major problem for OLTP DBMS innovation.

I shall explain. Read more

Categories: Akiban, Analytic technologies, Business intelligence, Data warehousing, EnterpriseDB and Postgres Plus, Exadata, Market share and customer counts, Memory-centric data management, Mid-range, MySQL, NoSQL, OLTP, Open source, Oracle, PostgreSQL, RDF and graphs, Solid-state memory, VoltDB and H-Store, Web analytics

8 Comments

March 14, 2010

Toward a NoSQL taxonomy

I talked Friday with Dwight Merriman, founder of 10gen (the MongoDB company). He more or less convinced me of his definition of NoSQL systems, which in my adaptation goes:

NoSQL = HVSP (High Volume Simple Processing) without joins or explicit transactions

Within that realm, Dwight offered a two-part taxonomy of NoSQL systems, according to their data model and replication/sharding strategy. I’d be happier, however, with at least three parts to the taxonomy:

How data looks logically on a single node
How data is stored physically on a single node
How data is distributed, replicated, and reconciled across multiple nodes, and whether applications have to be aware of how the data is partitioned among nodes/shards. Read more

Categories: Cassandra, Data models and architecture, NoSQL, Parallelization, RDF and graphs, Structured documents, Theory and architecture

13 Comments

March 12, 2010

Some NoSQL links

I plan to post a few things soon about MongoDB, Cassandra, and NoSQL in general. So I’m poking around a bit reading stuff on the subjects. Here are some links I found. Read more

Categories: Amazon and its cloud, Cassandra, Continuent, Google, MySQL, NoSQL, Open source, RDF and graphs, Tokutek and TokuDB

5 Comments

February 22, 2010

Aster Data nCluster 4.5

Like Vertica, Netezza, and Teradata, Aster is using this week to pre-announce a forthcoming product release, Aster Data nCluster 4.5. Aster is really hanging its identity on “Big Data Analytics” or some variant of that concept, and so the two major named parts of Aster nCluster 4.5 are:

Aster Data Analytic Foundation, a set of analytic packages prebuilt in Aster’s SQL-MapReduce
Aster Data Developer Express, an Eclipse-based IDE (Integrated Development Environment) for developing and testing applications built on Aster nCluster, Aster SQL-MapReduce, and Aster Data Analytic Foundation

And in other Aster news:

Along with the development GUI in Aster nCluster 4.5, there is also a new administrative GUI.
Aster has certified that nCluster works with Fusion-io boards, because at least one retail industry prospect cares. However, that in no way means that arm’s-length Fusion-io certification is Aster’s ultimate solid-state memory strategy.
I had the wrong impression about how far Aster/SAS integration has gotten. So far, it’s just at the connector level.

Aster Data Developer Express evidently does some cool stuff, like providing some sort of parallelism testing right on your desktop. It also generates lots of stub code, saving humans from the tedium of doing that. Useful, obviously.

But mainly, I want to write about the analytic packages. Read more

Categories: Aster Data, Data warehousing, Investment research and trading, Predictive modeling and advanced analytics, RDF and graphs, SAS Institute, Teradata

9 Comments

February 1, 2010

Open issues in database and analytic technology

The last part of my New England Database Summit talk was on open issues in database and analytic technology. This was closely intertwined with the previous section, and also relied on a lot that I’ve posted here. So I’ll just put up a few notes on that part, with lots of linkage to prior discussion of the same points. Read more

Categories: Analytic technologies, Business intelligence, Cloud computing, Data warehousing, Presentations, RDF and graphs, Software as a Service (SaaS), Solid-state memory, Theory and architecture

4 Comments

December 2, 2009

Webinar on MapReduce for complex analytics (Thursday, December 3, 10 am and 2 pm Eastern)

The second in my two-webinar series for Aster Data will occur tomorrow, twice (both live), at 10 am and 2 pm Eastern time. The other presenters will be Jonathan Goldman, who was a Principal Scientist at LinkedIn but now has joined Aster himself, and Steve Wooledge of Aster (playing host). Key links are:

Registration for tomorrow’s webinars
Replay of the first webinar
My slides from the first webinar

The main subjects of the webinar will be:

Some review of material from the first webinar (all three presenters)
Discussion of how MapReduce can help with three kinds of analytics:
- Pattern matching (Jonathan will give detail)
- Number-crunching (I’ll cover that, and it will be short)
- Graph analytics (I haven’t written the slides yet, but my starting point will be some of the relationship analytics ideas we discussed in August)

Arguably, aspects of data transformation fit into each of those three categories, which may help explain why data transformation has been so prominent among the early applications of MapReduce.
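To make the graph analytics item a bit more concrete, here is a minimal sketch of a MapReduce-style computation over a graph: counting how many connections each node has in an undirected edge list. This is an illustration only, not anything from Aster's SQL-MapReduce or from the webinar slides. It is plain single-process Python, and the function names (`map_edge`, `shuffle`, `reduce_degree`, `degree_by_node`) and the sample edges are all made up for the example.

```python
from collections import defaultdict
from itertools import chain


def map_edge(edge):
    """Map step: emit (node, 1) for each endpoint of an undirected edge."""
    a, b = edge
    return [(a, 1), (b, 1)]


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_degree(node, counts):
    """Reduce step: sum the per-edge counts to get the node's degree."""
    return node, sum(counts)


def degree_by_node(edges):
    """Run map, shuffle, and reduce in sequence over an edge list."""
    mapped = chain.from_iterable(map_edge(e) for e in edges)
    grouped = shuffle(mapped)
    return dict(reduce_degree(node, counts) for node, counts in grouped.items())


# Hypothetical sample data: four people, four connections.
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]
degrees = degree_by_node(edges)
print(degrees)
```

In a real MapReduce system the map and reduce steps would run in parallel across nodes, with the framework handling the grouping. Degree counting is about the simplest graph computation there is; multi-hop relationship analytics would typically need several such passes chained together.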

As you can see from Aster’s title for the webinar (which they picked while I was on vacation), at least their portion will be focused on customer analytics, e.g. web analytics.

Categories: Analytic technologies, Aster Data, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, MapReduce, RDF and graphs, Web analytics

4 Comments

August 21, 2009

Social network analysis, aka relationship analytics

A number of applications lend themselves to graph-oriented analytics, including:

Finding bad guys (national intelligence)
Finding bad guys (anti-fraud)
Data mining the social graph (e.g., for advertising optimization on social networks, or to identify influencers)

There are plenty more graph-oriented applications, of course, such as the identification of biochemical pathways. But I want to focus for now on ones like those on my list. My key points are:

There are Big Data problems that lend themselves to graphical data models.
So far as I can tell, the database management community isn’t doing enough to address them. (If I’m wrong about that, please tell me. I plan to arrive in Lyon for VLDB/XLDB Wednesday of next week, and of course I can always be reached by email.)

Here’s what I mean. Read more

Categories: Analytic technologies, Cogito and 7 Degrees, Data models and architecture, Data types, RDF and graphs, Theory and architecture

22 Comments

April 15, 2009

Cloudera presents the MapReduce bull case

Monday was fire-drill day regarding MapReduce vs. MPP relational DBMS. The upshot was that I was quoted in Computerworld and paraphrased in GigaOm as being a little more negative on MapReduce than I really am, in line with my comment:

Frankly, my views on MapReduce are more balanced than [my] weary negativity would seem to imply.

Tuesday afternoon the dial turned a couple notches more positive yet, when I talked with Michael Olson and Jeff Hammerbacher of Cloudera. Cloudera is a new company, built around the open source MapReduce implementation Hadoop. So far Cloudera gives away its Hadoop distribution, without charging for any sort of maintenance or subscription, and just gets revenue from professional services. Presumably, Cloudera plans for this business model to change down the road.

Much of our discussion revolved around Facebook, where Jeff directed a huge and diverse Hadoop effort. Apparently, Hadoop played much of the role of an enterprise data warehouse at Facebook — at least for clickstream/network data — including:

2 1/2 petabytes of data managed via Hadoop
10 terabytes/day of data ingested via Hadoop (Edit: Some of these metrics have been updated in a subsequent post about Facebook.)
Ad targeting queries run every 15 minutes in Hadoop
Dashboard roll-up queries run every hour in Hadoop
Ad-hoc research/analytic Hadoop queries run whenever
Anti-fraud analysis done in Hadoop
Text mining (e.g., of things written on people’s “walls”) done in Hadoop
100s or 1000s of simultaneous Hadoop queries
JSON-based social network analysis in Hadoop

Some Facebook data, however, was put into an Oracle RAC cluster for business intelligence. And Jeff does concede that query execution is slower in Hadoop than in a relational DBMS. Hadoop was also used to build the index for Facebook’s custom text search engine.

Jeff’s reasons for liking Hadoop over relational DBMS at Facebook included: Read more

Categories: Analytic technologies, Cloudera, Data warehousing, EAI, EII, ETL, ELT, ETLT, Facebook, Hadoop, MapReduce, Petabyte-scale data management, RDF and graphs, Specific users, Web analytics

27 Comments
