May 4, 2012

Notes on graph data management

This post is part of a series on managing and analyzing graph data. Posts to date include:

Interest in graph data models keeps increasing. But it’s tough to discuss them with any generality, because “graph data model” encompasses so many different things. Indeed, just as all data structures can be mapped to relational ones, it is also the case that all data structures can be mapped to graphs.

Formally, a graph is a collection of (node, edge, node) triples. In the simplest case, the edge has no properties other than existence or maybe direction, and the triple can be reduced to a (node, node) pair, unordered or ordered as the case may be. It is common, however, for edges to encapsulate additional properties, the canonical examples of which are:

Weight. Usually, the intuition here is that the weight is a number indicating the strength of the connection. This is generally derived from more basic data.
Kind. The edge can encapsulate one or more descriptors indicating the kind of relationship between the nodes.

Many of the graph examples I can think of fit into four groups:

Networks of people, aka social networks. Three (overlapping) areas of particular importance are:
- People/communications.
  - One canonical example is influencer-finding in telecommunications customer bases. The nodes are subscribers; the edges are call details (raw or aggregated).
  - Other examples may be found as subgraphs of our next category, namely …
- … people/places/things.
  - This is the classic structure for anti-terrorism, law enforcement, or anti-fraud use cases.
  - Nodes are people, buildings/addresses, cars, businesses, etc. , except that …
  - … nodes can actually be ordered pairs (tangible thing, timestamp). After all, it’s more interesting if two people were, not just in the same place, but in the same place at the same time.
- People/connections/recommendations.
  - Similarly, there are use cases in which various people have social network connections, and then also recommend products of some kind.
  - Edges can carry information about the evident strength of the social network connection …
  - … but also about apparent similarities in taste.
Graphs of IT objects. Various sets of conceptual IT objects can be viewed as graphs. For example:
- I visited Workday recently. They refer to their Java object model as a “graph.”
- Neo Technology (the Neo4j guys) started out doing a content management system, and eventually decided that what they really wanted underneath it was a graph-oriented DBMS.
- Now one of Neo’s major application areas is MDM (Master Data Management). Edit: I’m not wholly persuaded by the way they use that term.
- Most dramatically, there’s Tim Berners-Lee’s “Semantic Web”, which is built on RDF, which models things as “a directed, labeled graph”. SPARQL, OWL and so on are in the mix as well. To date, the Semantic Web has been a lot of hot air, only without the hot aspect; still, it’s obviously influenced many people’s thinking about graphs.
- Edit: Please see Marie’s comment below for a rather major example I left out. 🙂
Taxonomies, ontologies, and/or semantic networks.
- To a large extent this overlaps with my previous category …
- … but I’m particularly fond of the example of straightforward taxonomies of words, e.g. WordNet. The nodes are the words themselves, or more precisely word senses (i.e., specific meanings of a word); edges are typically chosen from a limited set of alternatives such as is_a, is_part_of, or entails.
Finally, there are representations of physical graphs. Examples might include telecom networks, utility grids, or locations and routes for physical deliveries.

My main reason for reciting these diverse examples is to illustrate that, for any really interesting technical discussion, it is necessary to focus on a subset of the possible use cases.

Categories: Neo Technology and Neo4j, RDF and graphs, Telecommunications, Workday

Subscribe to our complete feed!

Comments

10 Responses to “Notes on graph data management”

Andrey on May 4th, 2012 8:06 am

http://www-01.ibm.com/software/data/db2/linux-unix-windows/graph-store.html
Marie on May 4th, 2012 3:43 pm

Nice post. Looking forward to the follow-ups.

Incidentally, I’m surprised you forgot the classic WWW graph in your examples 🙂
Scott Meyer on May 4th, 2012 8:04 pm

I don’t think you need to focus on a particular subset. In a graph database:

1. All relations are first class (represented equally).
2. Navigation can be done in constant time
3. Schema change can be done in constant time

The examples that you cite all benefit from these qualities in more-or-less obvious ways.
Curt Monash on May 7th, 2012 4:29 am

Marie,

Thanks! Post edited in response.
Curt Monash on May 7th, 2012 4:31 am

Scott,

Interesting points, but I’m not clear on what you mean by those “constant time” remarks. For example, I would expect simple forms of navigation to be logarithmic or so — I guess that might depend on the physical data model — in the number of edges/node, and linear in path length (or somewhat super-linear depending on what partitioning challenges one encounters).
Scott Meyer on May 7th, 2012 8:23 pm

Curt,

The point of defining a graph database in this way is exactly to specify properties of the underlying physical model. If there is a theoretical distinction between a graph data model and a relational model, I suspect that it is very subtle. However, the distinction between constant-time navigation (hash tables, basically) and logarithmic (btrees) is chalk and cheese. LogN navigation means that there is some scale at which it no longer makes economic sense to embed a small graph in a large one.

Similarly, the practical impact of requiring a constant-time schema change is that physical data must be laid out independent of schema. If not, then changing schema means changing N pieces of data.
Cray | DBMS 2 : DataBase Management System Services on July 2nd, 2012 4:55 am

[…] Graph-oriented systems such as terrorist networks are also now in the mix, which is how Yarcdata got started. […]
Introduction to Neo Technology and Neo4j | DBMS 2 : DataBase Management System Services on December 19th, 2013 4:18 pm

[…] we look at my basic list of graph data model application areas, Neo4j seems to be involved in most of what you would think. Exceptions […]
21st Century DBMS success and failure | DBMS 2 : DataBase Management System Services on July 14th, 2014 2:37 am

[…] NoSQL/non-SQL has penetrated large enterprises mainly for a few specific use cases, for example the lists I posted for MongoDB or graph databases. […]
DataStax Brings Graph Databases to Enterprise Cassandra – TheServerFinders on April 15th, 2016 5:17 am

[…] database research analyst Curt Monash explained on his firm’s website a few years back, a graph database describes the full relationship between […]

Leave a Reply

Search our blogs and white papers

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in

Notes on graph data management

Comments

Search our blogs and white papers

Monash Research blogs

User consulting

Vendor advisory

Monash Research highlights

Recent posts

Categories

Date archives

Admin