Notes on graph data management
This post is part of a series on managing and analyzing graph data. Posts to date include:
- Graph data model basics (this post)
- Relationship analytics definition
- Relationship analytics applications
- Analysis of large graphs
Interest in graph data models keeps increasing. But it’s tough to discuss them with any generality, because “graph data model” encompasses so many different things. Indeed, just as all data structures can be mapped to relational ones, it is also the case that all data structures can be mapped to graphs.
Formally, a graph is a collection of (node, edge, node) triples. In the simplest case, the edge has no properties other than existence or maybe direction, and the triple can be reduced to a (node, node) pair, unordered or ordered as the case may be. It is common, however, for edges to encapsulate additional properties, the canonical examples of which are:
- Weight. Usually, the intuition here is that the weight is a number indicating the strength of the connection. This is generally derived from more basic data.
- Kind. The edge can encapsulate one or more descriptors indicating the kind of relationship between the nodes.
Many of the graph examples I can think of fit into four groups:
- Networks of people, aka social networks. Three (overlapping) areas of particular importance are:
- People/communications.
- One canonical example is influencer-finding in telecommunications customer bases. The nodes are subscribers; the edges are call details (raw or aggregated).
- Other examples may be found as subgraphs of our next category, namely …
- … people/places/things.
- This is the classic structure for anti-terrorism, law enforcement, or anti-fraud use cases.
- Nodes are people, buildings/addresses, cars, businesses, etc. , except that …
- … nodes can actually be ordered pairs (tangible thing, timestamp). After all, it’s more interesting if two people were, not just in the same place, but in the same place at the same time.
- People/connections/recommendations.
- Similarly, there are use cases in which various people have social network connections, and then also recommend products of some kind.
- Edges can carry information about the evident strength of the social network connection …
- … but also about apparent similarities in taste.
- People/communications.
- Graphs of IT objects. Various sets of conceptual IT objects can be viewed as graphs. For example:
- I visited Workday recently. They refer to their Java object model as a “graph.”
- Neo Technology (the Neo4j guys) started out doing a content management system, and eventually decided that what they really wanted underneath it was a graph-oriented DBMS.
- Now one of Neo’s major application areas is MDM (Master Data Management). Edit: I’m not wholly persuaded by the way they use that term.
- Most dramatically, there’s Tim Berners-Lee’s “Semantic Web”, which is built on RDF, which models things as “a directed, labeled graph”. SPARQL, OWL and so on are in the mix as well. To date, the Semantic Web has been a lot of hot air, only without the hot aspect; still, it’s obviously influenced many people’s thinking about graphs.
- Edit: Please see Marie’s comment below for a rather major example I left out. 🙂
- Taxonomies, ontologies, and/or semantic networks.
- To a large extent this overlaps with my previous category …
- … but I’m particularly fond of the example of straightforward taxonomies of words, e.g. WordNet. The nodes are the words themselves, or more precisely word senses (i.e., specific meanings of a word); edges are typically chosen from a limited set of alternatives such as is_a, is_part_of, or entails.
- Finally, there are representations of physical graphs. Examples might include telecom networks, utility grids, or locations and routes for physical deliveries.
My main reason for reciting these diverse examples is to illustrate that, for any really interesting technical discussion, it is necessary to focus on a subset of the possible use cases.
Comments
10 Responses to “Notes on graph data management”
Leave a Reply
http://www-01.ibm.com/software/data/db2/linux-unix-windows/graph-store.html
Nice post. Looking forward to the follow-ups.
Incidentally, I’m surprised you forgot the classic WWW graph in your examples 🙂
I don’t think you need to focus on a particular subset. In a graph database:
1. All relations are first class (represented equally).
2. Navigation can be done in constant time
3. Schema change can be done in constant time
The examples that you cite all benefit from these qualities in more-or-less obvious ways.
Marie,
Thanks! Post edited in response.
Scott,
Interesting points, but I’m not clear on what you mean by those “constant time” remarks. For example, I would expect simple forms of navigation to be logarithmic or so — I guess that might depend on the physical data model — in the number of edges/node, and linear in path length (or somewhat super-linear depending on what partitioning challenges one encounters).
Curt,
The point of defining a graph database in this way is exactly to specify properties of the underlying physical model. If there is a theoretical distinction between a graph data model and a relational model, I suspect that it is very subtle. However, the distinction between constant-time navigation (hash tables, basically) and logarithmic (btrees) is chalk and cheese. LogN navigation means that there is some scale at which it no longer makes economic sense to embed a small graph in a large one.
Similarly, the practical impact of requiring a constant-time schema change is that physical data must be laid out independent of schema. If not, then changing schema means changing N pieces of data.
[…] Graph-oriented systems such as terrorist networks are also now in the mix, which is how Yarcdata got started. […]
[…] we look at my basic list of graph data model application areas, Neo4j seems to be involved in most of what you would think. Exceptions […]
[…] NoSQL/non-SQL has penetrated large enterprises mainly for a few specific use cases, for example the lists I posted for MongoDB or graph databases. […]
[…] database research analyst Curt Monash explained on his firm’s website a few years back, a graph database describes the full relationship between […]