Objectivity Infinite Graph
I chatted Wednesday night with Darren Wood, the Australia-based lead developer of Objectivity’s Infinite Graph database product. Background includes:
- Objectivity is a profitable, decades-old object-oriented DBMS vendor with about 50 employees.
- Like some other object-oriented DBMS of its generation, Objectivity is as much a toolkit for building DBMS as it is a real finished DBMS product. Objectivity sales are typically for custom deals, where Objectivity helps with the programming.
- The way Objectivity works is basically:
  - You manage objects in memory, in the format of your choice.
  - Objectivity bangs them to disk, across a network.
  - Objectivity manages the (distributed) pointers to the objects.
  - You can, if you choose, hard-code exactly which objects are banged to which node.
- Objectivity’s DML for reading data is very different from Objectivity’s DML for writing data. (I think the latter is more like the program code itself, while the former is more like regular DML.)
- The point of Objectivity is not so much to have fast I/O. Rather, it is to minimize the CPU cost of getting the data that comes across the wire into useful form.
- Darren got the idea of putting a generic graph DBMS front-end on Objectivity while doing a relationship analytics project for an Australian intelligence agency.
- Darren redoubled his efforts to sell the project internally at Objectivity after reading what I wrote about relationship analytics back in 2006 or so.
- There is now a 5 or so person team developing Infinite Graph.
- Infinite Graph is just now going out to beta test.
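To make the division of labor described above a bit more concrete, here is a toy Python model of it. Everything in it (`ObjectRef`, `ToyObjectStore`, round-robin default placement) is a hypothetical illustration rather than Objectivity's actual API. Plain dicts stand in for each node's disk, and real persistence and networking are left out.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ObjectRef:
    """Hypothetical distributed pointer: the storage node plus an id local to it."""
    node: int
    local_id: int


class ToyObjectStore:
    """Toy model of the storage idea described above; NOT Objectivity's API."""

    def __init__(self, num_nodes: int) -> None:
        # One dict per storage node stands in for that node's disk.
        self.nodes: List[Dict[int, Any]] = [{} for _ in range(num_nodes)]
        self.next_id = 0

    def persist(self, obj: Any, node: Optional[int] = None) -> ObjectRef:
        # The application may hard-code placement; otherwise spread objects round-robin.
        if node is None:
            node = self.next_id % len(self.nodes)
        ref = ObjectRef(node, self.next_id)
        self.next_id += 1
        self.nodes[node][ref.local_id] = obj
        return ref

    def deref(self, ref: ObjectRef) -> Any:
        # Following a pointer means going to whichever node the reference names.
        return self.nodes[ref.node][ref.local_id]


store = ToyObjectStore(num_nodes=3)
alice = store.persist({"name": "alice"})                        # round-robin placement
bob = store.persist({"name": "bob", "friend": alice}, node=2)   # explicit placement
print(store.deref(store.deref(bob)["friend"])["name"])          # alice
```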
Infinite Graph is an API or language binding on top of Objectivity that:
- Hides a lot of Objectivity’s complexity.
- Is suitable for graph/relationship analytics.
The main point of the Infinite Graph beta test is to see whether Objectivity got the API right. By way of contrast, Objectivity is still just researching the DBMS optimization side of things. According to Darren, what makes that so hard is that if you partition the graph in some smart way, probably through some kind of costly algorithm to determine “least connectedness,” a little additional data can thoroughly invalidate your results. Thus, Darren is focused more on ensuring that performance is good even if data is distributed around the network in annoying ways.
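A minimal sketch of why a clever partition is fragile, assuming the quality of a partition is measured by its edge cut (the number of edges whose endpoints land on different machines, each implying a cross-network hop). The graph and partition are made up for illustration, and no partitioning algorithm is shown. The point is only that a handful of new edges can quadruple the cut of a split that looked ideal a moment earlier.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def edge_cut(edges: Iterable[Edge], partition: Dict[str, int]) -> int:
    """Count edges whose endpoints sit on different machines (cross-network hops)."""
    return sum(1 for u, v in edges if partition[u] != partition[v])


# Two tight clusters joined by a single bridge edge: a "least connected" split.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(edge_cut(edges, partition))  # 1

# A little additional data: three new relationships between the clusters.
new_edges = [("a", "e"), ("b", "f"), ("c", "e")]
print(edge_cut(edges + new_edges, partition))  # 4
```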
One performance win that Infinite Graph seems to get (almost?) for free from being built on top of Objectivity is lots of prefetching. Specifically, graph nodes and their edges are stored together, just like objects and their pointers are in traditional Objectivity — and if a node is retrieved, the nodes it’s connected to might also get retrieved as a background operation, before they’re even needed. More generally, Objectivity has always tried to be fast about traversing pointers, and that is a whole lot like traversing graph edges.
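The prefetching pattern can be sketched roughly as follows, assuming nodes are stored together with their edge lists so that one read returns both. `PrefetchingGraphReader` is hypothetical, not Infinite Graph code: a dict simulates remote storage, and a thread pool pulls a node's neighbors into a local cache in the background as soon as the node itself is read.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class PrefetchingGraphReader:
    """Hypothetical reader: fetching one node also queues its neighbors for background retrieval."""

    def __init__(self, adjacency: Dict[str, List[str]], workers: int = 4) -> None:
        self._remote = adjacency                  # stands in for node+edge records across the network
        self._cache: Dict[str, List[str]] = {}
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self.remote_fetches = 0

    def _fetch(self, node: str) -> List[str]:
        # Nodes are stored with their edges, so one (simulated) remote read returns both.
        # Holding the lock across the read serializes fetches; acceptable for a toy only.
        with self._lock:
            if node not in self._cache:
                self.remote_fetches += 1
                self._cache[node] = list(self._remote[node])
            return self._cache[node]

    def get(self, node: str) -> List[str]:
        neighbors = self._fetch(node)
        for n in neighbors:
            # Speculatively pull neighbors in the background, before they are requested.
            self._pool.submit(self._fetch, n)
        return neighbors

    def is_cached(self, node: str) -> bool:
        with self._lock:
            return node in self._cache

    def close(self) -> None:
        self._pool.shutdown(wait=True)


reader = PrefetchingGraphReader({"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]})
print(reader.get("a"))   # ['b', 'c']
reader.close()           # wait for the background prefetches to finish
print(reader.is_cached("b"), reader.is_cached("c"), reader.is_cached("d"))  # True True False
```

Prefetching only one hop out is a deliberate simplification; a real system would also have to decide how far ahead to look and when speculative reads waste bandwidth.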
As a future direction, Infinite Graph is looking at ideas from Google’s Pregel. As Darren characterizes it, in Pregel you wrap up information about a graph node and ship it off to another computing node if the next graph node you need is over there. Darren suspects that the extreme form of this strategy would not be ideal. (I gather from Darren that Google has recognized the same thing from the get-go.) Instead, he’s pinning his hopes more on smarts about when to do that (costly) shipping, and when to just fetch the information back to the compute node currently being used.
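The ship-versus-fetch decision Darren describes could be framed as a simple cost comparison. The sketch below is a hypothetical heuristic, not Infinite Graph's or Pregel's actual logic. It ships the traversal state to the remote machine only when that moves fewer bytes than pulling the needed graph nodes back, and `remote_nodes_needed` is assumed to be an estimate the system can make in advance.

```python
from dataclasses import dataclass


@dataclass
class HopPlan:
    remote_nodes_needed: int     # graph nodes the traversal expects to touch on the remote machine
    bytes_per_node: int          # rough size of one graph node plus its edges
    traversal_state_bytes: int   # size of the in-flight traversal state that would be shipped
    ship_overhead_bytes: int     # hypothetical fixed cost of shipping work, in byte-equivalents


def should_ship(plan: HopPlan) -> bool:
    """Ship the computation only if that moves less data than pulling the nodes back."""
    fetch_cost = plan.remote_nodes_needed * plan.bytes_per_node
    ship_cost = plan.traversal_state_bytes + plan.ship_overhead_bytes
    return ship_cost < fetch_cost


# Touching one remote node: cheaper to fetch it back (2,048 vs 5,120 bytes).
print(should_ship(HopPlan(1, 2_048, 4_096, 1_024)))     # False
# A deep traversal into the remote partition: cheaper to ship the work.
print(should_ship(HopPlan(500, 2_048, 4_096, 1_024)))   # True
```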
The most interesting part of our discussion, in my opinion, was about applications and application functionality. In a nutshell, Darren seems to think that it’s all about the edges, rather than the nodes themselves. (My words, not his.) In particular:
- Edges are first-class citizens in Infinite Graph, just as nodes are.
- Graphs typically are polluted with lots of insignificant edges. Examples include:
  - If you’re tracking people’s telephone traffic, lots of folks call the local pizza parlor. Indeed, it’s common to look for “star” nodes like that, which have very high connectivity, and excise them from the graph to reduce noise.
  - Many measures of relationship include minor relationships. Facebook friends? LinkedIn connections? Occasional phone calls? Next-door neighbors? All of those can indicate very minor relationships.
- Therefore, in Infinite Graph, edges (can) have weights. Darren says this is a widely-used capability in graph applications. The core reason is to let you distinguish between significant and insignificant edges. Note that these weights can be calculated based on the raw data and stored back into the database.
- In Infinite Graph, edges can also have effectiveness date intervals. E.g., if you live at an address for a certain period, that’s when the edge connecting you to it is valid.
- In general in Infinite Graph, edges can carry arbitrary or at least flexible “qualifier”/attribute information.
- For many applications, the number of possible nodes is fundamentally limited. There are only so many people in the world, so many street addresses, so many telephone numbers, and so on. (There was a time this wasn’t believed to be the case, because timestamping was done at the node rather than edge level. But I find persuasive Darren’s argument that it works better on edges.) Edit: Even so, DARPA is thinking in the billions-of-nodes range.
- Darren is in general agreement with my observation that the “social graph” shouldn’t primarily be regarded as a graph.
- Yes, the paradigmatic examples of intelligence agency graph analytics are telephone or even IP traffic analysis. Nodes can wind up with lots of edges connecting them. Full analysis of the graphs exceeds even the computing capacity available to governments.
- On a happy civil liberties note, Darren observed that Australian intelligence has a lot of red tape restricting them from getting this kind of information. Basically, they can only get chunks of information “on demand”. An awkward side effect of this is that when they do get it, it could be in any number of formats.
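Several of the edge-centric ideas above (weights derived from raw data, validity intervals, free-form qualifiers, and excising high-connectivity “star” nodes) can be combined in one small sketch. The `Edge` type, the call-count weighting, and the degree and weight thresholds are all illustrative assumptions, not Infinite Graph's data model or API.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass
class Edge:
    """Hypothetical first-class edge: weight, validity interval, and free-form qualifiers."""
    src: str
    dst: str
    kind: str
    weight: float = 1.0
    valid_from: Optional[date] = None    # None means unbounded
    valid_to: Optional[date] = None
    attrs: Dict[str, object] = field(default_factory=dict)

    def valid_on(self, day: date) -> bool:
        return ((self.valid_from is None or self.valid_from <= day)
                and (self.valid_to is None or day <= self.valid_to))


def edges_from_calls(calls: Iterable[Tuple[str, str]]) -> List[Edge]:
    # Derive weights from raw data: one edge per caller/callee pair, weighted by call count.
    counts = Counter(calls)
    return [Edge(a, b, "called", weight=float(n)) for (a, b), n in counts.items()]


def star_nodes(edges: List[Edge], max_degree: int) -> Set[str]:
    # Very highly connected nodes, like the local pizza parlor.
    degree: Counter = Counter()
    for e in edges:
        degree[e.src] += 1
        degree[e.dst] += 1
    return {n for n, d in degree.items() if d > max_degree}


def significant_edges(edges: List[Edge], min_weight: float, max_degree: int,
                      on: Optional[date] = None) -> List[Edge]:
    # Drop light edges, edges touching star nodes, and (optionally) edges not valid on `on`.
    stars = star_nodes(edges, max_degree)
    return [e for e in edges
            if e.weight >= min_weight
            and e.src not in stars and e.dst not in stars
            and (on is None or e.valid_on(on))]


calls = ([("alice", "bob")] * 12 + [("alice", "pizza")] * 2
         + [("bob", "pizza"), ("carol", "pizza"), ("dave", "pizza"), ("carol", "dave")])
edges = edges_from_calls(calls) + [
    Edge("alice", "12 Main St", "lived_at", weight=5.0,
         valid_from=date(2005, 1, 1), valid_to=date(2008, 6, 30),
         attrs={"source": "lease"}),
]
for e in significant_edges(edges, min_weight=2.0, max_degree=3, on=date(2007, 1, 1)):
    print(e.src, e.kind, e.dst, e.weight)
# alice called bob 12.0
# alice lived_at 12 Main St 5.0
```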
Comments
Objectivity’s basic system is an “object-oriented database” in the sense that was meant by that phrase in 1980. I was involved in a company that competed with them but did a database system that was very similar to theirs. So most of what I say in the following blog post applies to Objectivity as well:
http://danweinreb.org/blog/object-oriented-database-management-systems-succeeded
I think this kind of DBMS is great, and I truly think they’re on their way back. Not quite in the same form as in the 1980s, of course: computer technology marches on. But the basic ideas are still great. I wish Objectivity the best of luck.
@Dan,
I was hoping you’d comment on this post! 🙂
Thanks,
CAM
[…] Curt Monash interviews Darren Wood on InfiniteGraph Curt Monash has written a great, in-depth review of InfiniteGraph. He spoke with InfiniteGraph’s lead architect, Darren Wood, last week and his write-up is available here. […]
Hey Curt,
I’ve done some investigation with Infinite Graph as well, and I believe that being based on their well-tested, distributed OODBMS gives them quite an edge over the current graph-based database systems.
First, given that several OODBMSes are already quite optimized at performing fast lookups and traversals on large, complex object relationships, tailoring a distributed OODBMS engine for graph-specific functionality is a great base on which to start. Second, while several of the graph-oriented systems out there are focused on trying to solve the “optimal cuts on which to partition/distribute the graph” problem, Infinite Graph is just moving ahead with a standard object-oriented approach of keeping edges/relationships right with the vertices/nodes/objects themselves. Sure, they still have to optimize object placement, but at least they’ll be off-and-running with a distributed platform and good locality for edges (at the cost of additional space requirements).
Having done quite a bit of internals research on all of the current open-source graph systems, I believe Infinite Graph will already be more scalable. Neo4j and most others were not designed to be distributed, and some don’t even make good use of locality on a single node. The only concern I have is from a performance perspective: Infinite Graph, like most of its competitors, is Java-based.
Should be interesting to see how the graph database market plays out.
[…] and the technology isn’t obviously needed in single-server cases. (But see my post on Objectivity’s Infinite Graph.) Even so, graph-oriented apps are exploding, and MarkLogic should think about playing in the graph […]
[…] Objectivity was founded in 1988 and launched Objectivity/DB, an object database (a precursor to graph databases). It found early success as a non-relational database solving what we’d now call “big data” problems for a number of clients, particularly in science and government. It launched InfiniteGraph in 2010. You can find out more about its architecture in Curt Monash’s write-up here. […]