Memory-centric data management when locality matters
Ron Pressler of Parallel Universe/SpaceBase pinged me about a data grid product he was open sourcing, called Galaxy. The idea is that a distributed RAM grid will allocate data, not randomly or via consistent hashing, but rather via a locality-sensitive approach. Notes include:
- The original technology was developed to track moving objects on behalf of the Israeli Air Force.
- The commercial product is focused on MMO (Massively Multiplayer Online) games (or virtual worlds).
- The underpinnings are being open sourced.
- Ron suggests that, among other use cases, Galaxy might work well for graphs.
- Ron argues that one benefit is that when lots of things cluster together — e.g. characters in a game — there’s a natural way to split them elastically (shrink the radius for proximity).
- The design philosophy seems to be to adapt as many ideas as possible from the way CPUs manage (multiple levels of) RAM cache.
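To make the contrast with hashing concrete, here is a minimal sketch of the two placement styles. It is not Galaxy's actual API or algorithm; the node count, world size, quadrant rule, and all function names are assumptions made up for illustration. The first half compares hash placement with position-based placement for a tight cluster of objects; the second half shows one possible reading of "splitting elastically", by quartering a crowded region until each piece holds few enough objects.

```python
import hashlib

NUM_NODES = 4        # hypothetical cluster size
WORLD_SIZE = 100.0   # hypothetical square world, coordinates in [0, WORLD_SIZE)

def hash_node(object_id):
    """Placement by hashing the key: ignores where the object is."""
    digest = hashlib.md5(object_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_NODES

def locality_node(x, y):
    """Placement by position: each node owns one quadrant of the world."""
    half = WORLD_SIZE / 2
    return (1 if x >= half else 0) + (2 if y >= half else 0)

def split_until_fits(cell, points, capacity):
    """Quarter a square cell (x0, y0, size) until no leaf holds more than
    `capacity` points. Assumes the points are distinct; duplicates beyond
    `capacity` would never separate."""
    x0, y0, size = cell
    inside = [(x, y) for x, y in points
              if x0 <= x < x0 + size and y0 <= y < y0 + size]
    if len(inside) <= capacity:
        return [(cell, inside)]
    half = size / 2
    leaves = []
    for dx in (0, half):
        for dy in (0, half):
            leaves += split_until_fits((x0 + dx, y0 + dy, half), inside, capacity)
    return leaves

# 50 objects clustered in one corner, e.g. characters gathered in one game area.
objects = [(f"obj{i}", 10.0 + (i % 10), 10.0 + (i // 10)) for i in range(50)]

hash_nodes = {hash_node(oid) for oid, _, _ in objects}
local_nodes = {locality_node(x, y) for _, x, y in objects}
print("nodes touched with hashing: ", len(hash_nodes))
print("nodes touched with locality:", len(local_nodes))

# When the cluster gets too crowded for one owner, shrink the regions.
points = [(x, y) for _, x, y in objects]
leaves = split_until_fits((0.0, 0.0, WORLD_SIZE), points, capacity=10)
print("non-empty regions after splitting:", sum(1 for _, pts in leaves if pts))
```

A proximity query over the cluster touches a single node under the position-based rule, while hashing scatters the same objects across nodes regardless of where they are. The split step trades that single owner for several small neighboring regions once one region gets too crowded.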
The whole thing is discussed in considerable detail in a blog post and especially in a Hacker News comment thread. There’s also an error-riddled TechCrunch article.
In the areas I cover, “error-riddled TechCrunch article” is pretty much a redundant phrase — but that post looked particularly bad.
Meanwhile, I just noticed a May 2009 blog post out of Progress Apama. The idea was that event streaming technology could be used to track moving objects, something I also heard directly from the CEP (Complex Event Processing) vendors in the 2007-2009 period.
My tentative opinions on all this start:
- Locality is really important for graphs. Random partitioning is crazy if there’s a locality-friendly alternative.
- Ron plays different MMOs than I do. That said, the real market would more likely be new games than existing ones. And Guild Wars 2 (for example) is showing the way to gathering many characters together in a small game area.
- It’s easy to conceive of cases in which there’s so much specific information about moving objects’ locations that you have to throw much of it away, rather than persisting it all. That speaks for memory-centric technology in general, and data reduction in particular (in the CEP sense of “data reduction”, not the statistics meaning).
- Sensor and scientific data often have strong locality.
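As an illustration of the data-reduction point above, here is a minimal sketch of throwing away position updates that carry little new information. The rule used (keep an update only if the object has moved at least `min_move` since the last update that was kept) is just one simple filter I am assuming for the example; the function name, threshold, and sample data are hypothetical and not taken from any particular CEP product.

```python
import math

def reduce_positions(updates, min_move):
    """Keep an update only if the object moved at least `min_move`
    since the last position that was kept for it."""
    last_kept = {}   # object id -> (x, y) of the last kept update
    kept = []
    for obj_id, x, y in updates:
        prev = last_kept.get(obj_id)
        if prev is None or math.hypot(x - prev[0], y - prev[1]) >= min_move:
            last_kept[obj_id] = (x, y)
            kept.append((obj_id, x, y))
    return kept

# Hypothetical stream of (object id, x, y) readings, in arrival order.
updates = [
    ("a", 0.0, 0.0),
    ("a", 0.1, 0.0),
    ("a", 0.2, 0.1),
    ("a", 3.0, 4.0),
    ("b", 5.0, 5.0),
    ("b", 5.0, 5.2),
]

kept = reduce_positions(updates, min_move=1.0)
print(len(updates), "updates in,", len(kept), "kept")
```

Only the current position per object has to stay in RAM for this filter to work, which is the kind of workload where a memory-centric approach looks natural.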
Related link
- I’ve written a fair amount recently about graph data management, although I haven’t tackled the partitioning issue head-on.
Comments
2 Responses to “Memory-centric data management when locality matters”
[…] SpaceBase Introduces Memory-Centric Data Management Tavis J. Hampton | August 7th READ MORE Tweet It is difficult enough to find data in a large database, but imagine if you also needed to locate the data and map its location in relation to other pieces of data. A distributed memory system needs to be able to place data on a specific node and also do so without putting an enormous amount of stress on the nodes in a cluster. Rather than randomly allocating data using consistent hashing , this type of distributed RAM grid would need to use locality-sensitive distribution. […]