July 31, 2010
Nested data structures keep coming up, especially for log files
Nested data structures have come up several times now, almost always in the context of log files.
- Google has published about a project called Dremel. Per Tasso Argyros, one of Dremel’s key concepts is nested data structures.
- Those arrays that the XLDB/SciDB folks keep talking about are meant to be nested data structures. Scientific data is of course log-oriented. eBay was very interested in that project too.
- Facebook’s log files have a big nested data structure flavor.
I don’t have a grasp yet on what exactly is happening here, but it’s something.
Categories: eBay, Facebook, Google, Log analysis, Scientific research, Theory and architecture
Comments
7 Responses to “Nested data structures keep coming up, especially for log files”
Nested data requires a new BI engine?!?
This sounds like a data modeling challenge rather than a query engine challenge.
I’ve worked with graph data models, and apart from cyclic graphs, pretty much any graph can be flattened into a form that can be queried using basic SQL syntax. Cyclic graphs can be rationalized too, but with trade-offs.
Common Table Expressions in SQL have made it possible to perform these transformations in a single declarative query.
E.g. for hierarchical data models, creating a reflexive transitive closure can do wonders.
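To make the idea concrete, here is a minimal sketch of a reflexive transitive closure built with a recursive Common Table Expression. It uses Python's built-in sqlite3 module only so that it runs anywhere; the `edge` table, its columns, and the sample hierarchy are hypothetical and not taken from any system discussed here. It assumes an acyclic hierarchy, since a cycle would make this particular query recurse forever.

```python
import sqlite3

# Hypothetical hierarchy: each pair is (child, parent). Illustrative data only.
edges = [("b", "a"), ("c", "b"), ("d", "b")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (child TEXT, parent TEXT)")
conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)

# Reflexive transitive closure: every node is paired with itself (depth 0)
# and with each of its descendants at any depth. Assumes no cycles.
closure_sql = """
WITH RECURSIVE
  node(id) AS (
    SELECT child FROM edge UNION SELECT parent FROM edge
  ),
  closure(ancestor, descendant, depth) AS (
    SELECT id, id, 0 FROM node
    UNION ALL
    SELECT c.ancestor, e.child, c.depth + 1
    FROM closure AS c JOIN edge AS e ON e.parent = c.descendant
  )
SELECT ancestor, descendant, depth FROM closure
ORDER BY ancestor, descendant
"""
closure = conn.execute(closure_sql).fetchall()

# Once materialized, "everything under a" is a plain filter, no recursion needed.
under_a = [d for (a, d, depth) in closure if a == "a" and depth > 0]
```

With the closure stored as an ordinary table, questions such as "all descendants of a" or "all ancestors of d" become single-table lookups, which is the sense in which the hierarchy has been flattened.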
Graphs can be tamed in many different ways. But I find most developers don’t see data modeling as part of their repertoire, and tend to look for algorithmic solutions.
This is part of the reason why I feel Computer Science education doesn’t pay enough attention to data modeling.
As they say, when you’ve got a hammer…
Three issues that aren’t the same:
1. Can you represent something logically in SQL at all?
2. Can you represent it logically and fairly concisely?
3. Can you get good performance in a fairly conventional SQL DBMS?
COBOL is back? Should I unpack my lava lamp and pet rock?
Hope nobody asks you to access the nested structure in an unanticipated way. I heard E.F. Codd is working on something to solve that problem.
Indeed.
The NoSQL guys would do well to learn about IMS, IDS, Total, System 2000, etc. and why Codd proposed the relational model.
History repeats itself.
Hey Curt,
For what it’s worth, Hive was designed to work with nested data structures, though support for some obvious operations like EXPLODE (https://issues.apache.org/jira/browse/HIVE-510) is not yet implemented.
Later,
Jeff
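For readers unfamiliar with the term, the following is a rough sketch of what an explode-style operation does to a nested log record: it turns one row holding an array into one row per array element. This is plain Python written for illustration, not Hive's implementation or syntax; the `explode` helper, its parameters, and the sample records are all hypothetical.

```python
def explode(rows, field, as_name):
    """Yield one flat row per element of the nested list stored under `field`.

    Hypothetical helper for illustration. Rows whose list is empty or
    missing produce no output rows.
    """
    for row in rows:
        # Copy every column except the nested one.
        outer = {k: v for k, v in row.items() if k != field}
        for item in row.get(field, []):
            flat = dict(outer)
            flat[as_name] = item
            yield flat


# Illustrative nested log records, one per user, each with a list of events.
logs = [
    {"user": "u1", "events": ["login", "click"]},
    {"user": "u2", "events": []},
]

flat_rows = list(explode(logs, "events", "event"))
```

The flattened rows can then be loaded into an ordinary table and queried with standard SQL, at the cost of repeating the outer columns once per nested element.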
[…] Hammerbacher has made various comments to the effect “Yes indeedy! Hadoop does that too!” (My wording, not his. […]
[…] I’ve noted before, the very big web companies have an issue with nested data structures. The subject came up in XLDB talks yesterday too, so my big goal for lunch was to finally […]