Notes and comments — October 31, 2012
Time for another catch-all post. First and saddest: one of the earliest great commenters on this blog, and a beloved figure in the Boston-area database community, was Dan Weinreb, whom I had known since some Symbolics briefings in the early 1980s. He passed away recently, much much much too young. Looking back for a couple of examples, in case you’ve never heard of him before, I see that Dan’s 2009 comment on Tokutek is still interesting today, and so is a post on his own blog disagreeing with some of my choices in terminology.
Otherwise, in no particular order:
1. Chris Bird is learning MongoDB. As is common for Chris, his comments are both amusing and enlightening.
2. When I relayed Cloudera’s comments on Hadoop adoption, I left out a couple of categories. One Cloudera called “mobile”; when I probed, that was about HBase, with an example being messaging apps.
The other was “phone home” — i.e., the ingest of machine-generated data from a lot of different devices. This is something that’s obviously been coming for several years — but I’m increasingly getting the sense that it’s actually arrived.
3. Todd Papaioannou added a comment summarizing the Continuuity story.
4. Stay tuned for more on Cloudera Impala. (Edit: Now posted.)
5. I never seem to get around to blogging about Master Data Management (MDM), in part because Informatica never rescheduled a briefing with me that they canceled in July. But it’s an important concept to recognize, at least to this extent:
- If you try to combine data from different applications, often it is stored inconsistently.
- One approach to dealing with the problem is to have separate software that maintains the “true” value and representation of the data, which can then be accessed by applications that need it.
- However, everybody agrees that MDM is a business process, not just a software category.
- Specific problems in MDM include but are hardly limited to:
  - Different (mis)spellings and so on of names (of people, businesses, and/or products).
  - Identifying where individuals and divisions sit in organizational hierarchies.
Every time you build an application — NoSQL or otherwise — that stores data redundantly with other applications over other data stores, you’re making your MDM problem bigger.
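To make the name-matching problem concrete, here is a minimal sketch of reconciling variant spellings against a single "true" record. Everything in it is an assumption for illustration: the `MASTER` table, the suffix list, the `normalize` and `match_master` helpers, and the 0.8 similarity threshold are all hypothetical, and real MDM products use far more elaborate matching rules than this.

```python
import re
from difflib import SequenceMatcher

# Hypothetical master records: the "true" representation of each name.
MASTER = {"C001": "Acme Corporation", "C002": "Globex Inc."}

# Hypothetical list of company suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, drop common suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def match_master(name, threshold=0.8):
    """Return the ID of the most similar master record, or None if no
    record reaches the (arbitrarily chosen) similarity threshold."""
    target = normalize(name)
    best_id, best_score = None, 0.0
    for master_id, master_name in MASTER.items():
        score = SequenceMatcher(None, target, normalize(master_name)).ratio()
        if score > best_score:
            best_id, best_score = master_id, score
    return best_id if best_score >= threshold else None
```

With this sketch, `match_master("ACME Corp.")` and `match_master("Globx, Inc")` resolve to the Acme and Globex records respectively, while an unrelated name such as "Initech" matches nothing. Deciding what to do with the non-matches is where the business-process side of MDM comes in.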
6. Metamarkets’ Druid was open-sourced. Numerous other product introductions and so on that I’ve hinted at have happened as well.
7. In a comment on my Platfora post, Neil Hepburn made a good point about associative UIs and acyclic join paths.
8. I made a hash of my attempted glossary entry for DBMS, and need to rethink it.
9. IBM’s DataStage is based on Pick technology. That makes sense based on Ascential’s company history; even so, it was news to me.
Comments
2 Responses to “Notes and comments — October 31, 2012”
I am quite sad to hear about Dan Weinreb’s passing away. I had a chance to work with him during the early 90’s when both of us were at Object Design (with Tom Atwood and co.). He was one of the founders at ODI.
I learnt quite a few design concepts from Dan during our discussions. More recently (in 2009), I tried to get him to work on Informatica-related technologies, but without luck.
I hope his family members are well, and may his soul rest in peace!
Regards,
Sanjeev Kumar.
[…] Growing attention to machine-generated data. Human-generated data grows at the rate business activity does, plus 0-25%. Machine-generated data grows at the rate of Moore’s Law, also plus 0-25%, which is a much higher total. In particular, the use of remote machine-generated data is becoming increasingly real. […]