November 28, 2011

Terminology: Data mustering

I find myself in need of a word or phrase that means bringing data together from various sources so that it’s ready to be used, where the use can be analysis or operations. The first words I thought of were “aggregation” and “collection,” but they both have other meanings in IT. Even “data marshalling” has a specific meaning different from what I want. So instead, I’ll go with data mustering.

I mean for the term “data mustering” to encompass at least three scenarios: the integrated data warehouse, the big bit bucket, and the big bit stream.

Let me explain what I mean by each. 

“Integrated data warehouse” is a phrase Teradata has started using for enterprise data warehouses that, like approximately every other EDW in the entire history of data warehousing, aren’t truly enterprise-wide. In other words, it means “not just a data mart”. No category name is perfect, but I think that one works reasonably well.

I previously described the big bit bucket use case as

Users take a whole lot of data, often machine-generated data in logs of different kinds, and dump it into one place, managed by Hadoop, at open-source pricing.

and quickly added

Of course, there are various outfits who’d like to sell you not-so-cheap bit buckets. Contending technologies include Hadoop appliances (which I don’t believe in), Splunk (which in many use cases I do), and MarkLogic (ditto, but often the cases are different from Splunk’s). Cloudera and IBM, among other vendors, would also like to sell you some proprietary software to go with your standard Apache Hadoop code.

I think I’ll stand pat on that explanation. 🙂

By analogy, a big bit stream is various streams of data, assembled in the custody of a streaming engine. Sybase told me Wednesday that this scenario appears in both of the traditional markets for CEP/streaming: national intelligence, where it is a major use of streaming, and capital markets, where it shows up in some use cases. That’s consistent with what I’ve heard from other CEP/streaming vendors.

As for where I got the word “mustering” — it’s a military term, for when you assemble your troops and their gear either for inspection or for actual use. The main modern usage I know of the word is as part of the phrase “pass muster”, which originally referred to the concept that the person being paid to put a regiment together should from time to time demonstrate that the regiment physically existed in the form that regimental records seemed to show.

Comments

12 Responses to “Terminology: Data mustering”

  1. Agile predictive analytics — the “easy” parts : DBMS 2 : DataBase Management System Services on November 28th, 2011 2:39 pm

    […] Data mustering for the analysts. […]

  2. Joe on November 28th, 2011 2:40 pm

    Interesting thoughts as usual, Curt.

    For what it’s worth, I’ve heard Cloudera refer to a similar concept (by my estimation) as “Data Stewardship” and the person who performs the role as a Data Steward.

    Have you come across that term in your talks with them, and is it the same thing you’re describing here?

  3. Curt Monash on November 28th, 2011 3:09 pm

    “Data stewardship” doesn’t ring much of a bell, but sounds too close to “data governance” for my tastes.

  4. Alex Popescu on November 28th, 2011 6:24 pm

    As far as I could tell, the closest terms I’ve heard and used to describe it are “data federation” or “federated database”.

    A://

  5. Dawit on November 28th, 2011 7:32 pm

    Curt, how do “data integration” and the decades-old industry behind it fit in here? “Data integration” is also a term that is widely used by the data management research community, and the Wikipedia entry of the same name provides a good summary.

  6. Curt Monash on November 28th, 2011 8:17 pm

    Alex,

    Data federation usually refers to data being physically in different systems, but viewed as a logical whole. In contrast, what I’m referring to usually involves data all being in one place.

  7. Curt Monash on November 28th, 2011 8:19 pm

    Dawit,

    Thank you for pointing me at that hideously incorrect Wikipedia entry, which seems to use “data integration” as a synonym for “data federation”.

    Please see my previous comment for the contrast to data federation.

  8. Gary on November 29th, 2011 4:26 am

    The word mustering is still used (at least in Australia) for bringing together cattle before taking them off to become tasty steaks etc.

  9. Curt Monash on November 29th, 2011 8:15 am

    Hah! Chalk one up for the Australians! Even if the whole country does mispronounce my last name. 🙂

  10. Shawn on November 29th, 2011 5:24 pm

    I like the term data mustering, but data wrangling and data herding would work also.

  11. ClearStory, Spark, and Storm | DBMS 2 : DataBase Management System Services on September 29th, 2013 10:54 pm

    […] much of its technical differentiation in the area of data mustering […]

  12. Data integration as a business opportunity | DBMS 2 : DataBase Management System Services on July 20th, 2014 11:59 pm

    […] have a “collect all your data in one place” part to their stories — which I call data mustering — and Hadoop is a data transformation tool as […]
