February 23, 2014

Confusion about metadata

A couple of points that arise frequently in conversation, but that I don’t seem to have made clearly online.

“Metadata” is generally defined as “data about data”. That’s basically correct, but it’s easy to forget how many different kinds of metadata there are. My list of metadata kinds starts with:

- Data about data structure. This is the classical sense of the term. But please note:
  - In a relational database, structural metadata is rather separate from the data itself.
  - In a document database, each document might carry structure information with it.
- Other inputs to core data management functions. Two major examples are:
  - Column statistics that inform RDBMS optimizers.
  - Value ranges that inform partition pruning or, more generally, data skipping.
- Inputs to ancillary data management functions, for example security privileges.
- Support for human decisions about data, for example information about authorship or lineage.
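
To make the "value ranges" example concrete, here is a minimal Python sketch of data skipping. It is only an illustration, not how any particular DBMS implements it: the `Partition` class, the partition names, and the date-like integers are all hypothetical. The point is that the per-partition minimum and maximum are metadata consulted before any row is read.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Partition:
    """A hypothetical partition plus the value-range metadata kept for it."""
    name: str
    min_value: int   # smallest value of the filtered column in this partition
    max_value: int   # largest value of the filtered column in this partition
    rows: List[int]  # the actual data (here, just the filtered column)


def partitions_to_scan(partitions: List[Partition], low: int, high: int) -> List[Partition]:
    """Keep only partitions whose [min, max] range overlaps [low, high]."""
    return [p for p in partitions if p.max_value >= low and p.min_value <= high]


# Made-up data: dates encoded as YYYYMMDD integers.
partitions = [
    Partition("p2012", 20120101, 20121231, [20120315, 20121102]),
    Partition("p2013", 20130101, 20131231, [20130704]),
    Partition("p2014", 20140101, 20141231, [20140223]),
]

# Query: rows with a date in the first quarter of 2014.
survivors = partitions_to_scan(partitions, 20140101, 20140331)
matching_rows = [r for p in survivors for r in p.rows if 20140101 <= r <= 20140331]

print([p.name for p in survivors])
print(matching_rows)
```

Only the partitions that survive the range check are scanned; the others are skipped on the strength of metadata alone.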

What’s worse, the past year’s most famous example of “metadata”, telephone call metadata, is misnamed. This so-called metadata, much loved by the NSA (National Security Agency), is just data, e.g. in the format of a CDR (Call Detail Record). Calling it metadata implies that it describes other data — the actual contents of the phone calls — that the NSA strenuously asserts don’t actually exist.

And finally, the first bullet point above has a counter-intuitive consequence — all common terminology notwithstanding, relational data is less structured than document data. Reasons include:

- Relational databases usually just hold strings, or maybe numbers, with structural information being held elsewhere.
- Some document databases store structural metadata right with the document data itself.
- Some document databases store data in the form of (name, value) pairs. In some cases additional structure is imposed by naming conventions.
- Actual text documents carry the structure imposed by grammar and syntax.
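
A small Python sketch may help show the contrast. The field names and values below are made up for illustration (loosely modeled on a call record), and the "catalog" is just a tuple standing in for whatever a real RDBMS keeps in its system tables.

```python
import json

# Relational style: the stored row is just values. The names that give the
# values meaning live elsewhere, in a (hypothetical) catalog entry.
catalog_schema = ("caller", "callee", "duration_seconds")
row = ("555-0100", "555-0199", 42)

# Document style: each document carries (name, value) pairs, so structural
# information travels with the data itself.
document = json.loads(
    '{"caller": "555-0100", "callee": "555-0199", '
    '"duration_seconds": 42, "tags": ["international"]}'
)


def describe_row(values, schema):
    """Interpret a bare row; this only works if the external schema is supplied."""
    return dict(zip(schema, values))


def field_names(doc):
    """Recover a document's field names from the document alone."""
    return sorted(doc)


print(describe_row(row, catalog_schema))
print(field_names(document))
```

The row cannot be interpreted without `catalog_schema`, whereas the document can be inspected on its own, and can even carry a field (`tags`) that no other document has.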

Related links

A lengthy survey of metadata kinds, biased to Hadoop (August, 2012)
Metadata as derived data (May, 2011)
Dataset management (May, 2013)
Structured/unstructured … multi-structured/poly-structured (May, 2011)

Categories: Data models and architecture, Hadoop, Structured documents, Surveillance and privacy, Telecommunications


Comments

5 Responses to “Confusion about metadata”

Yves de Montcheuil on February 23rd, 2014 12:48 pm

Thanks Curt for this simple and straightforward explanation. Glad to see I wasn’t the only one cringing when hearing about the NSA’s “metadata”. Even Jack Bauer’s CTU, with their often stretched tech concepts, weren’t making this kind of mistake…

Vincent McBurney on February 24th, 2014 9:49 pm

I would label a Call Detail Record event data and not metadata. I think they call it metadata to try and downplay what they are doing. In Information Management circles you throw around the term metadata if you want everyone in the room to go to sleep. If they referred to it as Personal Private Information instead of metadata there would have been a few more headlines.

What metadata means in modern PDM/PLM systems on February 25th, 2014 8:26 pm

[…] article Confusion about Metadata speaks about some additional aspects of metadata management that getting more relevant these days. […]
Misconceptions about privacy and surveillance | DBMS 2 : DataBase Management System Services on September 15th, 2014 1:08 pm

[…] February, 2014 post on various metadata-related confusions notes some egregious governmental spin. Categories: Health care, Predictive modeling and […]
