July 8, 2008

Google has thousands of internal data formats, mostly simple ones

In connection with the release of Protocol Buffers, Kenton Varda of Google wrote:

At Google, our mission is organizing all of the world’s information. We use literally thousands of different data formats to represent networked messages between servers, index records in repositories, geospatial datasets, and more. Most of these formats are structured, not flat. This raises an important question: How do we encode it all?

That sounds like a lot. On the other hand, if “data format” is just a synonym for “table structure,” “file structure,” and/or “schema,” it sounds more plausible. Varda goes on to say that

a simple lists-and-records model … solves the majority of problems

Come to think of it, that sounds very consistent with the idea that MapReduce solves a large fraction of Google’s data management issues.
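To make the “lists-and-records” idea concrete, here is a minimal Python sketch. It assumes a hypothetical Person record that holds a list of PhoneNumber records; the names and fields are illustrative only and are not taken from any Google format or from the Protocol Buffers API. It also suggests why such records suit a MapReduce-style job: each record is mapped independently, and the emitted key/value pairs are then grouped by key and reduced. This is an in-memory toy of the pattern, not Google’s MapReduce implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Tuple


# Hypothetical record types: a record whose fields are scalars or
# lists of other records (the "lists-and-records" shape).
@dataclass
class PhoneNumber:
    number: str
    kind: str = "HOME"


@dataclass
class Person:
    name: str
    id: int
    phones: List[PhoneNumber] = field(default_factory=list)


def map_phone_kinds(person: Person) -> Iterator[Tuple[str, int]]:
    # Map step: emit (kind, 1) for every phone number on one record.
    for phone in person.phones:
        yield phone.kind, 1


def reduce_counts(kind: str, counts: List[int]) -> int:
    # Reduce step: combine all values emitted for one key.
    return sum(counts)


def run_map_reduce(records: Iterable[Person]) -> Dict[str, int]:
    # Shuffle step: group mapped values by key, then reduce each group.
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in map_phone_kinds(record):
            groups[key].append(value)
    return {key: reduce_counts(key, values) for key, values in groups.items()}


people = [
    Person("Alice", 1, [PhoneNumber("555-0100"), PhoneNumber("555-0199", "WORK")]),
    Person("Bob", 2, [PhoneNumber("555-0142")]),
]
print(run_map_reduce(people))  # {'HOME': 2, 'WORK': 1}
```

Because each record is self-contained, the map step needs no shared state, which is roughly what lets a real framework spread that step across many machines.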


Comments

2 Responses to “Google has thousands of internal data formats, mostly simple ones”

Daniel Weinreb on July 9th, 2008 7:19 am

The printed representation looks an awful lot like JSON (http://en.wikipedia.org/wiki/JSON). I wonder why they didn’t just use JSON, which is well-known and precisely specified. Anyway, this and JSON are very useful for many applications.

I agree that it’s another IDL. It’s not all THAT simple. But I haven’t used IDLs much in practice, and it’s probably simpler than CORBA’s IDL! So, it looks nice; no major breakthrough or anything like that, just an incremental improvement on what we all know about. That’s fine; incremental improvements are perfectly respectable.

Curt Monash on July 11th, 2008 4:37 am

Dan,

They addressed the JSON point directly, albeit briefly, in the comment thread.
