Really big databases
Business Intelligence Lowdown has a well-dugg post listing what it claims are the 10 largest databases in the world. Its accuracy leaves much to be desired: #10 on the list is only 20 terabytes, while eBay’s 2-petabyte database (mentioned here, and also here) goes entirely unmentioned. Only two phone companies are listed, and no credit-raters. And for some of the databases that are listed, the size given seems too low. E.g., for Google I’d guess that the average page size it indexes is 10K+ (vs. the 100K or so maximum), even with all the junk stuff in there, so it’s in the 100s of terabytes at a minimum, for raw data alone before considering indexes and so on.
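To make the arithmetic behind that Google guess explicit, here is a rough back-of-envelope sketch. It assumes decimal units (1K = 1,000 bytes, 1 terabyte = 10^12 bytes) and takes 100 terabytes as the low end of "100s of terabytes". The function name is made up for illustration, and the resulting page count is not a figure from any source; it is only what the estimate implies.

```python
# Back-of-envelope check of the Google estimate (assumed decimal units).
KB = 10**3    # assumption: 1K = 1,000 bytes
TB = 10**12   # assumption: 1 terabyte = 10^12 bytes


def implied_page_count(raw_bytes: int, avg_page_bytes: int) -> int:
    """Number of pages implied by a raw data size and an average page size."""
    return raw_bytes // avg_page_bytes


avg_page = 10 * KB    # the "10K+" average page size guessed above
raw_data = 100 * TB   # low end of "100s of terabytes", raw data only

pages = implied_page_count(raw_data, avg_page)
print(f"{pages:,} pages")  # 10,000,000,000 pages
```

In other words, the guess amounts to an index on the order of 10 billion pages. A larger average page size or a larger index only pushes the raw total higher, which is why 100s of terabytes reads as a floor.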
Oracle, IBM, Teradata, Netezza, and DATAllegro have plenty of customers each above the 20 terabyte size; apparently even Greenplum has one. For that matter, SAS software directly manages some multi-hundred-terabyte databases.
But flawed as it may be, the list highlights one important point — there’s a whole lot of data beyond standard rows and columns. Lots of text documents, videos, satellite telemetry and so on are on disk too. What’s more, GPS and RFID data probably aren’t far behind. Big databases are getting more diverse than ever.
Comments
4 Responses to “Really big databases”
Great post…I submit:
http://kevinclosson.wordpress.com/2006/11/28/introducing-the-unstructured-data-administrator/
Introducing the “Unstructured Data Administrator”….
Database Administrators or “Unstructured Data Administrators”. Who is Minding The Store?
Imagine trying to tell your kids that there was a day when hard drives were not used for storing unstructured data. My, oh my, how things change. The …
I have worked with some large systems, but when I think about databases the size of Google’s or some financial organization’s, I can’t even get my head around it.
I have to work hard when one of my developers tells me we have a Very Large Data Base because we have one table with 40 thousand rows. :).
Dead link dude, and I know errors are to be expected