September 24, 2011
Confusion about Teradata’s big customers
Evidently further attempts to get information on this subject would be fruitless, but anyhow:
- Teradata emailed me a couple of months ago saying, roughly, that at that point they could count 16 petabyte-level customers. In response to my repeated requests for clarification, Teradata has explicitly refused to identify the metric used in reaching that conclusion.
- At some point Teradata did something — as per a tweet of his — to convince Neil Raden that they have 20 petabyte-class users.
- That tweet was made around the time that Teradata apparently showed a slide naming big users at the Strata conference (last week).
- If Teradata is counting the way they did three years ago, that count of 16 or 20 or whatever is probably inflated compared to, say, Vertica’s figure of 7 a few months back.
- Even so, it’s obvious — and not just from the eBay example — that Teradata has one of the most scalable analytic DBMS offerings around.
Comments
FYI, from our public presentations:
Our largest system (Singularity) is currently at 37 PB raw storage and at 5.5x compression for the vast majority of semi-structured data. That is raw, unmodeled data, mainly name/value pairs generated by upstream systems.
Largest single table is 2 PB after compression, 2+ trillion records, holding 200+ trillion name/value pairs.
This table alone is accessed 20k+ times per day with an average response time of 18s.
e.g. Taking a full day's worth of eBay data, extracting 100+ billion search impressions (all items shown on all search pages) out of semi-structured name/value pairs, pivoting them, counting them, and sorting them descending to find the highest impression counts runs for about 30s
e.g. Sorting a raw TB takes about 9s
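Just to make the shape of that workload concrete, here is a toy Python sketch of the extract/pivot/count/sort pattern Oliver describes. The record layout, field names, and event types are invented for illustration; at eBay's scale this of course runs as parallel SQL inside the DBMS, not as a Python loop.

```python
# Toy sketch (not eBay's actual pipeline) of the query described above:
# take raw name/value-pair records, keep the search-impression events,
# pivot out the item id, count impressions per item, and order descending.
# Record layout, field names, and event types are invented for illustration.
from collections import Counter

raw_records = [
    # each record is one semi-structured name/value-pair string
    "event=search_impression&item=1001&page=1",
    "event=search_impression&item=1002&page=1",
    "event=click&item=1001&page=1",
    "event=search_impression&item=1001&page=2",
]

def parse(record):
    """Turn 'name=value&name=value' into a dict of name/value pairs."""
    return dict(pair.split("=", 1) for pair in record.split("&"))

impressions = Counter(
    fields["item"]
    for fields in map(parse, raw_records)
    if fields.get("event") == "search_impression"
)

# Descending impression counts, analogous to GROUP BY item ORDER BY COUNT(*) DESC
for item, count in impressions.most_common():
    print(item, count)
```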
Hi, Oliver!
Cool stats!
37 PB would have to be sliced up into primary copy of data, mirror, and temp space. Anything else major, or is roughly 1/3 of it holding the actual database?
Just one note. You do not have to sort 100+ billion items to find the list of top x ones. Otherwise, everything (if it is true) is very impressive.
Very cool stats indeed.
As a general guide, the 1/3 figure for the amount of spinning disk available for data storage is about right.
The rest goes in mirroring, overhead and spool (work) space.
@Vlad: Correct, and I did not state that. I meant to say counting (basically a group by) and then sorting the result. It's still way north of a hundred million rows for the result set. The stat about sorting a TB is a totally separate stat.
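For what it's worth, here is a minimal Python illustration of that distinction, on made-up data: once the counts exist, a bounded heap finds the top x in roughly n log x work, versus n log n for a full descending sort.

```python
# Sketch of Vlad's point: once impressions are grouped and counted, the top x
# can be found with a bounded heap rather than a full descending sort.
# The counts dict below is a made-up stand-in for the (still huge) GROUP BY result.
import heapq
import random

counts = {f"item{i}": random.randint(1, 10_000) for i in range(1_000_000)}

x = 10

# Full sort: O(n log n) over the whole aggregated result set
top_by_sort = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:x]

# Heap selection: O(n log x), which matters when n is in the hundreds of millions
top_by_heap = heapq.nlargest(x, counts.items(), key=lambda kv: kv[1])

# Items may tie on count, so sanity-check the selected counts rather than the keys
assert [c for _, c in top_by_sort] == [c for _, c in top_by_heap]
```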
@Curt: 1/2 is for the mirror and a few % for the file system. The MPP RDBMS gets to see all the rest as one pool. The traditional 30% for temp or spool does not apply for these extremely large big-data systems. Most of the time it runs well below 1% temp space or spool, with only occasional spikes to 5%, which is almost a PB; at 5.5x compression that corresponds to more like 5 PB of raw data.
Oliver,
That sounds like 37 PB of raw storage should be multiplied by about 5/2 to get the total amount of data under management. Yikes! No wonder you’re proud of how big it is. 🙂
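Spelling out that multiplier with the figures from this thread (and treating the "few %" of file system overhead as roughly 3%, which is my assumption):

```python
# Back-of-the-envelope version of the arithmetic above, using the figures
# from Oliver's comments; the exact overhead percentages are assumptions.
raw_storage_pb = 37.0          # spinning disk in the system
mirror_fraction = 0.5          # half of the disk holds the mirror copy
fs_overhead_fraction = 0.03    # "a few %" for the file system (assumed 3%)
compression_ratio = 5.5        # compression on most of the semi-structured data

usable_pb = raw_storage_pb * (1 - mirror_fraction) * (1 - fs_overhead_fraction)
data_under_management_pb = usable_pb * compression_ratio

print(f"usable compressed storage:  {usable_pb:.1f} PB")                 # ~17.9 PB
print(f"uncompressed data managed:  {data_under_management_pb:.0f} PB")  # ~99 PB
print(f"multiplier vs. raw disk:    {data_under_management_pb / raw_storage_pb:.2f}x")
# ~2.7x, i.e. in the same ballpark as the "about 5/2" figure above
```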
Curt,
I merely said Teradata announced that it had 20 petabyte-level installations. I made no claim to validate it, only to mention it. I did not say they convinced me.
Ahh.
Since you seemingly stated it as fact, without any kind of qualification I could detect, I hope you’ll forgive me my error of interpretation. 🙂
That kind of thing happens when there are 140 character limits …