September 24, 2011
Confusion about Teradata’s big customers
Evidently further attempts to get information on this subject would be fruitless, but anyhow:
- Teradata emailed me a couple of months ago saying, roughly, that at that point they could count 16 petabyte-level customers. In response to my repeated requests for clarification, Teradata has explicitly refused to identify the metric used in reaching that conclusion.
- At some point Teradata did something — as per a tweet of his — to convince Neil Raden that they have 20 petabyte-class users.
- That tweet was made around the time that Teradata apparently showed a slide naming big users at the Strata conference (last week).
- If Teradata is counting the way they did three years ago, that count of 16 or 20 or whatever is probably inflated compared to, say, Vertica’s figure of 7 a few months back.
- Even so, it’s obvious — and not just from the eBay example — that Teradata has one of the most scalable analytic DBMS offerings around.
Comments
FYI, from our public presentations:
Our largest system (Singularity) is currently at 37 PB raw storage and at 5.5x compression for the vast majority of semi-structured data. That is raw, unmodeled data, mainly name/value pairs generated by upstream systems.
Largest single table is 2 PB after compression, 2+ trillion records, holding 200+ trillion name/value pairs.
This table alone is accessed 20k+ times per day with an average response time of 18s.
e.g. Taking a full day's worth of eBay data, extracting 100+ billion search impressions (all items shown on all search pages) out of semi-structured name/value pairs, pivoting them, counting them, and sorting them descending to find the highest impression counts runs for about 30s
e.g. Sorting a raw TB takes about 9s
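Just to make the shape of that workload concrete, here is a toy Python sketch of the extract/pivot/count/sort pattern Oliver describes. The record layout, field names, and event types are invented for illustration; at eBay's scale this of course runs as parallel SQL inside the DBMS, not as a Python loop.

```python
# Toy sketch (not eBay's actual pipeline) of the query described above:
# take raw name/value-pair records, keep the search-impression events,
# pivot out the item id, count impressions per item, and order descending.
# Record layout, field names, and event types are invented for illustration.
from collections import Counter

raw_records = [
    # each record is one semi-structured name/value-pair string
    "event=search_impression&item=1001&page=1",
    "event=search_impression&item=1002&page=1",
    "event=click&item=1001&page=1",
    "event=search_impression&item=1001&page=2",
]

def parse(record):
    """Turn 'name=value&name=value' into a dict of name/value pairs."""
    return dict(pair.split("=", 1) for pair in record.split("&"))

impressions = Counter(
    fields["item"]
    for fields in map(parse, raw_records)
    if fields.get("event") == "search_impression"
)

# Descending impression counts, analogous to GROUP BY item ORDER BY COUNT(*) DESC
for item, count in impressions.most_common():
    print(item, count)
```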
Hi, Oliver!
Cool stats!
37 PB would have to be sliced up into primary copy of data, mirror, and temp space. Anything else major, or is roughly 1/3 of it holding the actual database?
Just one note. You do not have to sort 100+ billion items to find the list of top x ones. Otherwise, everything (if it is true) is very impressive.
Very cool stats indeed.
As a general guide, the 1/3 figure for the amount of spinning disk available for data storage is about right.
The rest goes in mirroring, overhead and spool (work) space.
@Vlad: Correct, and I did not state that. I meant to say counting (basically a group by) and then sorting the result. It's still way north of a hundred million rows for the result set. The stat about sorting a TB is a totally separate stat.
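For what it's worth, here is a minimal Python illustration of that distinction, on made-up data: once the counts exist, a bounded heap finds the top x in roughly n log x work, versus n log n for a full descending sort.

```python
# Sketch of Vlad's point: once impressions are grouped and counted, the top x
# can be found with a bounded heap rather than a full descending sort.
# The counts dict below is a made-up stand-in for the (still huge) GROUP BY result.
import heapq
import random

counts = {f"item{i}": random.randint(1, 10_000) for i in range(1_000_000)}

x = 10

# Full sort: O(n log n) over the whole aggregated result set
top_by_sort = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:x]

# Heap selection: O(n log x), which matters when n is in the hundreds of millions
top_by_heap = heapq.nlargest(x, counts.items(), key=lambda kv: kv[1])

# Items may tie on count, so sanity-check the selected counts rather than the keys
assert [c for _, c in top_by_sort] == [c for _, c in top_by_heap]
```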
@Curt: 1/2 is for the mirror and a few % for the file system. The MPP RDBMS gets to see all the rest as one pool. The traditional 30% for temp or spool does not apply for these extremely large big-data systems. Most of the time it runs well below 1% temp space or spool, with only occasional spikes to 5%, which is almost a PB; at 5.5x compression that corresponds to more like 5 PB of raw data.
Oliver,
That sounds like 37 PB of raw storage should be multiplied by about 5/2 to get the total amount of data under management. Yikes! No wonder you’re proud of how big it is. 🙂
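Spelling out that multiplier with the figures from this thread (and treating the "few %" of file system overhead as roughly 3%, which is my assumption):

```python
# Back-of-the-envelope version of the arithmetic above, using the figures
# from Oliver's comments; the exact overhead percentages are assumptions.
raw_storage_pb = 37.0          # spinning disk in the system
mirror_fraction = 0.5          # half of the disk holds the mirror copy
fs_overhead_fraction = 0.03    # "a few %" for the file system (assumed 3%)
compression_ratio = 5.5        # compression on most of the semi-structured data

usable_pb = raw_storage_pb * (1 - mirror_fraction) * (1 - fs_overhead_fraction)
data_under_management_pb = usable_pb * compression_ratio

print(f"usable compressed storage:  {usable_pb:.1f} PB")                 # ~17.9 PB
print(f"uncompressed data managed:  {data_under_management_pb:.0f} PB")  # ~99 PB
print(f"multiplier vs. raw disk:    {data_under_management_pb / raw_storage_pb:.2f}x")
# ~2.7x, i.e. in the same ballpark as the "about 5/2" figure above
```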
Curt,
I merely said Teradata announced that it had 20 petabyte-level installations. I made no claim to validate it, only to mention it. I did not say they convinced me.
Ahh.
Since you seemingly stated it as fact, without any kind of qualification I could detect, I hope you’ll forgive me my error of interpretation. 🙂
That kind of thing happens when there are 140 character limits …