eBay
Discussion of eBay’s use of database and analytic technology. Related subjects include:
- The use of analytic technologies to study web and network event data
eBay doesn’t love MapReduce
The first time I ever heard from Oliver Ratzesberger of eBay, the subject line of his email mentioned MapReduce. That was early this year. Subsequently, however, eBay seems to have become a MapReduce non-fan. The reason is simple: eBay’s parallel efficiency tests show that MapReduce leaves most processors idle most of the time. The specific figure they mentioned was parallel efficiency of 18%.
Categories: eBay, MapReduce, Parallelization | 7 Comments |
Teradata’s Petabyte Power Players
As previously hinted, Teradata has now announced 4 of the 5 members of its “Petabyte Power Players” club. These are enterprises with 1+ petabyte of data on Teradata equipment. As is commonly the case when Teradata discusses such figures, there’s some confusion as to how they’re actually counting. But as best I can tell, Teradata is counting: Read more
Categories: Data warehousing, eBay, Market share and customer counts, Petabyte-scale data management, Specific users, Teradata | 11 Comments |
The eBay analytics guys have a blog now
Oliver Ratzesberger and his crew have started a blog, focusing on xldb analytics. Naturally, one of the early posts gives a quick overview of their system stats. Highlights include:
Incoming data volumes exceed 40TB per day, with more than 10^11 new items/lines/records being added per day. Our analytical processing infrastructure exceeds 6PB of physical storage with over 2.9PB(1.4+1.5) in our largest cluster.
We leverage compression technologies wherever possible and are achieving compression ratios as high as 99% on our highest volume data feeds.
On any given day our massive parallel systems process more than 27PB of data, not factoring in various levels of caches that serve similar activities or processes and reduce the amount of physical IOs significantly.
We execute millions of requests on a daily basis, spanning from near realtime highly localized access to enormous jobs that span 100s of TB in a single or series of models.
Categories: eBay, Specific users | Leave a Comment |
eBay OLTP architecture
I’ve posted a couple times about eBay’s analytics side. As a companion, Don Burleson pointed me at a fascinating November, 2006 slide presentation outlining eBay’s transactional architecture and evolution. Highlights include:
- A whole lot of manual slicing of Oracle databases, so as not to exceed their capacity.
- A whole lot of careful design and ordering of transactions.
- Putting all the business logic in the application tier, with a custom O/R mapper. There’s lots of caching there, but very little state.
The presentation has a bunch of specific numbers, in case anybody wants to dive in.
Categories: eBay, OLTP, Specific users | Leave a Comment |
The biggest eBay database
There’s been some confusion over my post about eBay’s multiple petabytes of data. So to clarify, let me say:
- eBay’s figure of >1.4 petabytes of data — for its largest single analytic database — counts disks or something, not raw user data.
- I previously published a strong conjecture that the database vendor in question was Teradata, which is definitely an eBay supplier. In particular, it is definitely not an Oracle data warehouse.
- While eBay isn’t saying who it is either — not even off-the-record — the 50%ish compression figures they experience just happen to map well to Teradata’s usual range.
- Edit: Just to be clear — not that there was any doubt, but I have reconfirmed that eBay is a Teradata user, in or including eBay’s Paypal division.
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, eBay, Specific users, Teradata | 1 Comment |
eBay is over 5 petabytes now
Single largest database >1.4 petabytes.
From Oliver Ratzesberger’s LinkedIn profile:
Our systems process in excess of 10 billion records per day, serving thousands of users and delivering hundreds of millions of queries per month in a true global 24×7 operation with distributed teams around the globe on systems over 5 PB in size (largest single system >1.4PB).
Categories: eBay, Specific users | 3 Comments |
Marketing versus reality on the one-petabyte barrier
Usually, I don’t engage in the kind of high-speed quick-response blogging I have over the past couple of days from the Teradata Partners conference (and more generally have for the past week or so). And I’m not sure it’s working out so well.
For example, the claim that Teradata has surpassd the one-petabyte mark comes as quite a surprise to variety of Teradata folks, not to mention at least one reliable outside anonymous correspondent. That claim may indeed be true about raw disk space on systems sold. But the real current upper limit, according to CTO Todd Walter,* is 5-700 terabytes of user data. He thinks half a dozen or so customers are in that range. I’d guess quite strongly that three of those are Wal-Mart, eBay, and an unspecified US intelligence agency.
*Teradata seems to have quite a few CTOs. But I’ve seen things much sillier than that in the titles department, and accordingly shan’t scoff further — at least on that particular subject. 😉
On the other hand, if anybody did want to buy a 10 petabyte system, Teradata could ship them one. And by the way, the Teradata people insist Sybase’s claims in the petabyte area are quite bogus. Teradata claims to have had bigger internal systems tested earlier than the one Sybase writes about.
Categories: Data warehouse appliances, Data warehousing, eBay, Petabyte-scale data management, Specific users, Sybase, Teradata | 3 Comments |
Teradata apparently has crossed the petabyte barrier
According to a hurried conversation I had with Chief Marketing Office Darryl MacDonald, Teradata has customers with over 1 petabyte of user data in a single instance. He wouldn’t disclose any names, but I’d guess one is eBay, who he did confim is a customer. The intelligence area is another one where I’d speculate there are Very Large Databases.
However, since Darryl mentioned testing systems internally up to 4 petabytes, I’d guess the upper limit of Teradata deployments is in the 1-2 petabyte range.
EDIT: I’m now guessing that Teradata’s largest classified database — which previously was the largest overall — isn’t much over a petabyte in size. And there’s a strong chance this is larger than any unclassified one.
Update: That wasn’t really 1+ petabyte of user data.
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, eBay, Specific users, Teradata | Leave a Comment |
eBay’s version of DBMS2
Every sufficiently large or agile enterprise needs to follow the DBMS2 approach. The following is from an article on eBay’s version:
“eBay has built a software-based Integration Tier. This contains both a data access layer (DAL) and a services framework. The Integration Tier acts as an abstraction layer for software engineers to work with many disparate back-end data sources through a consistent set of abstractions.”
Categories: EAI, EII, ETL, ELT, ETLT, eBay, Specific users | Leave a Comment |