MySpace’s multi-hundred terabyte database running on Aster Data
Aster Data has put up a blog post embedding and summarizing a video about its MySpace account. Basic metrics include:
The combined Aster deployment now has 200+ commodity hardware servers working together to manage 200+ TB of data that is growing at 2-3TB per day by collecting 7-10B events that happen on one of the world.
I’m pretty sure that’s counting correctly (i.e., user data).*
*That said, simple multiplication makes me wonder why the database isn’t bigger yet. Aster’s MySpace relationship is over a year old, although the MySpace Music piece is newer. Perhaps some of that 2-3 TB per day is eventually being thrown away.
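The "simple multiplication" can be spelled out in a quick sketch. It assumes, purely for illustration, a constant 2-3 TB per day over exactly 365 days with nothing discarded; none of those assumptions come from Aster or MySpace, and the variable names are made up for this example.

```python
# Back-of-the-envelope check of the quoted growth figures.
# Assumptions (illustrative only, not from the source): constant daily
# growth, a 365-day year, and no data ever discarded.
DAILY_GROWTH_TB = (2, 3)    # quoted growth range, TB per day
DAYS_IN_YEAR = 365
REPORTED_SIZE_TB = 200      # quoted "200+ TB" deployment size


def projected_range_tb(daily_range, days):
    """Return (low, high) total growth in TB after `days` days."""
    low, high = daily_range
    return low * days, high * days


year_low, year_high = projected_range_tb(DAILY_GROWTH_TB, DAYS_IN_YEAR)

# Days needed to accumulate the reported size at the fast and slow rates.
days_fast = REPORTED_SIZE_TB / DAILY_GROWTH_TB[1]
days_slow = REPORTED_SIZE_TB / DAILY_GROWTH_TB[0]

print(f"One year of growth: {year_low}-{year_high} TB")
print(f"Days to reach {REPORTED_SIZE_TB} TB: {days_fast:.0f}-{days_slow:.0f}")
```

Under those assumptions a full year of growth would land well above the quoted 200+ TB, which is consistent with the guess that the full rate has not applied for the whole period or that some of the data is not retained.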
The blog post and video feature some ringing endorsements by MySpace, with key points including:
- Aster supports a data warehouse and a bunch of data marts.
- MySpace regards the Aster installation as mission-critical.
- A huge Aster data warehouse got up and running in weeks.
- Aster survived a node hardware failure seamlessly.
- Requirements planning for the data warehouse was included in the MySpace Music project design from the get-go. MySpace raves over this strategy as a “blueprint” for future data warehousing success.
If you want to watch the whole video, it’s mercifully short — 4:18 — although the annoying background music makes those 4:18 feel longer than they really are.
Related link
- Fox Interactive Media’s multi-hundred terabyte database running on Greenplum (coming soon)
Comments
11 Responses to “MySpace’s multi-hundred terabyte database running on Aster Data”
[…] practices, claiming that it is in the process of supplanting Aster Data at Fox/MySpace. In fact, MySpace’s use of Aster is more mission-critical than Fox’s use of Greenplum, and is increasing […]
[…] http://www.dbms2.com/2009/03/05/myspaces-multi-hundred-terabyte-database-running-on-aster-data/ […]
I was excited to read this at first, so I attended the case study presentation from Aster Data at the Gartner conference on BI in DC, where I work.
The head of the project at Myspace talked about the project with Aster, but she had a different, not very good story to tell. The odd thing was that she seemed not all that unhappy with the actual technology, but confessed that there were “no business users running queries” and when questioned, said “next time we should talk to the business users before we build a warehouse”. She thought that maybe connecting Microsoft to the warehouse would help, but Aster doesn’t have connections for it yet.
The MySpace project looks a lot like a techie group of IT people playing with cluster technology, not like an operational system at all. Someone at the conference was also saying that Aster can’t even do more than two tables in a join or subqueries. I want to know if there is anyone using Aster Data for real work.
Anonymous,
Why are you not free to use your name in posting this supposed public information?
CAM
In her talk at Gartner, MySpace’s VP of Data, Hala Al-Adwan, described in detail how Aster delivers business value to both internal and external users, including finance, marketing, product management, and the record labels associated with MySpace Music. This talk was recorded and will be available through Gartner, so viewers can see what she actually said.
[…] Fox Interactive Media/MySpace has multi-hundred terabyte databases running on each of Greenplum and Aster Data nCluster. […]
Yes, I would like to see what she said, can’t find it on the site. Does anyone have a link to the talk?
[…] Music doesn’t seem to be producing great financial results — some advanced data warehousing software […]
[…] Fox Interactive Media, even ahead of much larger Greenplum user eBay, and notwithstanding Aster Data’s large presence in Fox subsidiary MySpace. I just ran across a “review” of Greenplum by FIM’s Brian Dolan, neatly […]
[…] today that it’s providing .NET support for SQL/MapReduce. Perhaps not coincidentally, Aster’s biggest customer is MySpace, which is apparently a big Microsoft shop. (And MySpace parent Fox Interactive Media is a […]
[…] Aster has gained attention for the size and speed of rollouts such as the more-than-200TB data warehouse run by MySpace, Gartner says that its flagship nCluster database lacks some basic features in the area of stored […]