July 1, 2008
The IRS data warehouse
According to a recent Computerworld story by Eric Lai and a 2006 Sybase.com success story:
- The IRS has a data warehouse called the CDW (Compliance Data Warehouse), running on Sybase IQ, with 500 named users. (Computerworld)
- By some metric, it’s a 150 TB warehouse. (Computerworld)
- By some metric, they add 15-20 TB/year, with a 4-hour load time. (Computerworld)
- As of 2006, there were 20-25 TB of “input data”, with a “70% compression rate”. (Sybase)
I can’t entirely reconcile those numbers, but in any case the database sounds plenty big.
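A back-of-envelope sketch of why the figures are hard to line up. Neither source defines "70% compression rate", so both readings (shrinks by 70%, shrinks to 70%) are tried here, and the helper names are made up for illustration. It also assumes the 2006 input figure and the 150 TB figure measure the same thing, which may well not be the case.

```python
# Rough arithmetic on the quoted figures. The meaning of "70% compression
# rate" is an assumption; the sources do not define it.

def stored_tb(input_tb: float, rate: float, rate_is_reduction: bool) -> float:
    """On-disk size for input_tb of raw data.

    rate_is_reduction=True  -> data shrinks BY rate (stored = input * (1 - rate))
    rate_is_reduction=False -> data shrinks TO rate (stored = input * rate)
    """
    factor = (1.0 - rate) if rate_is_reduction else rate
    return input_tb * factor


def years_to_reach(target_tb: float, start_tb: float, growth_tb_per_year: float) -> float:
    """Years of steady growth needed to get from start_tb to target_tb."""
    return (target_tb - start_tb) / growth_tb_per_year


for input_tb in (20.0, 25.0):
    shrinks_by = stored_tb(input_tb, 0.70, True)
    shrinks_to = stored_tb(input_tb, 0.70, False)
    print(f"{input_tb:.0f} TB input: {shrinks_by:.1f} TB if it shrinks by 70%, "
          f"{shrinks_to:.1f} TB if it shrinks to 70%")

# Most generous case: start at 25 TB uncompressed and add 20 TB/year.
fastest = years_to_reach(150.0, 25.0, 20.0)
# Least generous case: start at 20 TB uncompressed and add 15 TB/year.
slowest = years_to_reach(150.0, 20.0, 15.0)
print(f"Years to reach 150 TB: {fastest:.2f} to {slowest:.2f}")
```

Even in the most generous case, growing from the 2006 figure to 150 TB at the quoted rate would take far longer than the two years between the two stories, so the numbers are presumably counting different things.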
Computerworld also said:
the research division also uses Microsoft Corp.’s SQL Server to store all of the metadata for the data warehouse and the rest of the agency. Managing and cleaning all of that metadata — 10,000 labels for 150 databases — is a huge task in itself,
Categories: Analytic technologies, Data warehousing, Specific users, Sybase
Comments
2 Responses to “The IRS data warehouse”
Sybase IQ is a column-oriented database. This is why it can achieve such tremendous benefits in load times, query times, and compression ratios.
SQL Server (and Oracle and DB2) are all row-oriented, as are most other mainstream RDBMSs.
Since the focus of DBMSs is moving from transaction processing to analytics, we will likely see a shift towards column-oriented databases – I would argue that the row-oriented database is all but obsolete.
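The compression part of that claim can be illustrated with a toy sketch. The table, its values, and the use of run-length encoding are all assumptions chosen for illustration; nothing here describes how Sybase IQ actually stores or compresses data, and it says nothing about load or query times.

```python
from itertools import groupby

# Hypothetical toy table: (year, state, amount). The data is made up.
rows = [
    ("2006", "CA", 100),
    ("2006", "CA", 250),
    ("2006", "NY", 100),
    ("2007", "NY", 300),
    ("2007", "NY", 300),
    ("2007", "TX", 125),
]


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Row-oriented: values from different columns are interleaved on disk.
row_layout = [value for row in rows for value in row]

# Column-oriented: each column's values are stored contiguously.
column_layout = [list(column) for column in zip(*rows)]

row_runs = run_length_encode(row_layout)
column_runs = [run_length_encode(column) for column in column_layout]
total_column_runs = sum(len(runs) for runs in column_runs)

print(len(row_layout), "values,", len(row_runs), "runs in row layout")
print(len(row_layout), "values,", total_column_runs, "runs in column layout")
```

In the row layout, neighbouring values come from different columns and almost never repeat, so run-length encoding finds nothing to collapse. Stored column by column, repeated years and states sit next to each other and the same encoding has something to work with.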
Neil,
I see from your blog that you first learned about columnar database management systems in February. You’ve come to the right place to learn more about them!
http://www.dbms2.com/category/database-theory-practice/columnar-database-management/