It’s official — the grand central EDW will never happen
I pointed out last year that the grand central enterprise data warehouse couldn’t happen; the post started:
An enterprise data warehouse should:
- Manage data to high standards of accuracy, consistency, cleanliness, clarity, and security.
- Manage all the data in your organization.
Pick ONE.
IBM’s main theme at the Enzee Universe conference has been to say the same thing.
Merv Adrian’s talk at the same conference made it clear that Gartner feels the same way, as does he personally. Indeed, like me, he’s racked up multiple decades of industry experience without ever finding a single theoretically ideal grand central EDW.
Forrester Research has been a little less clear on the point, but generally seems to be on the correct side of the issue as well.
If somebody is still saying that one central enterprise data warehouse can hold all the information or data you need on which to base your business decisions, they’re probably not somebody you should be listening to very hard.
Is that clear, or should I hammer home the point even harder? 😀
Comments
8 Responses to “It’s official — the grand central EDW will never happen”
Curt, Are you saying that the Kimball Bus matrix approach is not based on historical evidence in the field? Maybe people don’t end up creating all the conformed dimensions and facts, but it’s a good tool for a roadmap – right?
An EDW doesn’t have to “manage all the data in your organization” to be an EDW… it just has to integrate data from multiple, disparate source systems to serve multiple, disparate user communities. Integrated data are always preferable to the alternatives, when and where that data needs to be shared and/or re-used. Where that isn’t the case, we should always remember that integrating data is not an end in itself.
Hari, Martin,
The idea that SOME of your data winds up in something that looks like a classical EDW is just peachy.
The idea that everything will is hopelessly unrealistic, and it can be damaging if you overdo the pretense that things will ever work out that way.
That’s my point.
To enable your organization as a whole to make data provide business value, a carefully selected subset of “all data” will support both specific business processes and corporate and cross-process analytical needs. Some data will of course only provide business value from a departmental or specific business process perspective and will not be needed in the EDW, though it may be argued that it is more cost effective to host it in existing systems.
My point is that the trick is to select the RIGHT data to persist in the EDW.
In a similar fashion, one can argue that a business process only needs the degree of data quality that provides sufficient business value. Getting 75% of the data correct may deliver the optimal business value.
“If somebody is still saying that one central enterprise data warehouse can hold all the information or data you need on which to base your business decisions, they’re probably not somebody you should be listening to very hard.”
The interesting words here, for me, are ‘can’ and ‘need’.
An EDW ‘can’ hold all of the data required to drive business decisions. Holding the data is not a challenge, or it shouldn’t be – pick an appropriate technology (and SI partner!).
Sourcing, integrating and delivering the data are the challenges. Politics, economics and lack of business vision are the blockers for me, not technology.
What the business ‘need’ to drive decision making is not always apparent, and is almost certainly an ever-moving target. The important data sets are no doubt blindingly obvious and almost certainly well covered.
Having a few low-value data sets outside the EDW does not, in my opinion, undermine the claim that ‘all of the data’ is in the EDW. A pragmatic view of ‘all’ is all of the data we currently believe to add value to business decision making.
FYI, I’m writing this from a customer site that drives all BI from a single EDW containing all data required for business decision making, in a system fed real-time from nearly all of the source systems in the enterprise. This includes high volume web data with latency measured in minutes.
Yes, I did say, ‘nearly’ all of the source systems. There are a few minor ones out there that don’t contribute, but there are no impediments to adding them into the mix. Project gets raised, data gets sourced, integrated and is used to drive decision making. Simples?
Thanks, Paul!
A few questions:
How much data?
How many sources?
How many sources of machine-generated data?
How many sources of social media or other text data?
What’s used for predictive modeling?
Hi Curt, a few answers:
Q. How much data?
A. ~20TB covering > 5 years of history.
Q. How many sources?
A. ~10 internal and ~20 external.
Q. How many sources of machine-generated data?
A. ~2-3.
Q. How many sources of social media or other text data?
A. Social media data is not owned by the enterprise and is not sourced at present. All web activity data generated by the company’s own transactional web sites is loaded in near-real time, including text data!
Q. What’s used for predictive modeling?
A. SAS.