Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data.
It’s official — the grand central EDW will never happen
I pointed out last year that the grand central enterprise data warehouse couldn’t happen; the post started:
An enterprise data warehouse should:
- Manage data to high standards of accuracy, consistency, cleanliness, clarity, and security.
- Manage all the data in your organization.
Pick ONE.
IBM’s main theme at the Enzee Universe conference has been to say the same thing.
Merv Adrian’s talk at the same conference made it clear that Gartner feels the same way, as does he personally. Indeed, like me, he’s racked up multiple decades of industry experience without ever finding a single theoretically ideal grand central EDW.
Forrester Research has been a little less clear on the point, but generally seems to be on the correct side of the issue as well.
If somebody is still saying that one central enterprise data warehouse can hold all the information or data you need on which to base your business decisions, they’re probably not somebody you should be listening to very hard.
Is that clear, or should I hammer home the point even harder? 😀
Categories: Data warehousing, IBM and DB2, Netezza | 8 Comments |
Vertica as an analytic platform
Vertica 5.0 is coming out today, and delivering the down payment on Vertica’s analytic platform strategy. In Vertica lingo, there’s now a Vertica SDK (Software Development Kit), featuring Vertica UDT(F)s* (User-Defined Transform Functions). Vertica UDT syntax basics start: Read more
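For readers who haven't bumped into the term before, the key point about a user-defined transform function, as opposed to a scalar UDF, is that it consumes a whole partition of input rows and emits its own set of output rows. Here is a minimal conceptual sketch in Python; it is not Vertica SDK syntax, and the row shape and top-pages-per-user logic are purely hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Conceptual stand-in for a user-defined transform function (UDTF):
# unlike a scalar UDF, it consumes a whole partition of input rows and
# emits its own set of output rows (possibly more or fewer than it read).
# The (user_id, url) row shape and "top pages per user" logic are
# hypothetical examples, not anything defined by Vertica.

Row = Tuple[str, str]          # (user_id, url)
OutRow = Tuple[str, str, int]  # (user_id, url, visit_count)

def top_pages_per_partition(rows: Iterable[Row], k: int = 3) -> Iterator[OutRow]:
    """Transform one partition of (user_id, url) rows into at most k
    (user_id, url, count) rows per user."""
    counts: Dict[str, Counter] = {}
    for user_id, url in rows:
        counts.setdefault(user_id, Counter())[url] += 1
    for user_id, page_counts in counts.items():
        for url, n in page_counts.most_common(k):
            yield (user_id, url, n)

if __name__ == "__main__":
    partition = [("u1", "/home"), ("u1", "/pricing"), ("u1", "/home"), ("u2", "/docs")]
    for out_row in top_pages_per_partition(partition, k=2):
        print(out_row)
```

In Vertica itself, a UDTF is written against the SDK and invoked in a query with an OVER (PARTITION BY …) clause; the sketch only mimics the per-partition, rows-in/rows-out shape of that contract.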
Categories: Analytic technologies, Data warehousing, GIS and geospatial, Predictive modeling and advanced analytics, RDF and graphs, Vertica Systems, Workload management | 7 Comments |
Columnar DBMS vendor customer metrics
Last April, I asked some columnar DBMS vendors to share customer metrics. They answered, but it took until now to iron out a couple of details. Overall, the answers are pretty impressive. Read more
Investigative analytics and derived data: Enzee Universe 2011 talk
I’ll be speaking Monday, June 20 at IBM Netezza’s Enzee Universe conference. Thus, as is my custom:
- I’m posting draft slides.
- I’m encouraging comment (especially in the short time window before I have to actually give the talk).
- I’m offering links below to more detail on various subjects covered in the talk.
The talk concept started out as “advanced analytics” (as opposed to fast query, a subject amply covered in the rest of any Netezza event), as a lunch break in what is otherwise a detailed “best practices” session. So I suggested we constrain the subject by focusing on a specific application area — customer acquisition and retention, something of importance to almost any enterprise, and which exploits most areas of analytic technology. Then I actually prepared the slides — and guess what? The mix of subjects will be skewed somewhat more toward generalities than I first intended, specifically in the areas of investigative analytics and derived data. And, as always when I speak, I’ll try to raise consciousness about the issues of liberty and privacy, our options as a society for addressing them, and the crucial role we play as an industry in helping policymakers deal with these technologically-intense subjects.
Slide 3 refers back to a post I made last December, saying there are six useful things you can do with analytic technology:
- Operational BI/Analytically-infused operational apps: You can make an immediate decision.
- Planning and budgeting: You can plan in support of future decisions.
- Investigative analytics (multiple disciplines): You can research, investigate, and analyze in support of future decisions.
- Business intelligence: You can monitor what’s going on, to see when it is necessary to decide, plan, or investigate.
- More BI: You can communicate, to help other people and organizations do these same things.
- DBMS, ETL, and other “platform” technologies: You can provide support, in technology or data gathering, for one of the other functions.
Slide 4 observes that investigative analytics:
- Is the most rapidly advancing of the six areas …
- … because it most directly exploits performance & scalability.
Slide 5 gives my simplest overview of investigative analytics technology to date: Read more
Notes and links, June 15, 2011
Five things: Read more
Metaphors amok
It all started when I disputed James Kobielus’ blogged claim that Hadoop is the nucleus of the next-generation cloud EDW. Jim posted again to reiterate the claim, only this time he wrote that all EDW vendors [will soon] bring Hadoop into the heart of their architectures. (All emphasis mine.)
That did it. I tweeted, in succession:
- Actually, I vote for Hadoop as the lungs of the EDW — first place of entry for essential nutrients.
- Data integration can be the heart of the EDW, pumping stuff around. RDBMS/analytic platform can be the brain.
- iPad-based dashboards that may engender envy, but which actually are only used occasionally and briefly … well, you get the picture.*
*Woody Allen said in Sleeper that the brain was his second-favorite organ.
Of course, that body of work was quickly challenged. Responses included: Read more
Categories: Analytic technologies, Business intelligence, Data warehousing, EAI, EII, ETL, ELT, ETLT, Fun stuff, Hadoop, Humor, MapReduce | Leave a Comment |
Infobright 4.0
Infobright is announcing its 4.0 release, with imminent availability. In marketing and product alike, Infobright is betting the farm on machine-generated data. This hasn’t been Infobright’s strategy from the get-go, but it is these days, with pretty good focus and commitment. While some fraction of Infobright’s customer base is in the Sybase-IQ-like data mart market — and indeed Infobright put out a customer-win press release in that market a few days ago — Infobright’s current customer targets seem to be mainly:
- Web companies, many of which are already MySQL users.
- Telecommunications and similar log data applications, especially in OEM relationships.
- Trading/financial services, especially at mid-tier companies.
Key aspects of Infobright 4.0 include: Read more
Categories: Data warehousing, Database compression, Infobright, Investment research and trading, Log analysis, Open source, Telecommunications, Web analytics | 8 Comments |
The essence of an application
Once upon a time, information technology was strictly about — well, information. And by “information” what was meant was “data”.* An application boiled down to a database design, plus a straightforward user interface, in whatever the best UI technology of the day happened to be. Things rarely worked quite as smoothly as the design-database/press-button/generate-UI propaganda would have one believe, but database design was clearly at the center of application invention.
*Not coincidentally, two of the oldest names for “IT” were data processing and management information systems.
Eventually, there came to be three views of the essence of IT:
- Data — i.e., the traditional view, still exemplified by IBM and Oracle.
- People empowerment — i.e., Microsoft-style emphasis on UI friendliness and efficiency.
- Operational workflow — i.e., SAP-style emphasis on actual business processes.
Graphical user interfaces were a major enabling technology for that evolution. Equally important, relational databases made some difficult problems easy(ier), freeing application designers to pursue more advanced functionality.
Based on further technical evolution, specifically in analytic and consumer technologies, I think we should now take that list up to five. The new members I propose are:
- Investigative analytics.
- Emotional response.
Categories: Data warehousing, Facebook, Predictive modeling and advanced analytics, Theory and architecture, Web analytics | 1 Comment |
Notes from the Fusion-io S-1 filing
Fusion-io has filed for an initial public offering. With public offerings go S-1 filings which, along with 10-Ks, are the kinds of SEC filing that typically contain a few nuggets of business information. Notes from Fusion-io’s S-1 include:
- Fusion-io is growing very, very fast, doubling or better in revenue every 6 months.
- Fusion-io’s marketing message revolves around “data centralization”. Fusion-io is competing against storage-area networks and storage arrays.
- Fusion-io’s list of application types includes “… systems dedicated to decision support, high performance financial analysis, web search, content delivery and enterprise resource planning.”
- Fusion-io says it has shipped over 20 petabytes of storage.
- Fusion-io has a shifting array of big customers, including OEMs: Read more
Categories: Analytic technologies, Data warehousing, Facebook, Solid-state memory, Storage | Leave a Comment |
IBM InfoSphere Warehouse pricing, packaging, compression and more
IBM InfoSphere Warehouse 9.7.3 has been announced, and is planned for general availability late this month. IBM InfoSphere Warehouse is, in essence, DB2-plus, where the “plus” comprises:
- DPF (Database Partitioning Feature) — i.e., the ability to do shared-nothing scale-out (sketched conceptually after this list).
- Unimportant add-ons — e.g., a mere 5 seats of the Cognos BI tool.
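For readers who don’t live and breathe MPP jargon, here is a toy Python sketch of the shared-nothing idea: each node owns a hash-assigned slice of the data, does its share of the work locally, and only small partial results get combined at the end. It is purely conceptual, not a description of how DPF itself is implemented; the row shape and the sum-of-amounts query are made up.

```python
# Toy illustration of shared-nothing scale-out: rows are hash-distributed
# across nodes on a distribution key, each node aggregates only its own
# slice, and a coordinator merges the partial results.
# Purely conceptual; not how DB2's DPF is actually implemented.

from collections import defaultdict
from typing import Dict, List, Tuple

NUM_NODES = 4

def node_for(key: str) -> int:
    """Pick the node that owns a row, based on a hash of its distribution key."""
    return hash(key) % NUM_NODES

def distribute(rows: List[Tuple[str, float]]) -> Dict[int, List[Tuple[str, float]]]:
    """Assign every (key, amount) row to exactly one node."""
    nodes: Dict[int, List[Tuple[str, float]]] = defaultdict(list)
    for key, amount in rows:
        nodes[node_for(key)].append((key, amount))
    return nodes

def parallel_sum(rows: List[Tuple[str, float]]) -> float:
    """Each node sums its local slice; the coordinator adds up the partials."""
    partials = [sum(amount for _, amount in node_slice)
                for node_slice in distribute(rows).values()]
    return sum(partials)

if __name__ == "__main__":
    data = [("cust1", 10.0), ("cust2", 5.5), ("cust3", 7.25), ("cust1", 2.0)]
    print(parallel_sum(data))  # 24.75
```

The point of the architecture is that adding nodes adds storage and compute in tandem, because nothing is shared among them except the final merge step.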
The main news in this release of InfoSphere Warehouse is probably pricing. While IBM has long had a funky server-power-based pricing scheme, it is now adding per-terabyte pricing, with a twist: IBM InfoSphere Warehouse can now be bought per terabyte of compressed user data. Specifically (a quick worked example follows the list):
- IBM InfoSphere Warehouse 9.7.3 Enterprise Edition can be bought for production for $70K or so per terabyte of compressed user data.
- IBM InfoSphere Warehouse 9.7.3 Departmental Edition can be bought for production for $35K or so per terabyte of compressed user data.
- Development/test seats of IBM InfoSphere Warehouse cost about $2K per user.
- High availability/disaster recovery instances are priced as if they were managing 1 TB each — unless, of course, you have an active-active configuration, in which case they’re priced according to their full amount of data.
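To make that pricing concrete, here is a small back-of-the-envelope calculator in Python using the list prices above. The 10 TB of raw data, the 3:1 compression ratio, and the seat and instance counts in the example are hypothetical inputs chosen purely for illustration, not figures from IBM.

```python
# Back-of-the-envelope InfoSphere Warehouse license estimate, using the
# per-terabyte list prices cited above. All example inputs are hypothetical.

PRICE_PER_COMPRESSED_TB = {"enterprise": 70_000, "departmental": 35_000}
DEV_TEST_SEAT_PRICE = 2_000
HA_DR_FLOOR_TB = 1  # passive HA/DR instances are priced as if holding 1 TB

def license_estimate(raw_user_data_tb: float,
                     compression_ratio: float,
                     edition: str = "enterprise",
                     dev_test_seats: int = 0,
                     passive_ha_dr_instances: int = 0) -> float:
    """Rough license cost: compressed production data, plus dev/test seats,
    plus passive (non-active-active) HA/DR instances at the 1 TB floor."""
    per_tb = PRICE_PER_COMPRESSED_TB[edition]
    compressed_tb = raw_user_data_tb / compression_ratio
    cost = compressed_tb * per_tb
    cost += dev_test_seats * DEV_TEST_SEAT_PRICE
    cost += passive_ha_dr_instances * HA_DR_FLOOR_TB * per_tb
    return cost

if __name__ == "__main__":
    # e.g., 10 TB of raw user data compressing 3:1, Enterprise Edition,
    # 5 dev/test seats, and one passive disaster-recovery instance
    print(f"${license_estimate(10, 3.0, 'enterprise', 5, 1):,.0f}")
```

With those made-up inputs the total comes to roughly $313,000, most of it driven by the compressed-terabyte count, which is one reason compression ratios matter so much under this kind of pricing.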
Per-terabyte pricing is generally a good way to think about analytic DBMS costs, for at least two reasons: Read more
Categories: Data warehousing, Database compression, IBM and DB2, Pricing | 1 Comment |