Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data. Related subjects include:
Architectural options for analytic database management systems
Mike Stonebraker recently kicked off some discussion about desirable architectural features of a columnar analytic DBMS. Let’s expand the conversation to cover desirable architectural characteristics of analytic DBMS in general. Read more
The six useful things you can do with analytic technology
I seem to be in the mode of sharing some of my frameworks for thinking about analytic technology. Here’s another one.
Ultimately, there are six useful things you can do with analytic technology:
- You can make an immediate decision.
- You can plan in support of future decisions.
- You can research, investigate, and analyze in support of future decisions.
- You can monitor what’s going on, to see when it is necessary to decide, plan, or investigate.
- You can communicate, to help other people and organizations do these same things.
- You can provide support, in technology or data gathering, for one of the other functions.
Technology vendors often cite similar taxonomies, claiming to have all the categories (as they conceive them) nicely represented, in slickly integrated fashion. They exaggerate. Most of these categories are in rapid flux, and the rest should be. Analytic technology still has a long way to go.
In more detail: Read more
Categories: Analytic technologies, Business intelligence, Cognos, Data warehousing, RDF and graphs, Text | 13 Comments |
Examples and definition of machine-generated data
In posts made last December, January, and April, I argued:
- Much of the growth in analytic data volumes will come in the form of machine-generated data.
- Unlike human-generated data, machine-generated data will grow at Moore’s Law kinds of speeds.
- Thus, unlike human-generated data, which I advocate keeping pretty much in all its detail, machine-generated data will continue to be in large part thrown away.
Recently and rather belatedly, I added a somewhat obvious point: if we don’t keep all or even most of our machine-generated data, then what we keep is likely to be in some way massaged, extracted, or derived. The purpose of this post is to address a second oversight by giving a hopefully clear definition of what I actually mean by “machine-generated data.” Read more
Categories: Data warehousing | 28 Comments |
Evolving definitions and technology categories for 2011
It seems my prediction of a limited blogging schedule in December came emphatically true. I shall re-start with a collection of quick thoughts, clearing the decks for more detailed posts to follow. Read more
Categories: Analytic technologies, Data types, Data warehousing, DBMS product categories, MOLAP, Theory and architecture | 6 Comments |
Data that is derived, augmented, enhanced, adjusted, or cooked
On this food-oriented weekend, I could easily go on long metaphorical flights about the distinction between “raw” and “cooked” data. I’ll spare you that part — reluctantly, given my fondness for fresh fruit, sushi, and steak tartare — but there’s no escaping the importance of derived/augmented/enhanced/cooked/adjusted data for analytic data processing. The five areas I have in mind are, loosely speaking:
- Aggregates, when they are maintained, generally for reasons of performance or response time.
- Calculated scores, commonly based on data mining/predictive analytics.
- Text analytics.
- The kinds of ETL (Extract/Transform/Load) Hadoop and other forms of MapReduce are commonly used for.
- Adjusted data, especially in scientific contexts.
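To make the first item on that list concrete, here is a minimal Python sketch of a maintained aggregate: raw detail rows are rolled up once into per-store, per-day totals, and later queries read the rollup instead of rescanning the detail. The rows, the column layout, and the function name `build_daily_aggregate` are all hypothetical, chosen only for illustration; they do not describe any particular product.

```python
from collections import defaultdict

# Hypothetical raw detail rows: (store, day, sale_amount).
# The values are made up for illustration only.
raw_sales = [
    ("north", "2010-11-25", 120.0),
    ("north", "2010-11-25", 80.0),
    ("south", "2010-11-25", 50.0),
    ("north", "2010-11-26", 200.0),
]


def build_daily_aggregate(rows):
    """Derive per-(store, day) totals and row counts from the raw rows."""
    totals = defaultdict(lambda: {"total": 0.0, "count": 0})
    for store, day, amount in rows:
        cell = totals[(store, day)]
        cell["total"] += amount
        cell["count"] += 1
    return dict(totals)


# The derived ("cooked") data: computed once, then reused by queries.
daily = build_daily_aggregate(raw_sales)
```

A query that only needs daily totals can look up `daily[("north", "2010-11-25")]` directly. The trade-off is the usual one for derived data: faster reads in exchange for keeping the aggregate in step with the raw rows.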
Categories: Analytic technologies, Data warehousing, Derived data | 12 Comments |
Teradata announcements made very simple
For reasons of health,* I very regretfully canceled my trip to what is the first conference to go on my schedule every year — Teradata Partners. From afar, I’m not plugged into the details of Teradata’s announcement/embargo schedule. But what you need to know starts with this:
- Teradata signaled a year ago that its software focus was on adding analytic functionality, including specifically in the temporal area.
- Teradata likes to refresh its hardware annually, with a 50%+ price/performance improvement. (This year Teradata is going to 6-core Xeon processors.)
*Just a cough, but I’m both exhausted and potentially contagious, and this wasn’t a trip on which I had any truly urgent obligations (speeches, packed-room consulting sessions, whatever).
Categories: Analytic technologies, Data warehousing, Teradata | Leave a Comment |
Notes on data warehouse appliance prices
I’m not terribly motivated to do a detailed analysis of data warehouse appliance list prices, in part because:
- Everybody knows that in practice data warehouse appliances tend to be deeply discounted from list price.
- The only realistic metric to use for pricing data warehouse appliances is price-per-terabyte, and people have gotten pretty sick of that one.
That said, here are some notes on data warehouse appliance prices. Read more
Categories: Data warehouse appliances, Data warehousing, Database compression, EMC, Exadata, Greenplum, Netezza, Oracle, Pricing | 8 Comments |
Notes on the EMC Greenplum Data Computing Appliance
The big confidential part of my visit last week to EMC’s Data Computing Division, née Greenplum, was of course this week’s announcement of the first EMC/Greenplum “Data Computing Appliance.” Basics include: Read more
Categories: Analytic technologies, Data warehousing, EMC, Exadata, Greenplum, Oracle, Parallelization, Storage | 1 Comment |
Partnering with Cloudera
After I criticized the marketing of the Aster/Cloudera partnership, my clients at Aster Data and Cloudera ganged up on me and tried to persuade me I was wrong. Be that as it may, that conversation and others were helpful to me in understanding the core thesis: Read more
Categories: Analytic technologies, Aster Data, Cloudera, Data warehousing, Database diversity, Hadoop, MapReduce, Parallelization, Petabyte-scale data management | 11 Comments |
Notes and links October 10 2010
More quick-hit notes, links, and so on: Read more