Business intelligence
Analysis of companies, products, and user strategies in the area of business intelligence. Related subjects include:
- Data warehousing
- Business Objects
- Cognos
- QlikTech
- (in Text Technologies) Text mining
- (in Text Technologies) Text analytics/business intelligence integration
- (in The Monash Report) Strategic issues in business intelligence
- (in Software Memories) Historical notes on business intelligence
Eight kinds of analytic database (Part 2)
In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I’ll cover four more kinds of analytic database — even newer, for the most part, with a use case/product short list match that is even less clear. Read more
Eight kinds of analytic database (Part 1)
Analytic data management technology has blossomed, leading to many questions along the lines of “So which products should I use for which category of problem?” The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for “big data” is little help.
Let’s try eight categories instead. While no categorization is ever perfect, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need — and in most cases you’ll need several — is a great early step in your analytic technology planning. Read more
What colleges should teach in analytics
Based on a Teradata press release calling attention to the small amount of explicit university instruction in business intelligence, I was asked:
Does BI really need a dedicated undergrad track? What sort of BI and analytics-related skills should students look to obtain now in order to be viable in the job marketplace five years out?
My answers were (slightly edited):
- Most important is a basic, intuitive understanding of statistical significance. If you’re looking at an apparent trend, is it real or just random variation?
- Also crucial are general analytic and quantitative problem-solving skills.
- One should also be comfortable learning how to use new software tools.
- Everybody in business should have those skillsets. So should people in science, medicine, teaching, journalism, government, and most other vocations.
- The more analytically oriented should add basic programming skills, and basic knowledge of SQL. While SQL’s utter dominance is ebbing a bit, it still will be with us for a very long time.
Of course, there are more specialized skills also worth teaching, in a number of areas, starting with statistics and other predictive modeling technologies. But it’s OK to go through life not knowing those.
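To make the "real or just random variation?" question concrete, here is a minimal sketch of a sign test. The scenario is hypothetical and not from the original discussion: a metric rose in 8 of 10 periods, and we ask how often that would happen if each period were a fair coin flip. The function name `prob_at_least` is made up for this illustration.

```python
from math import comb


def prob_at_least(k_up: int, n: int) -> float:
    """Chance of seeing at least k_up increases in n periods,
    assuming each period is equally likely to go up or down."""
    favorable = sum(comb(n, k) for k in range(k_up, n + 1))
    return favorable / 2 ** n


# Hypothetical data: the metric went up in 8 of the last 10 periods.
p_value = prob_at_least(8, 10)
print(f"Chance of 8 or more rises in 10 coin-flip periods: {p_value:.4f}")
```

A small result suggests the trend is unlikely to be pure noise; a large one suggests it easily could be. This is only one simple way to frame the question, and it ignores the size of each change.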
Categories: Analytic technologies, Business intelligence, Data warehousing, NoSQL, Predictive modeling and advanced analytics, Teradata | 1 Comment |
What to think about BEFORE you make a technology decision
When you are considering technology selection or strategy, there are a lot of factors that can each have bearing on the final decision — a whole lot. Below is a very partial list.
In almost any IT decision, there are a number of environmental constraints that need to be acknowledged. Organizations may have standard vendors, favored vendors, or simply vendors who give them particularly deep discounts. Legacy systems are in place, application and system alike, and may or may not be open to replacement. Enterprises may have on-premise or off-premise preferences; SaaS (Software as a Service) vendors probably have multitenancy concerns. Your organization can determine which aspects of your system you’d ideally like to see be tightly integrated with each other, and which you’d prefer to keep only loosely coupled. You may have biases for or against open-source software. You may be pro- or anti-appliance. Some applications have a substantial need for elastic scaling. And some kinds of issues cut across multiple areas, such as budget, timeframe, security, or trained personnel.
Multitenancy is particularly interesting, because it has numerous implications. Read more
Citrusleaf RTA
Citrusleaf has released an add-on product called Citrusleaf RTA (Real-Time Attribution). It’s to be used when:
- You want to update dashboards within a minute.
- You want to update predictive models fairly quickly (within the hour?), although it’s not clear to me how much the models are being updated or changed with that latency.
The metrics envisioned are:
- 100 or so ad impressions per person …
- … for 1 billion or so people …
- … stored for 30-90 days …
- … where each ad impression is a fairly short record …
- … stored on disk …
- … but indexed in a way so that the index can fit into RAM.
- 50,000-100,000 writes per second. (I didn’t ask on what amount of hardware.)
- Several hundred reads per second.
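As a rough sanity check on these figures, the sketch below multiplies out the record count and derives the average write rate needed to fill the retention window. It assumes, which the figures above do not state, that the 100 or so impressions per person are what accumulates within one retention window and that writes arrive at a steady pace. All constant and function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

PEOPLE = 1_000_000_000        # "1 billion or so people"
IMPRESSIONS_PER_PERSON = 100  # "100 or so ad impressions per person"
RETENTION_DAYS = (30, 90)     # "stored for 30-90 days"

# Total records held at any one time.
total_records = PEOPLE * IMPRESSIONS_PER_PERSON


def average_writes_per_second(retention_days: int) -> float:
    """Steady-state ingest rate that fills the window with total_records."""
    return total_records / (retention_days * SECONDS_PER_DAY)


rates = {days: average_writes_per_second(days) for days in RETENTION_DAYS}
for days, rate in rates.items():
    print(f"{days}-day retention: about {rate:,.0f} writes/second on average")
```

Under those assumptions the store holds about 100 billion records, and the average rate works out to roughly 13,000 to 39,000 writes per second. Reading the quoted write rate as 50,000 to 100,000 per second, that leaves headroom for bursts.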
A consistent relational schema is NOT assumed.
Citrusleaf’s solution is:
- Have one index entry for each of the 1 billion people.
- Bang each new object/record to disk. Include in it a pointer to the previous object/record for the same person.
- Each time a new object/record is added, update the index in place so that it now points to the new one. Hence, the index is sized according to the number of people, not according to the total number of objects/records.
- Eventually let objects/records age off in the obvious way.
The downside is that when you do read 100 objects/records per person, you might need to do 100 seeks.
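The scheme described above can be sketched as an in-memory toy. This is not Citrusleaf's implementation or API: the class and field names are invented, a Python list stands in for the disk, and "aging off" is modeled as a timestamp cutoff at read time, on the assumption that each person's records are written in time order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    payload: dict                # free-form; no consistent schema is assumed
    timestamp: float
    prev_offset: Optional[int]   # pointer to the previous record for the same person


class ChainedStore:
    """Toy model: one index entry per person, records chained newest to oldest."""

    def __init__(self) -> None:
        self.log: List[Record] = []       # stand-in for append-only disk storage
        self.index: Dict[str, int] = {}   # person id -> offset of newest record
        self.seeks = 0                    # counts simulated disk seeks

    def write(self, person_id: str, payload: dict, timestamp: float) -> int:
        offset = len(self.log)
        # The new record points back at whatever the index pointed to before.
        self.log.append(Record(payload, timestamp, self.index.get(person_id)))
        # Update the index in place; it never grows with the record count.
        self.index[person_id] = offset
        return offset

    def read(self, person_id: str, min_timestamp: float = float("-inf")) -> List[dict]:
        results: List[dict] = []
        offset = self.index.get(person_id)
        while offset is not None:
            record = self.log[offset]
            self.seeks += 1               # each hop is a separate seek
            if record.timestamp < min_timestamp:
                break                     # everything older is treated as aged off
            results.append(record.payload)
            offset = record.prev_offset
        return results


store = ChainedStore()
for i in range(100):
    store.write("person-1", {"ad": i}, timestamp=float(i))
store.write("person-2", {"ad": "x", "site": "example"}, timestamp=5.0)

history = store.read("person-1")
print(len(store.index), "index entries;", len(history), "records read in", store.seeks, "seeks")
```

The index holds two entries for 101 records, and reading one person's full history costs one hop per record, which is the seek-per-record downside just noted.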
Investigative analytics and derived data: Enzee Universe 2011 talk
I’ll be speaking Monday, June 20 at IBM Netezza’s Enzee Universe conference. Thus, as is my custom:
- I’m posting draft slides.
- I’m encouraging comment (especially in the short time window before I have to actually give the talk).
- I’m offering links below to more detail on various subjects covered in the talk.
The talk concept started out as “advanced analytics” (as opposed to fast query, a subject amply covered in the rest of any Netezza event), as a lunch break in what is otherwise a detailed “best practices” session. So I suggested we constrain the subject by focusing on a specific application area — customer acquisition and retention, something of importance to almost any enterprise, and which exploits most areas of analytic technology. Then I actually prepared the slides — and guess what? The mix of subjects will be skewed somewhat more toward generalities than I first intended, specifically in the areas of investigative analytics and derived data. And, as always when I speak, I’ll try to raise consciousness about the issues of liberty and privacy, our options as a society for addressing them, and the crucial role we play as an industry in helping policymakers deal with these technologically-intense subjects.
Slide 3 refers back to a post I made last December, saying there are six useful things you can do with analytic technology:
- Operational BI/Analytically-infused operational apps: You can make an immediate decision.
- Planning and budgeting: You can plan in support of future decisions.
- Investigative analytics (multiple disciplines): You can research, investigate, and analyze in support of future decisions.
- Business intelligence: You can monitor what’s going on, to see when it is necessary to decide, plan, or investigate.
- More BI: You can communicate, to help other people and organizations do these same things.
- DBMS, ETL, and other “platform” technologies: You can provide support, in technology or data gathering, for one of the other functions.
Slide 4 observes that investigative analytics:
- Is the most rapidly advancing of the six areas …
- … because it most directly exploits performance & scalability.
Slide 5 gives my simplest overview of investigative analytics technology to date: Read more
Notes and links, June 15, 2011
Five things: Read more
Metaphors amok
It all started when I disputed James Kobielus’ blogged claim that Hadoop is the nucleus of the next-generation cloud EDW. Jim posted again to reiterate the claim, only this time he wrote that all EDW vendors [will soon] bring Hadoop into the heart of their architectures. (All emphasis mine.)
That did it. I tweeted, in succession:
- Actually, I vote for Hadoop as the lungs of the EDW — first place of entry for essential nutrients.
- Data integration can be the heart of the EDW, pumping stuff around. RDBMS/analytic platform can be the brain.
- iPad-based dashboards that may engender envy, but which actually are only used occasionally and briefly … well, you get the picture.*
*Woody Allen said in Sleeper that the brain was his second-favorite organ.
Of course, that body of work was quickly challenged. Responses included: Read more
Categories: Analytic technologies, Business intelligence, Data warehousing, EAI, EII, ETL, ELT, ETLT, Fun stuff, Hadoop, Humor, MapReduce | Leave a Comment |
Endeca topics
I visited my then-clients at Endeca in January. We focused on underpinnings (and strategic counsel) more than on coolness in what the product actually does. But going over my notes I think there’s enough to write up now.
Before saying much else about Endeca, there’s one confusion to dispose of: What’s the relationship between Endeca’s efforts in e-commerce (helping shoppers navigate websites) and business intelligence (helping people navigate their own data)? As Endeca tells it:
- Endeca’s e-commerce and business intelligence efforts are reflections of the same technical approach. Indeed, I’m pretty sure Endeca’s product lines still share much/most of the same technology.
- Endeca went after e-commerce first because that’s where the provable ROI was. As I pointed out a couple of times in 2007, Endeca became a market leader in that area.
- Endeca increased its BI efforts later.
- Circa 2009-10, Endeca differentiated its e-commerce and BI product lines from each other.
- An e-commerce line extension called Page Builder is what really got Endeca through the recent recession.
- The BI product line Latitude was launched in the fall of 2010.
Endeca’s positioning in the business intelligence market boils down to “investigative analytics for people who aren’t hardcore analysts.” Endeca’s technological support for that stresses: Read more
Categories: Business intelligence, Columnar database management, Database compression, Endeca | 11 Comments |
Terminology: Investigative analytics
In my post on the six useful things you can do with analytic technology, one of the six was
Research, investigate, and analyze in support of future decisions.
I’m calling that investigative analytics, and am hopeful the term will catch on.
I went on to say that the term conflated several disciplines, namely:
- Statistics, data mining, machine learning, and/or predictive analytics. …
- The more research-oriented aspects of business intelligence tools. …
- Analogous technologies as applied to non-tabular data types such as text or graph.
By way of contrast, I don’t regard business activity monitoring (BAM) or other kinds of monitoring-oriented business intelligence (BI) as part of “investigative analytics,” because they don’t seem particularly investigative.
Based on the above, I propose the following simple definition of the investigative analytics activity or process:
Seeking (previously unknown) patterns in data.
Categories: Analytic technologies, Business intelligence | 22 Comments |