Data mart outsourcing
Discussion of services that analyze large databases on an outsourced basis. Related subjects include:
- Data warehousing
- SaaS (Software as a Service)
- 1010data
- TEOCO
- (in The Monash Report) Verix
- (in Text Technologies) Text mining SaaS
Sybase IQ business notes
As specialized analytic DBMS go, Sybase is near the top of the charts both in age (Sybase IQ was first introduced in the mid 1990s) and adoption. That’s even more true, of course, if we restrict the discussion strictly to columnar DBMS, aka column stores. Basic Sybase IQ adoption claims include:
- >1500 users
- >3000 installations (Sybase has variously cited 2.1 and 2.5+ as the installation/user ratio)
- At least ~50-60 installations with >5 terabytes of user data
Note that 98% of Sybase IQ installations are under 5 terabytes; the heart of Sybase IQ’s business is the sub-terabyte data warehouse market.* Read more
Categories: Analytic technologies, Data mart outsourcing, Data warehousing, Investment research and trading, Sybase | 3 Comments |
What are the best choices for scaling Postgres?
March, 2011 edit: In its quaintness, this post is a reminder of just how fast Short Request Processing DBMS technology has been moving ahead. If I had to do it all over again, I’d suggest they use one of the high-performance MySQL options like dbShards, Schooner, or both together. I actually don’t know what they finally decided on in that area. (I do know that for analytic DBMS they chose Vertica.)
I have a client who wants to build a new application with peak update volume of several million transactions per hour. (Their base business is data mart outsourcing, but now they’re building update-heavy technology as well.) They have a small budget. They’ve been a MySQL shop in the past, but would prefer to contract (not eliminate) their use of MySQL rather than expand it.
My client actually signed a deal for EnterpriseDB’s Postgres Plus Advanced Server and GridSQL, but unwound the transaction quickly. (They say EnterpriseDB was very gracious about the reversal.) There seem to have been two main reasons for the flip-flop. First, it seems that EnterpriseDB’s version of Postgres isn’t up to PostgreSQL’s 8.4 feature set yet, although EnterpriseDB’s timetable for catching up might have been tolerable. Second, GridSQL apparently is further behind yet, with no timetable for up-to-date PostgreSQL compatibility. That was the dealbreaker.
The current base-case plan is to use generic open source PostgreSQL, with scale-out achieved via hand sharding, Hibernate, or … ??? Experience and thoughts along those lines would be much appreciated.
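To make the hand-sharding option concrete, here is a minimal sketch of what application-level routing could look like. It is an illustration under stated assumptions, not the client's design: the connection strings, the function names, and the choice of a hash-modulo scheme are all hypothetical, and no actual database connection is opened.

```python
import hashlib
from collections import defaultdict

# Hypothetical connection strings, one per independent PostgreSQL instance.
SHARD_DSNS = [
    "postgresql://app@pg-shard-0/appdb",
    "postgresql://app@pg-shard-1/appdb",
    "postgresql://app@pg-shard-2/appdb",
    "postgresql://app@pg-shard-3/appdb",
]


def shard_for(key: str, n_shards: int) -> int:
    """Map a record key to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to keep routing repeatable.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


def group_by_shard(updates, dsns):
    """Bucket (key, payload) pairs by destination DSN.

    Each bucket can then be applied to its shard as one batch.
    """
    batches = defaultdict(list)
    for key, payload in updates:
        batches[dsns[shard_for(key, len(dsns))]].append((key, payload))
    return dict(batches)


def per_shard_rate(tx_per_hour: float, n_shards: int) -> float:
    """Average transactions per second each shard sees, assuming an even spread."""
    return tx_per_hour / 3600 / n_shards
```

All updates to a given key land on the same shard, which matters when most of the traffic is changes to existing records rather than inserts. The obvious cost of a plain modulo scheme is that changing the number of shards remaps most keys, so data has to be moved; a lookup table or consistent hashing would be the usual alternatives.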
Another route to OLTP performance and scale-out is of course a memory-centric product such as VoltDB or the Groovy SQL Switch. But this client’s database is terabyte-scale, so hardware costs could be an issue, as of course could product maturity.
By the way, a large fraction of these updates will be actual changes, as opposed to new records, in case that matters. I expect that the schema being updated will be very simple — i.e., clearly simpler than in a classic order entry scenario.
Data warehousing business trends
I’ve talked with a whole lot of vendors recently, some here at TDWI, as well as users, fellow analysts, and so on. Repeated themes include: Read more
Categories: Analytic technologies, Application areas, Data mart outsourcing, Data warehousing, eBay, Microsoft and SQL*Server, MySQL, Oracle, Teradata | Leave a Comment |
Database SaaS gains a little visibility
Way back in the 1970s, a huge fraction of analytic database management was done via timesharing, specifically in connection with the RAMIS and FOCUS business-intelligence-precursor fourth-generation languages. (Both were written by Gerry Cohen, who built his company Information Builders around the latter one.) The market for remote-computing business intelligence has never wholly gone away since. Indeed, it’s being revived now, via everything from the analytics part of Salesforce.com to the service category I call data mart outsourcing.
Less successful to date are efforts in the area of pure database software-as-a-service. It seems that if somebody is going for SaaS anyway, they usually want a more complete, integrated offering. The most noteworthy exceptions I can think of to this general rule are Kognitio and Vertica, and they only have a handful of database SaaS customers each. To wit: Read more
Some Netezza customer metrics
From the conference call on Netezza’s July, 2008 Q1, with figures as of the end of Q1:
- There are now 191 Netezza customers.
- 18 of those were new.
- 78% of Netezza’s business was in North America and 22% was international.
- Netezza operates in 10 countries.
- “The top 4 vertical markets represented approximately 75% of our business, with those markets being telcos, retail, financial services, and the analytic service provider segment.”
- One analytic service provider was greater than 10% of revenue for the quarter, and is expected to keep buying a lot in subsequent quarters. Also, one analytic service provider standardized on Netezza. I’m guessing that’s the same customer.
- “We ended the quarter with 45 [quota] carrying teams made up of a sales rep and a systems engineer and our plan is to continue to hire direct sales teams at the pace of 3 to 5 per quarter every quarter. These direct reps accounted for 85% of the business while the indirect activity was 15% this quarter.”
Categories: Application areas, Data mart outsourcing, Data warehouse appliances, Data warehousing, Market share and customer counts, Netezza, Telecommunications | 1 Comment |
Jerry Held on cloud data warehousing and how business intelligence will be transformed by it
Vertica Chairman Jerry Held has a pair of blog posts on analytics and data warehousing in the cloud. The first lays out a number of potential benefits and consequences of cloud data warehousing, under the heading of “Transforming BI”: Read more
Categories: Analytic technologies, Business intelligence, Cloud computing, Data mart outsourcing, Data warehousing, Software as a Service (SaaS), Vertica Systems | 7 Comments |
Data warehouse appliance power user TEOCO
If you had to name super-high-end users of data warehouse technology, your list might start with a few retailers, credit data processors, and telcos, plus the US intelligence establishment. Well, it turns out that TEOCO runs outsourced data warehouses for several of the top US telcos, making it one of the top data warehouse technology users around.
A few weeks ago, I had a fascinating chat with John Devolites of TEOCO. Highlights included:
- TEOCO runs a >200 TB DATAllegro warehouse for a major US telco. (When we hear about a big DATAllegro telco site that’s been in production for a while, that’s surely the one they’re talking about.)
- TEOCO runs around 450 TB total of DATAllegro databases across its various customers. (When Stuart Frost blogs of >400 TB “systems,” that may be what he’s talking about.)
- TEOCO likes DATAllegro better than Netezza, although the margin is now small. This is mainly for financial reasons, specifically price-per-terabyte. When TEOCO spends its own money without customer direction as to appliance brand, it buys DATAllegro.
- TEOCO runs at least one 50 TB Netezza system — originally due to an acquisition of a Netezza user — with more coming. There also is more DATAllegro coming.
- TEOCO feels 15-30 concurrent users is the current practical limit for both DATAllegro and Netezza. That’s greater than it used to be.
- Netezza is a little faster than DATAllegro on a few esoteric queries, but the difference is not important to TEOCO’s business.
- Official price lists notwithstanding, TEOCO sees prices as being in the $10K/TB range. DATAllegro’s price advantage has shrunk greatly, as others have come down to more or less match. However, since John stated his price preference for DATAllegro as being in the present tense, I presume the price match isn’t perfect.
- Teradata was never a serious consideration, for price reasons.
- In the original POC a few years ago, the incumbent Oracle — even after extensive engineering — couldn’t get an important query down under 8 hours of running time. DATAllegro and Netezza both handled it in 2-3 minutes. Similarly, Oracle couldn’t get the load time for 100 million call detail records (CDRs) below 24 hours.
- Applications sound pretty standard for telecom: Lots of CDR processing — 550 million/day on the big DATAllegro system cited above. Pricing and fraud checking. Some data staging for legal reasons (giving the NSA what it subpoenas and no more).
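For a sense of scale, the figures in the list above can be turned into rates with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the helper names are made up for illustration, and the inputs are the round numbers quoted above, averaged evenly over a day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: float) -> float:
    """Average events per second, assuming an even spread across the day."""
    return events_per_day / SECONDS_PER_DAY


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the quick run is than the slow one."""
    return slow_seconds / fast_seconds


# 550 million CDRs/day on the big DATAllegro system.
cdr_rate = per_second(550_000_000)      # roughly 6,366 CDRs/second

# Oracle could not load 100 million CDRs in under 24 hours,
# i.e. it sustained less than this rate.
oracle_load_cap = per_second(100_000_000)  # roughly 1,157 CDRs/second

# Oracle stayed above 8 hours on the key query; the appliances took 2-3 minutes.
min_speedup = speedup(8 * 3600, 3 * 60)  # 160x against the 3-minute figure
max_speedup = speedup(8 * 3600, 2 * 60)  # 240x against the 2-minute figure
```

Since Oracle never got under 8 hours, the 160x to 240x range is a lower bound on the query speedup rather than a measurement of it.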
Categories: Analytic technologies, Data mart outsourcing, Data warehouse appliances, Data warehousing, DATAllegro, Netezza, Pricing, Specific users, Telecommunications, TEOCO | 7 Comments |
Outsourced data marts
Call me slow on the uptake if you like, but it’s finally dawned on me that outsourced data marts are a nontrivial segment of the analytics business. For example:
- I was just briefed by Vertica, and got the impression that data mart outsourcers may be Vertica’s #3 vertical market, after financial services and telecom. Certainly it seems like they are Vertica’s #3 market if you bundle together data mart outsourcers and more conventional OEMs.
- When Netezza started out, a bunch of its early customers were credit data-based analytics outsourcers like Acxiom.
- After nagging DATAllegro for a production reference, I finally got a good one: TEOCO. TEOCO specializes in figuring out whether inter-carrier telecom bills are correct. While there’s certainly a transactional invoice-processing aspect to this, the business seems to hinge mainly around doing calculations to figure out correct charges.
- I was talking with Pervasive about Pervasive Datarush, a beta product that lets you do super-fast analytics on data even if you never load it into a DBMS in the first place. I challenged them for use cases. One user turns out to be an insurance claims rule-checking outsourcer.
- One of Infobright’s references is a French CRM analytics outsourcer, 1024 Degres.
- 1010data has built up a client base of 50-60, including a number of financial and retail blue-chippers, with a soup-to-nuts BI/analysis/columnar database stack.
- I haven’t heard much about Verix in a while, but their niche was combining internal sales figures with external point-of-sale/prescription data to assess retail (especially pharma) microtrends.
To a first approximation, here’s what I think is going on. Read more