Application areas
Posts focusing on the use of database and analytic technologies in specific application domains. Related subjects include:
- Any subcategory
- (in Text Technologies) Specific application areas for text analytics
Data warehouse appliance power user TEOCO
If you had to name super-high-end users of data warehouse technology, your list might start with a few retailers, credit data processors, and telcos, plus the US intelligence establishment. Well, it turns out that TEOCO runs outsourced data warehouses for several of the top US telcos, making it one of the top data warehouse technology users around.
A few weeks ago, I had a fascinating chat with John Devolites of TEOCO. Highlights included:
- TEOCO runs a >200 TB DATAllegro warehouse for a major US telco. (When we hear about a big DATAllegro telco site that’s been in production for a while, that’s surely the one they’re talking about.)
- TEOCO runs around 450 TB total of DATAllegro databases across its various customers. (When Stuart Frost blogs of >400 TB “systems,” that may be what he’s talking about.)
- TEOCO likes DATAllegro better than Netezza, although the margin is now small. This is mainly for financial reasons, specifically price-per-terabyte. When TEOCO spends its own money without customer direction as to appliance brand, it buys DATAllegro.
- TEOCO runs at least one 50 TB Netezza system — originally due to an acquisition of a Netezza user — with more coming. There also is more DATAllegro coming.
- TEOCO feels 15-30 concurrent users is the current practical limit for both DATAllegro and Netezza. That’s greater than it used to be.
- Netezza is a little faster than DATAllegro on a few esoteric queries, but the difference is not important to TEOCO’s business.
- Official price lists notwithstanding, TEOCO sees prices as being in the $10K/TB range. DATAllegro’s price advantage has shrunk greatly, as others have come down to more or less match. However, since John stated his price preference for DATAllegro in the present tense, I presume the price match isn’t perfect.
- Teradata was never a serious consideration, for price reasons.
- In the original POC a few years ago, the incumbent Oracle — even after extensive engineering — couldn’t get an important query down under 8 hours of running time. DATAllegro and Netezza both handled it in 2-3 minutes. Similarly, Oracle couldn’t get the load time for 100 million call detail records (CDRs) below 24 hours.
- Applications sound pretty standard for telecom: Lots of CDR processing — 550 million/day on the big DATAllegro system cited above. Pricing and fraud checking. Some data staging for legal reasons (giving the NSA what it subpoenas and no more).
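For a sense of scale, the 550 million/day figure works out as follows. (The per-record size is purely my assumption for illustration; TEOCO didn’t give me one.)

```python
# Back-of-envelope arithmetic for the CDR volumes cited above.
# The 550 million/day figure comes from the post; the record size
# is an assumed illustrative value, not anything TEOCO reported.

CDRS_PER_DAY = 550_000_000
SECONDS_PER_DAY = 24 * 60 * 60
ASSUMED_BYTES_PER_CDR = 300          # assumption: a few hundred bytes per raw CDR

avg_cdrs_per_second = CDRS_PER_DAY / SECONDS_PER_DAY
raw_gb_per_day = CDRS_PER_DAY * ASSUMED_BYTES_PER_CDR / 1e9

print(f"Average ingest rate: {avg_cdrs_per_second:,.0f} CDRs/second")
print(f"Raw volume at {ASSUMED_BYTES_PER_CDR} bytes/CDR: ~{raw_gb_per_day:,.0f} GB/day")
# => roughly 6,400 CDRs/second on average and ~165 GB/day before compression,
#    which is why load speed matters as much as query speed here.
```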
Categories: Analytic technologies, Data mart outsourcing, Data warehouse appliances, Data warehousing, DATAllegro, Netezza, Pricing, Specific users, Telecommunications, TEOCO | 7 Comments |
Outsourced data marts
Call me slow on the uptake if you like, but it’s finally dawned on me that outsourced data marts are a nontrivial segment of the analytics business. For example:
- I was just briefed by Vertica, and got the impression that data mart outsourcers may be Vertica’s #3 vertical market, after financial services and telecom. Certainly it seems like they are Vertica’s #3 market if you bundle together data mart outsourcers and more conventional OEMs.
- When Netezza started out, a bunch of its early customers were credit data-based analytics outsourcers like Acxiom.
- After nagging DATAllegro for a production reference, I finally got a good one — TEOCO. TEOCO specializes in figuring out whether inter-carrier telecom bills are correct. While there’s certainly a transactional invoice-processing aspect to this, the business seems to hinge mainly on doing calculations to figure out correct charges.
- I was talking with Pervasive about Pervasive Datarush, a beta product that lets you do super-fast analytics on data even if you never load it into a DBMS in the first place. I challenged them for use cases. One user turns out to be an insurance claims rule-checking outsourcer.
- One of Infobright’s references is a French CRM analytics outsourcer, 1024 Degres.
- 1010data has built up a client base of 50-60, including a number of financial and retail blue-chippers, with a soup-to-nuts BI/analysis/columnar database stack.
- I haven’t heard much about Verix in a while, but their niche was combining internal sales figures with external point-of-sale/prescription data to assess retail (especially pharma) microtrends.
To a first approximation, here’s what I think is going on. Read more
Truviso and EnterpriseDB blend event processing with ordinary database management
Truviso and EnterpriseDB announced today that there’s a Truviso “blade” for Postgres Plus. By email, EnterpriseDB’s Bob Zurek endorsed my tentative summary of what this means technically, namely:
- There’s data being managed transactionally by EnterpriseDB.
- Truviso’s DML has all along included ways to talk to a persistent Postgres data store.
- If, in addition, one wants to do stream processing things on the same data, that’s now possible, using Truviso’s usual DML.
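I don’t have Truviso’s actual DML in front of me, so here’s only a conceptual sketch in Python of what that combination means: events land in an ordinary persistent store, while a streaming layer keeps an incrementally maintained window over the same feed. Every name in it is made up for illustration; this is not Truviso’s or EnterpriseDB’s API.

```python
# Conceptual sketch only -- not Truviso's API or DML. It just illustrates the
# combination described above: events are persisted in an ordinary SQL store,
# while a streaming layer keeps a continuously updated window over the same feed.
import sqlite3
from collections import deque

store = sqlite3.connect(":memory:")                    # stands in for the Postgres store
store.execute("CREATE TABLE cdrs (ts REAL, amount REAL)")

window = deque()                                       # streaming side: last 60 "seconds"
WINDOW_SECONDS = 60.0

def ingest(ts, amount):
    # Persist transactionally, like ordinary DML against the shared store.
    store.execute("INSERT INTO cdrs VALUES (?, ?)", (ts, amount))
    store.commit()
    # Also update the streaming window incrementally.
    window.append((ts, amount))
    while window and window[0][0] < ts - WINDOW_SECONDS:
        window.popleft()

def current_window_total():
    return sum(amount for _, amount in window)         # answered from the stream, no table scan

def historical_total():
    (total,) = store.execute("SELECT COALESCE(SUM(amount), 0) FROM cdrs").fetchone()
    return total                                       # answered from the persistent store

for t in range(0, 200, 10):
    ingest(float(t), 1.0)
print(current_window_total(), historical_total())      # 7.0 20.0
```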
Optimizing WordPress database usage
There’s an amazingly long comment thread on Coding Horror about WordPress optimization. Key points and debates include:
- WordPress makes scads of database calls on every page. (20 is the supposed default number. That sounds a little high to me, but not wholly incredible.)
- Therefore one should use a caching plug-in. WP-Cache is the preferred one. WP-Super-Cache gets some votes as perhaps being even better. (The sketch after this list illustrates the basic idea.)
- In theory the database cache should handle most of the problem. (After all, many of those database queries are the same for every page.) In practice, it often doesn’t, even if you use dedicated (as opposed to shared) web hosting.
- LAMP vs. Microsoft stack (uh-oh).
- Drupal vs. WordPress vs. Movable Type vs. Joomla vs. do-it-yourself (uh-oh too).
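For what it’s worth, the basic idea behind WP-Cache-style plug-ins is just to cache the fully rendered page, keyed by URL, so repeat visits skip the database entirely. A minimal sketch of that pattern in Python (my illustration, not how the plug-ins are actually written):

```python
# Minimal whole-page cache sketch -- the general idea behind WP-Cache-style
# plug-ins, not their actual implementation. render_page() is a stand-in for
# WordPress assembling a page from many database queries.
import time

CACHE_TTL_SECONDS = 300          # serve a stored copy for up to 5 minutes
_page_cache = {}                 # url -> (rendered_html, cached_at)

def render_page(url):
    # Hypothetical stand-in for the expensive path: dozens of DB queries,
    # theme logic, etc.
    time.sleep(0.05)
    return f"<html><body>Content for {url}</body></html>"

def get_page(url):
    cached = _page_cache.get(url)
    if cached is not None:
        html, cached_at = cached
        if time.time() - cached_at < CACHE_TTL_SECONDS:
            return html                       # cache hit: no database work at all
    html = render_page(url)                   # cache miss: do the expensive render once
    _page_cache[url] = (html, time.time())
    return html

get_page("/2008/04/some-post/")    # slow: renders and stores
get_page("/2008/04/some-post/")    # fast: served from the cache
```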
Another theme is — well, it’s WordPress “theme” design. Do you really need all those calls? The most dramatic example I can think of is one I experienced soon after I started this blog. Some themes have the cool feature that, in the category list on the sidebar, there’s a count of the number of posts in the category. Each category. I love that feature, but its performance consequences are not pretty.
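To make the category-count point concrete, here’s a small sketch in Python with SQLite standing in for WordPress’s MySQL schema (tables simplified, my example): one COUNT query per category, which is roughly what a naive theme does, versus a single grouped query.

```python
# Illustration of the per-category count problem, using SQLite and a
# simplified schema (not WordPress's real tables).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany(
    "INSERT INTO posts (category) VALUES (?)",
    [("Data warehousing",), ("Data warehousing",), ("OLTP",), ("Humor",)],
)
categories = ["Data warehousing", "OLTP", "Humor"]

# Naive sidebar: one query per category -- N extra queries on every page view.
naive_counts = {
    c: conn.execute("SELECT COUNT(*) FROM posts WHERE category = ?", (c,)).fetchone()[0]
    for c in categories
}

# Better: one grouped query (or, better still, a cached result).
grouped_counts = dict(
    conn.execute("SELECT category, COUNT(*) FROM posts GROUP BY category")
)

print(naive_counts)    # {'Data warehousing': 2, 'OLTP': 1, 'Humor': 1}
print(grouped_counts)  # {'Data warehousing': 2, 'Humor': 1, 'OLTP': 1}
```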
As previously noted, we’ll be doing an emergency site upgrade ASAP. Once we’re upgraded to WordPress 2.5, I hope to deploy a rich set of back-end plug-ins. One of the caching ones will be among them.
Categories: About this blog, Application areas, Cache | 1 Comment |
The core challenges of OLTP are changing
I wrote a few weeks ago about the H-Store project, which rejects a variety of assumptions underlying traditional OLTP database design. One of these is long transactions over open database connections. The idea is that the most demanding OLTP applications run on the Web, where abandonment is common, and hence the only sensible option is to break things up into simple chunks. Read more
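As a rough illustration of the “simple chunks” idea (mine, not H-Store code), each web request in the sketch below does all of its database work in one short transaction, instead of holding a transaction open across user think-time:

```python
# Sketch of the "simple chunks" idea -- my illustration, not H-Store code.
# Each web request does its database work in one short transaction, instead of
# holding a transaction open while waiting for a user who may never come back.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carts (user_id INTEGER, item TEXT)")
conn.execute("CREATE TABLE orders (user_id INTEGER, item TEXT)")

def add_to_cart(user_id, item):
    # One short transaction; commits before the response goes back to the browser.
    with conn:
        conn.execute("INSERT INTO carts VALUES (?, ?)", (user_id, item))

def check_out(user_id):
    # Another short transaction, issued only if the user ever returns.
    with conn:
        conn.execute(
            "INSERT INTO orders SELECT user_id, item FROM carts WHERE user_id = ?",
            (user_id,),
        )
        conn.execute("DELETE FROM carts WHERE user_id = ?", (user_id,))

add_to_cart(42, "widget")     # request 1: commit immediately
# ... arbitrary delay, possibly abandonment, with nothing held open ...
check_out(42)                 # request 2: its own short transaction
print(conn.execute("SELECT * FROM orders").fetchall())   # [(42, 'widget')]
```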
Categories: Application areas, OLTP | Leave a Comment |
Vertica update
I chatted with Andy Ellicott and Mike Stonebraker of Vertica today. Some of the content is embargoed until February 19 (for TDWI), but here are some highlights of the rest.
- Vertica now is “approaching” 50 paid customers, up from 15 or so in early November. (Compared to most of Vertica’s fellow data warehouse specialists, that’s a lot.) Many — perhaps most — of these customers are hedge funds or telcos.
- Vertica’s typical lag from sale to deployment is about one quarter.
- Vertica’s typical initial selling price is $250K. Or maybe it’s $100-150K. The Vertica guys are generally pretty forthcoming, but pricing is an exception. Whatever they charge, it’s strictly per terabyte of user data. They think they are competitive with other software vendors, and cheaper, all-in, than appliance vendors.
- One subject on which they’re totally non-forthcoming (lawyers’ orders) is the recent patent lawsuit filed by Sybase. They wouldn’t even say whether they thought it was bogus because they didn’t infringe, or whether they thought it was bogus because the patent shouldn’t have been granted.
- Average Vertica database size is a little under 10 terabytes of user data, with many examples in the 15-20 TB range. Lots of customers plan to expand to 50-100 TB.
- Vertica claims sustainable load speeds of 3-5 megabytes/sec/node, irrespective of database size. Data is sucked into RAM uncompressed, then written out a gig/node at a time, compressed. Gigabyte chunks are then merged on disk, which is superfast (30 megabytes/second) as it doesn’t involve sorting. Mike insists this doesn’t compromise compression.
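Here’s a rough sketch in Python of that buffer-then-merge load pattern as I understand it: buffer rows in memory, flush them as sorted, compressed chunks, then combine the chunks with a streaming merge that needs no further sorting. It’s an illustration of the general technique, not Vertica’s code, and the “chunks are written sorted” step is my assumption about why the merge avoids sorts.

```python
# Simplified sketch of a buffer-then-merge load path -- an illustration of the
# general technique, not Vertica's implementation. Rows are buffered in memory,
# flushed as sorted + compressed chunks, and the chunks are later combined with
# a streaming merge that needs no further sorting.
import gzip
import heapq
import os
import tempfile

CHUNK_ROWS = 4                # tiny for the demo; stands in for "a gigabyte per node"
chunk_dir = tempfile.mkdtemp()
chunk_paths = []
buffer = []

def flush_buffer():
    global buffer
    if not buffer:
        return
    buffer.sort()                                       # assumption: each chunk is written sorted
    path = os.path.join(chunk_dir, f"chunk{len(chunk_paths)}.gz")
    with gzip.open(path, "wt") as f:                    # compressed on the way out
        f.writelines(f"{row}\n" for row in buffer)
    chunk_paths.append(path)
    buffer = []

def load_row(row):
    buffer.append(row)                                  # arrives in RAM uncompressed
    if len(buffer) >= CHUNK_ROWS:
        flush_buffer()

def merged_rows():
    # Streaming k-way merge of already-sorted chunks: no re-sorting needed.
    files = [gzip.open(p, "rt") for p in chunk_paths]
    try:
        yield from heapq.merge(*files)
    finally:
        for f in files:
            f.close()

for value in [5, 3, 9, 1, 7, 2, 8, 4, 6, 0]:
    load_row(f"{value:010d}")
flush_buffer()
print([row.strip() for row in merged_rows()][:3])       # ['0000000000', '0000000001', '0000000002']
```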
We also addressed the subject of Vertica’s schema assumptions, but I’ll leave that to another post.
Categories: Analytic technologies, Data warehousing, Database compression, Investment research and trading, Michael Stonebraker, Sybase, Theory and architecture, Vertica Systems | 6 Comments |
Is MapReduce a good underpinning for next-gen scientific DBMS?
Back in November, Mike Stonebraker suggested that there’s a need for database management advances to serve “big science”. He said:
Obviously, the best solution to these … problems would be to put everything in a next-generation DBMS — one capable of keeping track of data, metadata, and lineage. Supporting the latter would require all operations on the data to be done inside the DBMS with user-defined functions — Postgres-style.
Categories: Data types, MapReduce, Scientific research | Leave a Comment |
One of the funniest fake press releases ever
About an extended outage in Lord of the Rings Online.
Edited July 2, 2008: New URL that works at least for now.
Categories: Games and virtual worlds, Humor | 1 Comment |
One Greenplum customer — 35 terabytes and growing fast
I was at the Business Objects conference this week, and as usual went to very few sessions. But one I did stroll into was on “Managing Rapid Growth With the Right BI Strategy.” This was by Reliance Telecommunications, an outfit in India that is adding telecom subscribers very quickly, and consequently banging 100-150 gigs of data per day into a 35 terabyte warehouse.
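Taking those figures at face value, the growth arithmetic is striking. A quick back-of-envelope calculation (mine, assuming the stated daily load is net new data retained in the warehouse):

```python
# Back-of-envelope growth arithmetic from the figures above; assumes the
# 100-150 GB/day is net new data retained in the warehouse.
WAREHOUSE_TB = 35
DAILY_LOAD_GB = (100, 150)

days_to_fill = [WAREHOUSE_TB * 1000 / gb for gb in DAILY_LOAD_GB]
annual_growth_tb = [gb * 365 / 1000 for gb in DAILY_LOAD_GB]

print(f"Days of data to fill 35 TB: {days_to_fill[1]:.0f}-{days_to_fill[0]:.0f}")
print(f"New data per year: {annual_growth_tb[0]:.0f}-{annual_growth_tb[1]:.0f} TB")
# => roughly 233-350 days, and about 36-55 TB/year, i.e. the warehouse would
#    more than double within a year at the stated rates.
```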
The beginning of the talk astonished me, as the presenter seemed to be saying they were doing all this on Oracle. Hah. Oracle is what they moved away from; instead, they got Greenplum. I couldn’t get details; indeed, as a BI guy he was far enough away from DBMS to misspeak and say that Greenplum was brought in by ‘HP’, before quickly correcting himself when prompted. Read more
Categories: Analytic technologies, Business Objects, Data warehouse appliances, Data warehousing, Greenplum, Oracle, Specific users, Telecommunications | Leave a Comment |
Applications for super-low-latency CEP
Complex event/stream processing vendors compete fiercely on the basis of low latency, down to single-digit milliseconds or even sub-millisecond levels. A question naturally springs to mind: When does this extreme low latency matter?
I think I’ve come up with a concise yet fairly accurate answer: Super-low latency matters when the application includes direct competition against a similarly fast opponent. The best example is automated stock trading – if you can exploit a market inefficiency 1 millisecond before your competition, you make money.
Other examples might arise in network security or battlefield systems, but I don’t know of any specific real-life cases. Instead, other applications for complex event/stream processing tend to be content with latencies that are easier to achieve. E.g., 100 milliseconds (1/10 of a second) is likely to be plenty fast enough.
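If you want a feel for what those budgets mean operationally, here’s a toy measurement harness in Python (my illustration; the event handler is a trivial stand-in, and real CEP engines are vastly more elaborate):

```python
# Toy latency-measurement harness -- my illustration, not any vendor's engine.
# It times a stand-in event handler and reports percentiles, which is how
# "single-digit millisecond" or "sub-millisecond" claims are usually framed.
import time

def handle_event(event):
    # Stand-in for real CEP work: match the event against a few conditions.
    return event["price"] > 100 and event["volume"] > 1_000

latencies_us = []
for i in range(10_000):
    event = {"price": 100 + (i % 7), "volume": 900 + (i % 300)}
    start = time.perf_counter_ns()
    handle_event(event)
    latencies_us.append((time.perf_counter_ns() - start) / 1_000)

latencies_us.sort()
p50 = latencies_us[len(latencies_us) // 2]
p99 = latencies_us[int(len(latencies_us) * 0.99)]
print(f"median {p50:.1f} microseconds, 99th percentile {p99:.1f} microseconds")
# A 100-millisecond budget is 100,000 microseconds -- orders of magnitude looser
# than what the trading-style applications above compete on.
```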