Netezza on concurrency and workload management
I visited Netezza Friday for what was mainly an NDA meeting. But while I was there I asked where Netezza stood on concurrency, workload management, and rapid data mart spin-out. Netezza’s claims in those regards turned out to be surprisingly strong.
In the biggest surprise, Netezza claimed at least one customer had >5,000 simultaneous users, and a second had >4,000. Both are household names. Other unspecified Netezza customers apparently also have >1,000 simultaneous users. (Perhaps one is Ross Stores, given how long ago it was said to be in the many 100s, but I didn’t think to ask.) I did not probe as to how demanding a typical user was, so these numbers may not really indicate what they appear to, but anyhow they’re vastly bigger than what I’ve heard from any analytic DBMS vendor newer than Netezza.
On the data mart spin-out side, another household-name Netezza customer has been rapidly spinning out virtual data marts, in a manner somewhat akin to eBay’s virtual data mart/“analytics-as-a-service” strategy*, since 2004. However, the whole thing isn’t necessarily as slick as what eBay has going. This Netezza customer’s virtual data marts are more in the way of trials, with those data marts that prove really useful eventually getting instantiated physically on separate Netezza equipment.
*Actually, it’s not just eBay. Teradata told me earlier this week that a large fraction of its high-end customers spin out virtual data marts.
Both of these factoids lead naturally to questions along the lines of “Oh really? Well, what have you got in workload management?” It turns out that Netezza has three layers of workload management:
- Things (Queries? Workloads? Users?) can be labeled as high/medium/low priority
- Beyond their priority level, things can get guaranteed resource allocation – i.e., an assured minimum share of disk (for temp space?), CPU, etc.
- Netezza software has a “short query bias” — i.e., shorter queries generally get higher priority.
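To make the three layers concrete, here is a minimal sketch of how they could fit together. It is purely illustrative and not Netezza's implementation: the `Workload` type, the priority weights, and the rule for splitting spare capacity are all assumptions of mine, and I'm assuming the unit being managed is a "workload" since Netezza didn't say.

```python
from dataclasses import dataclass

# Hypothetical priority weights; these are NOT Netezza's actual values.
PRIORITY_WEIGHT = {"high": 3, "medium": 2, "low": 1}


@dataclass
class Workload:
    name: str
    priority: str            # layer 1: "high", "medium", or "low"
    guaranteed_share: float  # layer 2: assured minimum fraction of a resource (0..1)
    est_seconds: float       # estimated run time of its next query


def allocate_shares(workloads):
    """Give each workload its guaranteed minimum, then split the spare
    capacity in proportion to priority weight (an assumed policy)."""
    reserved = sum(w.guaranteed_share for w in workloads)
    if reserved > 1.0:
        raise ValueError("guaranteed shares exceed 100% of the resource")
    spare = 1.0 - reserved
    total_weight = sum(PRIORITY_WEIGHT[w.priority] for w in workloads)
    return {
        w.name: w.guaranteed_share
        + spare * PRIORITY_WEIGHT[w.priority] / total_weight
        for w in workloads
    }


def dispatch_order(workloads):
    """Order by priority first; within a priority level, shortest estimated
    run time goes first (layer 3, the "short query bias")."""
    return sorted(
        workloads,
        key=lambda w: (-PRIORITY_WEIGHT[w.priority], w.est_seconds),
    )


if __name__ == "__main__":
    demo = [
        Workload("dashboard", "high", 0.2, 2.0),
        Workload("adhoc", "low", 0.1, 600.0),
        Workload("report", "high", 0.0, 30.0),
    ]
    print(allocate_shares(demo))
    print([w.name for w in dispatch_order(demo)])
```

In this sketch the guaranteed share acts as a floor and priority only decides who gets the leftovers, which is one plausible reading of "beyond their priority level."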
Netezza further says that it’s working to enhance its workload management tools significantly.
Off the top of my head, I don’t recall how much workload management Teradata includes in its non-55xx products, which are the ones (especially the Teradata 2550) that Teradata positions as comparable to Netezza’s 10-xxx series.
Comments
13 Responses to “Netezza on concurrency and workload management”
I think it is worth clarifying that concurrent users is not the same as concurrent executing queries. Supporting thousands of connections to a database is not nearly as impressive as supporting thousands of executing, in-flight, queries.
Absolutely true.
On the other hand, it’s pretty hard to get live-customer metrics on concurrent queries, and it’s also hard to think of a lot of use cases where that capability is needed. What’s more, the slower a system the more queries it may have to do at once. 😉
These high concurrent user/login numbers are actually a bit disturbing. OLTP database folks figured this out years ago by leveraging connection pooling. Why would there be so many concurrent users/logins on a DW system? Is this the result of some very bad custom programming with no middle tier connection multiplexing? I find it difficult to believe that any enterprise BI tool would require a 1:1 ratio of users to connections.
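For readers unfamiliar with the connection pooling mentioned above, a minimal sketch follows. It is a generic illustration using only the Python standard library, not any particular BI tool's or DBMS's middle tier; `ConnectionPool`, `simulate`, and the stand-in `object()` connections are hypothetical names introduced here.

```python
import queue
import threading
import time
from contextlib import contextmanager


class ConnectionPool:
    """Share a small, fixed set of connections among many users."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self):
        conn = self._idle.get()   # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next user


def simulate(users, pool_size):
    """Run `users` short sessions through a pool of `pool_size` connections.
    Returns (sessions completed, peak connections in use at once)."""
    pool = ConnectionPool(connect=object, size=pool_size)  # object() stands in for a real connection
    lock = threading.Lock()
    stats = {"in_use": 0, "peak": 0, "done": 0}

    def user_session():
        with pool.connection():
            with lock:
                stats["in_use"] += 1
                stats["peak"] = max(stats["peak"], stats["in_use"])
            time.sleep(0.001)     # pretend to run a query
            with lock:
                stats["in_use"] -= 1
                stats["done"] += 1

    threads = [threading.Thread(target=user_session) for _ in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["done"], stats["peak"]


if __name__ == "__main__":
    done, peak = simulate(users=200, pool_size=4)
    print(f"{done} user sessions served over at most {peak} connections")
```

The point of the sketch is only that the number of users and the number of database connections are decoupled; the pool size caps the latter no matter how large the former gets.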
Greg,
I’m sorry. Who said anything about concurrent log-ins and the like? I’m referring to the number of human beings who have access to analytics against the same database at the same time.
@Curt
Based on your most recent comment, it seems as though I was misled by your (or Netezza’s) wording of “simultaneous users”. Given your clarification, “simultaneous users” is probably not the appropriate terminology. I would say “named users” would be much more appropriate, as “simultaneous users” implies that users are simultaneously doing something; hence my comments. As you mentioned, what is meant is how many users have access to the platform, not how many users are actively using it at any one point in time.
Greg,
One of the users they gave me was a US-only company. So I think pretty much all the named users are probably at their desks at the same time. 🙂
But yes, we’re talking about named users.
In my opinion a question of ‘how many simultaneous queries are running’ is a silly question anyway. As Curt implied in his reply to his own post, a large query backlog just indicates the system is not performing fast enough to keep up with the query workload.
The number of “concurrent users” which can be supported is a matter of how many queries can be executed in a certain time period, not how many queries can be executed at the same time.
A properly designed MPP database should be able to give close to 100% of the system resources to a query if it is running by itself, 50% each if 2 queries are running, etc. A workload management system has to exist primarily to keep the big analytic queries from getting in the way of the small/fast queries. In the end the system should have enough performance so that very little query backlog ever exists.
Claims of being able to run 100s of queries at the same time are a straw man because they miss the point entirely. I don’t care if my data warehouse only runs 1 query at a time. As long as it manages to complete all of the queries fast enough to satisfy all of the SLAs everyone will be happy.
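The relationship between speed and concurrency argued above can be put in numbers with Little's Law, a standard queueing result (my addition, not something the commenter cited): average queries in flight = arrival rate × average time per query. The sketch below also includes the equal-split idea from the comment; the function names and the sample rate of 10 queries per second are assumptions for illustration only.

```python
def avg_queries_in_flight(arrivals_per_sec: float, avg_latency_sec: float) -> float:
    """Little's Law, L = lambda * W: the average number of queries in the
    system equals the arrival rate times the average time each one spends there."""
    return arrivals_per_sec * avg_latency_sec


def share_per_query(queries_running: int) -> float:
    """Idealized equal split: 100% for one query, 50% each for two, and so on."""
    if queries_running < 1:
        raise ValueError("need at least one running query")
    return 1.0 / queries_running


if __name__ == "__main__":
    # Same assumed workload (10 queries arriving per second), two system speeds.
    for latency in (0.5, 5.0):
        in_flight = avg_queries_in_flight(10.0, latency)
        print(f"{latency} s per query -> {in_flight:.0f} queries in flight on average")
```

With the workload held constant, the system that is ten times slower has ten times as many queries in flight, which is why a high concurrent-query count is not by itself a sign of strength.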
The Netezza customer which has the very high number of concurrent users is Corporate Express. I remembered a press release about it and looked it up.
http://www.netezza.com/releases/2007/release073107.htm
Corporate Express was neither of the two names I got, unless there’s a subsidiary-of relationship that would surprise me.
Curt,
You categorized analytics technology into the following groups. What is your reasoning behind that?
Analytics technology
* Business intelligence
* Data mart outsourcing
* Data warehousing
* MOLAP
Shouldn’t BI be the umbrella name? I feel you need to provide some explanation to avoid any misunderstanding, i.e.:
Business intelligence
* Analytics
* Data mart outsourcing
* Data warehousing
* MOLAP
Sean,
I don’t think database management is a subset of “business intelligence”. And while you’re probably not alone in disagreeing with me, I think you’re in a small minority. So I’m not terribly concerned about confusion on that score.
Anyhow, blog category names are hardly meant to comprise a precise or complete industry taxonomy.