January 26, 2008
Kognitio WX2 overview
I had a call today with Kognitio execs Paul Groom and John Thompson. Hopefully I can now clear up some confusion that was created in this comment thread. (Most of what I wrote about Kognitio in October, 2006 still applies.) Here are some highlights.
- With one exception, Kognitio WX2 gets data on and off disk in the simplest possible way. Data goes on via round-robin partitioning. It comes off via table scans (with no indexes at all).
- The exception is that WX2 lets you put local bitmapped indexes on each disk, which Kognitio thinks are useful for cardinalities up into the 1000s. These indexes record which data values appear anywhere in a block, so the system knows which blocks it needs to scan. The benefit is similar to that of a Netezza zone map, but less, in that the Kognitio bitmap only works well for equalities and not ranges.
- The bitmaps are compressed. Otherwise, Kognitio uses no compression. They tried compression in the past and it didn’t go well.
- Where Kognitio gets fancy is in RAM. WX2 can have tables or views in memory that are kept synced up with disk. (Or even run purely in memory, an extreme transience that seems useful mainly for ELT.) These tables can be replicated, hashed, or whatever as makes sense.
- The biggest (measured by data) WX2 customer has bought a license for 9 ½ terabytes of user data. Kognitio expresses optimism about competing in the 10s of terabytes range, and thinks its technology actually scales up to the 100s of terabytes.
- Typical Kognitio WX2 configurations have a couple of hundred gigs of user data per CPU core. The biggest current system measured by nodes has 300 servers. (A past system had 900.) If you multiply that out it would seem there’s an extra zero, so I presume the servers in question are particularly small and well-aged.
- Kognitio stresses that WX2 runs on a broad variety of systems, just so long as the chips are x86 and the operating system is one of Kognitio’s preferred flavors of Linux. Blades, SMP nodes — WX2 doesn’t care. The nodes can even have heterogeneous hardware, although that’s sub-optimal since system performance is gated by the least powerful node. Kognitio seems to think that the cutoff for where bigger boxes are better than blades is probably in the 30-50 terabyte range, although as noted above that’s mainly a theoretical point right now.
- Kognitio also stresses a diversity of deployment models. WX2 runs on server farms in the cloud. You can install it on hardware of your choice. Or Kognitio will build a turnkey system for you, out of the brand of hardware of your choice.
- WX2 running over solid-state disks is likely not in the cards. Due to Kognitio’s lack of compression, this would be a very expensive solution.
- Kognitio is proud of its “plug-ins,” which amount to user-defined functions. There’s one on the price list for telecom call repricing that sounds a lot like what one member of the Netezza Developer Network is doing. There’s also a set of half a dozen for astronomical research, which I’m hoping somebody from Kognitio will describe in an email I can post.
- Kognitio’s sales have been focused in the UK. There’s one US customer, whose name I forget. John Thompson has been hired back into the company to expand US operations, but that seems to be waiting for a VC round to complete. Large hardware companies and systems integrators seem to play a big part in Kognitio’s distribution strategy.
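The block-skipping idea behind the bitmapped indexes in the bullets above can be sketched in a few lines. This is a hypothetical illustration of the general technique, not Kognitio's actual implementation: each block keeps a small bitmap of hashed column values, so an equality scan can skip blocks whose bitmap rules the value out. Hashing destroys ordering, which is exactly why this helps equalities but not ranges.

```python
# Hypothetical sketch of block-level value bitmaps; not WX2 internals.
class BlockBitmap:
    """Per-block bitmap of hashed values: no false negatives, rare false positives."""
    def __init__(self, bits=1024):
        self.bits = bits
        self.bitmap = 0

    def add(self, value):
        self.bitmap |= 1 << (hash(value) % self.bits)

    def might_contain(self, value):
        # If the bit is clear, the value is definitely absent from this block,
        # so a scan can skip it entirely. Hash bits carry no order information,
        # so a range predicate gets no such shortcut.
        return bool((self.bitmap >> (hash(value) % self.bits)) & 1)

# Two "blocks" of (country, id) rows, each with its own bitmap.
blocks = []
for chunk in ([("US", 1), ("UK", 2)], [("DE", 3), ("FR", 4)]):
    bm = BlockBitmap()
    for country, _ in chunk:
        bm.add(country)
    blocks.append((bm, chunk))

# Equality predicate: scan only blocks whose bitmap might contain "UK".
hits = [row for bm, chunk in blocks if bm.might_contain("UK")
        for row in chunk if row[0] == "UK"]
assert hits == [("UK", 2)]
```

With a bitmap of a thousand or so bits and low-cardinality columns (Kognitio's "up into the 1000s"), false positives stay rare, so most non-matching blocks really are skipped.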
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Kognitio
Comments
12 Responses to “Kognitio WX2 overview”
Curt,
Small addendum: the disk-local bitmapped indexes will work for range predicates (i.e., BETWEEN clauses) if the column data type is numeric.
Paul Groom
Director, Business Intelligence
Kognitio
Thanks, Paul!
Please tell me that means dates too. 🙂
CAM
Hi Curt,
Thanks for the time last week. The US customer is Calit2, and the project for which they use WX2 as an on-premise software environment
is referred to as CAMERA.
I forgot to ask on our call; will you be at TDWI in Las Vegas? If you are, please come to our launch party on Monday night.
The theme is the British Invasion. No slideware or pitches, just drinks and entertainment. I hope that you can come.
Best,
John
847 251 5305
John,
So far, I have a perfect track record of not attending TDWI. At some point it may get easier to just go, rather than explain to a zillion clients why I’m NOT going, but I do think I’ll miss it again this time.
Best,
CAM
Hi:
I had the opportunity to play with WX2, with a su GB and a 100 GB warehouse. I did not run any fancy benchmarks, but cross joins ran 10-30x faster compared to Oracle, and the self-join was about 35x faster than Oracle running on comparable hardware. I am pretty impressed by their performance. I am wondering if any of you have any comparative benchmark data for WX2 against DB2 or Teradata.
Thanks
Aniruddha
Database Operations, MicroStrategy
Hi,
Benchmarks are going to vary very widely according to the exact database and queries you are using. 10-200x vs. Oracle is common for any of these technologies, at least among the most favorable parts of the test.
DB2 seems to be subject to I/O bottlenecks similar to Oracle's; i.e., the shared-nothing architecture isn't as shared-nothing as I once thought.
Teradata will typically show excellent performance too. The main issue with them is commonly price. In addition, for certain workloads, a good columnar system will blow away anything row-based. And almost every system has some queries or workloads at which it outshines other generally comparable products.
A key feature of Microstrategy workloads is large intermediate result sets. I’d expect the vendors who’ve focused on inter-node communication to do particularly well on those.
CAM
Hi Curt:
It looks like WX2 and the other emerging DW DBMS/appliance vendors are targeting Oracle because 1) Oracle installations are easy to find, and 2) they suffer from performance degradation once data size goes beyond, I would say, 5 TB; even with partitioning, they just cannot keep up.
Regarding large intermediate result sets: in the majority of cases we try to create the result set in memory by creating derived tables (or common table expressions, in DB2 terminology), and that way we reduce the writes. However, there are always cases where we cannot produce the final result set with one big query, and in those cases we fall back to intermediate tables.
There is one word of caution for WX2, and we have learned it the hard way! If you create a true table and then remove it from disk, it leaves a hole in their data file (they start writing from the end of the disk and try to lay each table out contiguously to reduce disk-head movement). From time to time we need to lock the database and reclaim the space. Database performance deteriorates as the number of holes grows, and at some point WX2 stops creating any tables.
For this reason, we recommend our customers use global temporary tables (which are memory-only) instead of true tables.
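A toy simulation of the hole problem described above (assumed mechanics, not WX2's actual allocator): with contiguous allocation, total free space can exceed a new table's size while no single hole is big enough to hold it, until the file is compacted.

```python
# Toy illustration of fragmentation under contiguous allocation; not WX2 internals.
def allocate(extents, size, disk_size):
    """First-fit contiguous allocation over (start, length) extents.
    Returns a start offset for the new table, or None if no hole is big enough."""
    cursor = 0
    for start, length in sorted(extents):
        if start - cursor >= size:      # gap before this extent fits the table
            return cursor
        cursor = start + length
    if disk_size - cursor >= size:      # tail of the file fits the table
        return cursor
    return None

extents = [(0, 40), (40, 30), (70, 20)]  # three tables fill 90 of 100 units
extents.remove((40, 30))                 # dropping the middle table leaves a 30-unit hole
# A 35-unit table no longer fits anywhere, even though 40 units are free in total.
assert allocate(extents, 35, 100) is None
# After compacting the surviving extents, the same allocation succeeds.
compacted, cursor = [], 0
for _, length in sorted(extents):
    compacted.append((cursor, length))
    cursor += length
assert allocate(compacted, 35, 100) == 60
```

The compaction step is the analogue of the "lock the database and reclaim the space" maintenance described above.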
In terms of product maturity and ease of operations, I myself would place them above Greenplum and Vertica.
ParAccel is another one I am talking with, but I have not gotten hold of their software yet.
And Curt, this is the only website I have found in the last eight years which deals with the critical aspects of different DBMSs. Keep up the good work.
Thanks
Aniruddha Mitra
Database Operations, MicroStrategy
Aniruddha,
Thanks for the kind words. And thanks even more for the detailed information!
Do you actually recommend WX2 to your customers, or are you just helping out the ones who buy it on their own?
Also, have you looked at DATAllegro? When they improved their capability to handle intermediate result sets a version or so ago, they explicitly mentioned Microstrategy as an example of software that would benefit.
(And if you’d like to talk more privately, that’s great. There’s a contact page on http://www.monash.com that says how to reach me.)
Best,
CAM
Curt:
Our position to customers is always that database selection is their choice, and whatever you have, we will support it, provided there is an ODBC driver. When our consultants and sales engineers take me to customer sites to talk about slow database performance, I am often asked, "What is your recommendation?"
My answer, always outside the official discussions (and my personal opinion, not MicroStrategy's): why invest in something that does not scale, is overpriced, and needs consultants on site? I have worked with Oracle since V7, and I no longer believe it is worthwhile for data warehousing.
I am very much pro-WX2, and I am planning to move our stress-testing infrastructure to WX2 (not completely). The main reason is savings on hardware: what two Oracle servers can do, one WX2 server can do very easily.
We support DATAllegro. However, they refused to provide us with hardware, and whatever testing we have done so far has been by connecting to a remote machine. We did a lot of functional testing, but real stress testing with at least 100 GB of data was not possible. Our program managers are in constant touch with DATAllegro, but I personally do not know much about where they are headed.
I used to be a Teradata fan, and then a Netezza fan, but if you had to pick one vendor that would suffer from advances in hardware technology, it would be Netezza. They are primarily read-intensive, and they probably read more than any other RDBMS platform. In the last three years disk capacity has gone up six times, and hence their parallelism has dropped six times for the same volume of data. What six SPAs used to hold a few years ago, one SPA can hold now (in terms of capacity), while disk RPM has remained almost the same. Their only way out of this situation is somehow reducing read time by compressing the tables, and they are trying to do that.
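The back-of-envelope arithmetic behind that claim, using assumed round numbers rather than measured figures:

```python
# Illustrative numbers only: if per-disk capacity grows 6x while sustained
# sequential throughput stays roughly flat (RPM unchanged), the time to scan
# a full disk grows 6x, i.e. effective parallelism per terabyte drops 6x.
capacity_gb = 80        # hypothetical per-disk capacity a few years ago
throughput_mb_s = 60    # hypothetical sustained scan rate, roughly unchanged

old_scan_s = capacity_gb * 1024 / throughput_mb_s
new_scan_s = capacity_gb * 6 * 1024 / throughput_mb_s
assert round(new_scan_s / old_scan_s) == 6

# Compression is the obvious lever: 3x compression cuts bytes read 3x,
# clawing back much of the lost effective parallelism.
compressed_scan_s = new_scan_s / 3
assert compressed_scan_s < old_scan_s * 3
```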
Thanks
Hi Curt.
I am trying to find out if Kognitio is a public company or privately held. Also, if private, are they VC-backed? I just can't find any information on this online. What are their expansion plans for DaaS in the US?
Thanks
Ali.
Ali,
Kognitio is private and well-known to be in the process of trying to raise a round of venture capital to help fund their expansion in the US. But they do already have a presence over here.
Best,
CAM
[…] its long history of selling disk-based DBMS and denigrating memory-only configurations, Kognitio now says that in fact it’s always been an in-memory DBMS […]