iLuminate’s correlation/associative approach to data warehousing
illuminate Solutions (small “i”) is an interesting little company, still rough around the edges. (E.g., the Press Release Archive page at i-lluminate.com says, in its entirety, “We are in the process of loading our historical press releases. Please check back the second week in March!” And I only got that far after correcting an obvious typo in the URL linked from the menu bar.) According to CTO Joe Foley, illuminate has 37 or so employees and 40+ customers, ¾ of whom are in its home country of Spain and half of the rest in Latin America. Now it’s entering the US.
illuminate’s basic idea is one I’ve heard before, but mainly from companies with more of a search orientation*, such as Attivio: Take a collection of tables, create a big inverted index on all the values in all columns at once, and do queries on that. This, illuminate claims, obviates all sorts of database design problems and similar hassles you otherwise might have. illuminate’s buzzword for all this is “CDBMS”, where the “C” stands for correlation. The actual CDBMS product is called iLuminate; related business intelligence tools are called iCorrelate and iAnalyze. What iLuminate actually indexes is a token that holds four pieces of information: Instance identifier, table identifier, column identifier, and value.
*iLuminate has string-matching and soundex on character fields, but that’s about as far as it goes.
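To make the indexing idea concrete, here is a minimal Python sketch of one inverted index built over every value in every column of several tables at once. It is not iLuminate’s actual format. The token layout follows the four pieces of information listed above, but it assumes (the post doesn’t say) that the instance identifier names a row within its table, and `build_index`, `lookup`, and the sample tables are all hypothetical.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Set, Tuple

# Hypothetical token layout: (instance_id, table_id, column_id, value).
# Assumption: instance_id identifies a row within its table.
Token = Tuple[int, str, str, Any]
Index = DefaultDict[Any, Set[Token]]


def build_index(tables: Dict[str, List[Dict[str, Any]]]) -> Index:
    """Build one inverted index over every value in every column of every table."""
    index: Index = defaultdict(set)
    for table_id, rows in tables.items():
        for instance_id, row in enumerate(rows):
            for column_id, value in row.items():
                index[value].add((instance_id, table_id, column_id, value))
    return index


def lookup(index: Index, value: Any) -> List[Token]:
    """Return every token holding `value`, whatever table or column it is in."""
    # .get() rather than [] so a miss doesn't insert an empty entry.
    return sorted(index.get(value, set()), key=lambda t: (t[1], t[2], t[0]))


tables = {
    "customers": [{"id": 1, "city": "Madrid"}, {"id": 2, "city": "Sevilla"}],
    "payouts": [{"agent_id": 2, "amount": 1313.13}, {"agent_id": 1, "amount": 250.0}],
}
index = build_index(tables)
print(lookup(index, 2))        # the value 2 appears in two different tables
print(lookup(index, 1313.13))  # an amount found without naming its table or column
```

Because every value is indexed regardless of where it lives, a query doesn’t need to know which table or column to search, which is the design-avoidance benefit illuminate is pitching. The cost, even in a toy like this, is an index entry for every cell.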
iLuminate also lets you use standard business intelligence tools via ODBC interfaces. A particularly common front-end for iLuminate appears to be QlikView, which Joe believes performs poorly on its own when asked to load more than 100 megabytes or so of data into RAM. I got the impression that illuminate’s own non-SQL-based tools are used mainly for information exploration and discovery, while more repetitive reporting and dashboarding are usually done via third-party tools. (But that may be somewhat off; we didn’t talk about the point in much detail.)
Where things get really unclear is the database size range that iLuminate is suitable for. Current customers top out at 100 gigs or so, and standard POCs are limited to 25 gigs. On the other hand, illuminate also wants to do “super-size” POCs, and would be happy for those to be in the 5-20 terabyte range. iLuminate isn’t MPP, so a big database would run on a large SMP box. Joe says they’ve tested the product internally up to 1 terabyte or so. A 100 gig database runs happily on a $15K commodity server, but I must confess upon reflection to not being particularly impressed by that fact.
All figures are user data, by the way. The expansion ratio (storage used versus raw user data) is about 1:1. Values aren’t compressed, but they tend to repeat more in bigger databases. Hence, as databases get larger, an increasing fraction of total data volume is in the indexes, and those indexes are smaller than the raw data they cover. Joe says query response times are typically a tenth of a second or less, unless result sets are extremely large.
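The repetition argument can be illustrated with a back-of-the-envelope model. The sketch below assumes, purely for illustration, a layout that stores each distinct value once plus a fixed-size reference per occurrence, with made-up sizes of 16 bytes per value and 4 bytes per reference. `storage_estimate` is hypothetical and says nothing about iLuminate’s real on-disk layout.

```python
from typing import Hashable, Sequence, Tuple


def storage_estimate(
    values: Sequence[Hashable], value_bytes: int = 16, ref_bytes: int = 4
) -> Tuple[int, int, int]:
    """Return (distinct values, raw bytes, value-indexed bytes) under a toy model.

    Raw storage writes every value in full; the indexed layout writes each
    distinct value once plus one fixed-size reference per occurrence.
    """
    distinct = len(set(values))
    raw = len(values) * value_bytes
    indexed = distinct * value_bytes + len(values) * ref_bytes
    return distinct, raw, indexed


for n in (10, 1_000, 100_000):
    # A column that only ever takes 20 distinct values, e.g. a region code.
    column = [i % 20 for i in range(n)]
    distinct, raw, indexed = storage_estimate(column)
    print(f"{n:>7} rows: {distinct:>2} distinct, indexed/raw = {indexed / raw:.2f}")
```

Under these assumptions the indexed layout is 1.25× the raw size at 10 rows, but about 0.27× at 1,000 rows and 0.25× at 100,000, because the count of distinct values stops growing while the row count keeps climbing. Real data is messier, so read this as the shape of the argument rather than a prediction.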
Applications to date seem to span the BI gamut: lots of customer analysis, some product analysis, two insurance companies looking at agent payouts, and some miscellaneous stuff. (E.g., the ability to search anywhere in the database for an unusual euro amount was helpful in one auditing use.) But then, 30 installations in three years in a country the size of Spain isn’t trivial, so one would expect a range of uses.