The underlying technology of QlikView
QlikTech* finally decided both to become a client and, surely not coincidentally, to give me more technical detail about QlikView than it had when last we talked a couple of years ago. Indeed, I got to spend a couple of hours on the phone not just with Anthony Deighton, but also with QlikTech’s Hakan Wolge, who wrote 70-80% of the code in QlikView 1.0, and remains in effect QlikTech’s chief architect to this day.
*Or, as it now appears to be called, Qlik Technologies.
Let’s start with some quick reminders:
- QlikTech makes QlikView, a widely popular business intelligence (BI) tool suite.
- QlikView is distinguished by the flexibility of navigation through its user interface.
- To support this flexibility, QlikView preloads all data you might want to query into memory.
Let’s also dispose of one confusion right up front, namely QlikTech’s use of the word associative:
- Notwithstanding QlikTech’s repeated use of phrases like “QlikView’s unique, patented in-memory associative technology,” there is nothing “associative” about QlikView’s data structures.
- Rather, “associative” is a term that can reasonably be used to describe the functionality of QlikView’s user interface. In particular, QlikView can “associate” over fields that have the same name, in that it makes it easy for users to join across them.
With that out of the way, let’s turn to some highlights of QlikView’s underlying technology.
- For the most part, QlikView’s in-memory data structures are quite simple. In particular:
- QlikView data is stored in a straightforward tabular format.
- QlikView data is compressed via what QlikTech calls a “symbol table,” and what I generally call “dictionary” or “token” compression.
- QlikView typically gets at its data via scans. There is very little in the way of precomputed aggregates, indexes, and the like. Of course, if the selection happens to be in line with the order in which the records are sorted, you can get great selectivity in a scan.
- One advantage of doing token compression is that all the values in a column wind up being the same length. Thus, QlikView holds its data in nice arrays, so the addresses of individual rows can often be easily calculated. (A minimal sketch of this appears after this list.)
- To get its UI flexibility, QlikView implicitly assumes a star/snowflake schema. That is, there should be exactly one possible join path between any pair of tables. In some cases, this means one will want to rename fields as part of QlikView load scripts. For example:
- If two keys are meant to be joined on, you might want to give them the same name.
- If two columns have the same name and mean different things (e.g., different kinds of dates), you can give them different names.
- You can mark which columns you do or don’t want to have “qualified” names – i.e., table-specific modifications that force the names to be unique.
- QlikView is designed for gigabyte-scale databases. (More precisely, it’s constrained by how much RAM you can address in a single box, and that’s how the numbers currently work out.) In particular:
- QlikTech recommends 2-4 gigabytes of compressed data per core. QlikTech says 10X is a good rule of thumb for compression, although it sounded like that’s a little (not a lot) on the high side when compared simply to raw data.
- QlikTech further recommends RAM amounting to another 10% of data size be set aside for each concurrent user (e.g., for cache). However, Hakan said that’s really too pessimistic, and in most cases 5% would suffice.
- Bottom line: QlikView “comfortably” handles databases with 10-20 gigabytes of compressed data, at whatever product of record count and record length you like (e.g., 1 billion relatively narrow records). That’s on the order of 100 gigabytes of raw data. (The arithmetic is worked through in a sketch after this list.)
- Indeed, several QlikView customers manage several billion records each.
- The main ingredient of the performance secret sauce in QlikView is that selections are compiled straight into machine code. (QlikTech gave me the impression that this post is the first time that will be publicly revealed.) Notes on that include:
- In the old days, QlikTech thought compilation gave a 10X performance benefit vs. interpreted code. However, 5X might be a more up-to-date figure.
- It’s not just code; part of the compilation is to create temporary lookup tables. (The lookup-table idea is sketched after this list.)
- A single calculation can use multiple cores. QlikTech thinks it’s done a very solid job of engineering efficient multicore parallelism. (Note: So far as I could tell, Hakan was using “calculation” to refer both to queries and, well, calculations.)
- There’s a good reason QlikView runs only on Intel-compatible processors. A port would be painful.
- In QlikView’s world, one set of users accesses one set of applications against one database on one machine. However, different subsets (or copies of the same subset) of the same underlying database(s) can of course be run on different machines.
- Naturally, QlikView caches results and tries to re-use them. One smart thing about QlikView’s caching algorithm is that it takes into account the cost of generating the calculated results. This has the happy effect that large result sets, which are often the ones most likely to be useful in a subsequent calculation, are the ones most likely to be retained. (A toy version of such a cache is sketched after this list.)
- One thing I unfortunately forgot to ask about is loading QlikView data into memory, something that has at times been problematic.
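To make the symbol-table point concrete, here is a minimal Python sketch of dictionary/token compression over a single column. To be clear, this is my own illustration of the general technique, not QlikTech’s code, and all the names in it are made up:

```python
# Minimal sketch of "symbol table" (dictionary/token) compression.
# Each column stores every distinct value exactly once; the column
# itself becomes an array of small, fixed-width integer codes.
from array import array

class Column:
    def __init__(self, values):
        self.symbols = sorted(set(values))                   # the "symbol table"
        index = {v: i for i, v in enumerate(self.symbols)}
        self.codes = array("I", (index[v] for v in values))  # fixed-width tokens

    def value_at(self, row):
        # All codes are the same width, so a row's address is a simple
        # calculation -- no per-row offsets or indexes are needed.
        return self.symbols[self.codes[row]]

country = Column(["US", "SE", "US", "DE", "SE", "US"])
print(list(country.codes))   # [2, 1, 2, 0, 1, 2] -- six rows, three symbols
print(country.value_at(3))   # DE
```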
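As a sanity check on the sizing guidance, here is the arithmetic in Python form. The workload figures are hypothetical; the ratios are the rules of thumb quoted above:

```python
# Sizing a hypothetical workload with the rules of thumb quoted above.
raw_gb        = 100                     # raw data
compression   = 10                      # QlikTech's 10X rule of thumb
compressed_gb = raw_gb / compression    # 10 GB resident in RAM
cores         = compressed_gb / 2       # at 2 GB of compressed data per core
users         = 20                      # concurrent users
user_ram_gb   = compressed_gb * 0.05 * users   # Hakan's 5%-per-user figure

print(compressed_gb, cores, user_ram_gb)   # 10.0 5.0 10.0
```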
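Here is what I take the “temporary lookup tables” remark to mean, given token compression: a selection predicate need only be evaluated once per distinct symbol, after which the row scan is pure array indexing. Again, this is my own reconstruction in Python; the real product compiles such scans into machine code:

```python
# Sketch of a selection "compiled" into a lookup table. Assume a
# token-compressed column: a symbol table of distinct values plus
# one fixed-width integer code per row, as in the sketch above.
symbols = ["DE", "SE", "US"]
codes   = [2, 1, 2, 0, 1, 2]

def select_rows(symbols, codes, predicate):
    match = [predicate(v) for v in symbols]   # once per *distinct* value
    # The per-row work is now a bare array lookup -- ideal scan fodder.
    return [row for row, code in enumerate(codes) if match[code]]

print(select_rows(symbols, codes, lambda v: v != "US"))   # [1, 3, 4]
```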
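Finally, a toy version of cost-aware result caching: when the cache is full, evict whatever was cheapest to recompute, so that expensive (often large) result sets tend to be retained. This captures only the general idea, not QlikView’s actual algorithm:

```python
# Toy cost-aware cache: evict the result that was cheapest to compute.
class CostAwareCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}                    # query -> (result, compute_cost)

    def get(self, query, compute, cost):
        if query in self.entries:
            return self.entries[query][0]    # hit: no CPU cycles spent again
        if len(self.entries) >= self.capacity:
            cheapest = min(self.entries, key=lambda q: self.entries[q][1])
            del self.entries[cheapest]       # cheapest-to-redo goes first
        result = compute()
        self.entries[query] = (result, cost)
        return result

cache = CostAwareCache(capacity=2)
cache.get("big aggregation", lambda: "...", cost=90)
cache.get("tiny lookup", lambda: "...", cost=5)
cache.get("medium join", lambda: "...", cost=40)   # evicts the tiny lookup
print(sorted(cache.entries))   # ['big aggregation', 'medium join']
```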
One last thing: QlikTech is going public. That means there is a QlikTech S-1, from which I learned, among other things, that QlikTech now seems to be called Qlik Technologies. Dave Kellogg offers an outstanding overview of the information in QlikTech’s filing(s). The points I’d add to Dave’s are primarily from the QlikTech balance sheet:
- Deferred revenue, which Dave calls out as high in absolute terms, is also growing faster than revenue (or any major component of revenue).
- Accounts receivable are also growing faster than revenue or any major component thereof.
- One possible explanation is weirdness with international distributors, which is at least potentially consistent with what QlikTech says is a shift in geographical mix.
- Another explanation is increasing deal size/complexity, something that is in any case common among enterprise software companies gaining market share, and that is also consistent with what QlikTech says is a growing fraction of revenue coming from existing customers.
Comments
I’ve been evaluating QlikView 9 (and other vendors) for the past 8 months, for a BI modernization project at my current employer in the retail sector.
Once you “grok” their overall approach, it’s very hard to go back.
The performance and flexibility are very impressive.
While QlikView is more or less based on a non-aggregated star schema, you can follow ROLAP principles and achieve much the same effect, even with drill-downs (through document linking).
But I’m most impressed with the company itself. I had a very bad experience with Crystal Reports support after SAP took over. I’ve also found many of the big BI vendors (e.g., IBM, Oracle, Microsoft) to be a bit condescending in their sales approach. QlikView, however, has been very good at listening, and the quality of their technical support is superb.
Why does QlikTech cache data when you say it preloads everything into memory? If everything is already in memory, isn’t everything effectively cached by default?
I was told by the founder of Illuminate that QlikTech runs Illuminate under the covers. The architectures seem to match. But I’ve never been able to verify this.
@Wayne,
Result sets are cached. Having the right data in memory isn’t enough; you also need to identify what actually answers the query, and that takes CPU cycles. Cache the results, and you don’t need to spend the CPU cycles over again.
As for the illuminate claim — that contradicts my understanding of the two technologies. For example, QlikTech has a concept of “record”, and illuminate doesn’t. http://www.dbms2.com/2008/03/27/the-illuminate-guys-have-a-cto-blog/
Perhaps somebody from QlikTech would be kind enough to address that odd assertion.
I would be surprised if Illuminate’s technology were running underneath (though perhaps they’ve licensed some patents). It’s my understanding that QlikView uses a columnar data storage model, which is one of the reasons QV can achieve such exceptional compression and performance. Illuminate works on a Value Based Store (VBS) model, in which physically everything maps down to a few columns based on data type (e.g., Date, String, Float, Integer).
That said, I could be mistaken, and QV may be using a VBS. I don’t think the two are mutually exclusive.
One thing that does make me wonder is that QV supports a very quick global search function. That is the biggest advantage of a VBS, since you can physically search a single column rather than having to check multiple columns. So I will be interested to hear Qlik’s response.
Neil,
QlikView’s core technology has been developed entirely in-house at QlikTech, mainly by myself, and there is no underlying third-party technology, licensed or otherwise. As Curt writes, it is a record-based table representation. The use of symbol tables (or dictionaries) per field lends itself very well to different search functions.
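To illustrate Hakan’s last point: with a symbol table per field, a global search need only scan each field’s small list of distinct values, and the matching rows then fall out of integer-code comparisons. A minimal Python sketch of that idea follows – mine, not QlikView’s implementation:

```python
# Sketch of global search over per-field symbol tables: scan the small
# tables of distinct values, then find rows by matching integer codes.
def global_search(table, needle):
    """table maps field -> (symbols, codes); returns field -> [rows]."""
    needle = needle.lower()
    hits = {}
    for field, (symbols, codes) in table.items():
        # Search each distinct value once, however many rows there are...
        matching = {i for i, v in enumerate(symbols) if needle in v.lower()}
        if matching:
            # ...then locate rows via cheap integer-code comparisons.
            hits[field] = [r for r, c in enumerate(codes) if c in matching]
    return hits

table = {
    "City":    (["Berlin", "London", "Lund"], [0, 1, 2, 1]),
    "Product": (["bundles", "widgets"],       [1, 0, 1, 1]),
}
print(global_search(table, "lund"))   # {'City': [2]}
```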
In principle, in-memory database architecture is a very bad approach to designing scalable BI solutions. It made sense when there was no real alternative to OLAP’s “all or nothing” approach, and QlikView was a good compromise at the time. But there are technologies much better suited to BI solutions that are intended to grow and stay active for many years without having to double your RAM every 6 months. These technologies typically use in-memory query processing while the data is physically stored on the hard drive. Amazingly, performance is just as good as in databases that are fully stored in memory.
Examples of companies like this are:
– Vertica (http://www.vertica.com)
– SiSense (http://www.sisense.com)
In addition, QlikView has filed for an IPO and as such was forced to reveal that it made $157 million in revenue (2009) while spending $150 million, most of it on sales and marketing. That is the sign of a very badly run company, and it’s no wonder their exec team was axed.
Dave sounds like a jealous competitor. As I read their S-1, what I want to ask is this: name 5 software companies that have grown at a 59% average annual compound rate over the past five years, are profitable, have received no external venture capital funding since 2004, and grew 66% in Q1 2010 over Q1 last year.
Do you understand how rare it is to grow at that rate, make a profit, and receive no external funding at the same time?
On the technology front, QlikView currently scales to terabytes (billions of rows) quite easily, and with the same application from laptop to thousands of users.
I am an independent consultant and I have had extensive experience with QlikView. The reason I don’t like QlikView very much is due to how they treat their partners, but that’s a completely different story.
The numbers you mentioned would be impressive were it not for the extremely poor profitability. Most of my clients make a bigger profit on much less revenue. You can cling to whichever part of the S-1 you like, but the executive layoffs at QlikView just prove my point.
QlikView does NOT realistically scale to terabytes. Maybe it works in QlikView’s laboratories. This has nothing to do with technology, just reality. The type of hardware you would need to run a terabyte database in memory while servicing thousands of users is beyond any company’s reach. Stop selling marketing fairy tales. It’s no coincidence that QlikView has trouble extending its solutions beyond a single department.
QlikView is a good product, but it’s old. For departmental solutions, QlikView is gaining competition every day (in my opinion, PowerPivot is going to kill off many of QlikView’s prospects in the coming years), and in corporate solutions QlikView fails miserably, because the product won’t scale properly without creating dozens of isolated QlikView data marts, which rather defeats the purpose of having an in-memory database.
Cheers.
@Dave,
Perhaps you’re counting wrong. 1 TB of UNCOMPRESSED data is the metric, not compressed. Or are you saying QlikView can’t even handle that on a hefty but not ridiculous SMP box?
As for the layoffs — who in particular are you referring to? It’s not something I’ve tracked in QlikTech’s case.
What I’m saying is that company benchmarks don’t mean anything about real life. I’ve seen QlikView demos running on 1 billion rows, and I’ve seen it choke in real life on a billion rows with 64GB of available RAM. What does that mean? Can it handle 1 billion rows or can’t it?
I’ve also seen promotional videos by Lyza showing analysis over a 100-million-row table running in seconds on a 32-bit machine with 2 GB of RAM. What does that say about these technologies’ ability to scale to multiple users? Absolutely nothing. And Lyza doesn’t even use an in-memory database, only a columnar, disk-based one. It suffers from the same problems QlikView suffers from, without the hefty cost of RAM. Does that mean Lyza’s technology is better than QlikView’s?
Anyone who says certain database technology “currently scales to Terabytes (billions of rows) quite easily and with the same application from laptop to thousands of users” either knows nothing about how BI is implemented in a business OR thinks everybody’s an idiot.
Taking your example –
1TB compressed by a factor of 10 is 100 GB. Setting aside the fact that 10X compression is extremely rare in real-life cases, this would still require 100 GB of RAM.
Each user executing a query over this in-memory data model needs more RAM for intermediate calculations. I believe QlikView assumes about a 20% addition per concurrent user. Let’s say it’s half that – 10%:
5 users would require an additional 50 GB, 10 users an additional 100 GB – and suddenly 10 users need 200 GB of RAM, twice what was needed initially.
Also, let’s say the company that owns this 1TB database is 10 years old and that its data accumulation is linear (which it never is) – this would mean an additional 100GB of raw data each year, and correspondingly more RAM each year for data storage alone. OK, so the additional data is compressed, but you get the point.
Not to mention the amount of time it takes to load this data into memory and then compress & persist the data on the hard drive (hours? days?). Power outage? There goes a day’s worth of BI… Screen saver takes more memory than it should? No more memory. Another day’s worth of BI gone. Need to defrag the hard drive? Gotta shut QlikView off – another day of BI wasted…
All this amounts to the fact that no one in his right mind would use this type of technology on these amounts of data and/or amounts of users.
The only way an in-memory database can go around this is to do a lot of work at the data side: cleanse, warehouse, aggregate, split models apart, etc.
Once you have to do that, the difference in implementation between traditional DW/OLAP architectures and QlikView is not as significant, because you basically need to do the same work and buy the same hardware.
QlikView’s strength lies in the trade-off between number of users and amount of data. If you are a single user, using a desktop computer with 12GB of RAM (max for a non-server machine) you can probably do some really cool things with it over databases up to 15-20GB (this is where PowerPivot will kill QlikView, I believe, because… well… everybody has Excel and it’s cheaper to upgrade).
Concurrent users change the entire mathematics of resource consumption. You could implement 100 users over a QlikView application but you would have to do one or more of the following:
– Buy monster hardware and be ready to upgrade it frequently… $$$
– Create a data warehouse and place QlikView over it… $$$
– Split your models apart, resulting in multiple applications to create and maintain… $$$$
You don’t think QlikView partners live off commissions, do you? They live on the preparation and maintenance fees their customers are forced to pay. This is still a cost-effective alternative to OLAP in small implementations, but not really for larger ones. Definitely not for 1TB of data with thousands of users. At those levels, QlikView is just a nice GUI builder. Lots of those are out there…
Dave: “The reason I don’t like QlikView very much is due to how they treat their partners, but that’s a completely different story.”
At least you admit the agenda that I sensed from the beginning, but really, you should leave it there rather than say things that aren’t true. Or maybe what you’re saying is just out of ignorance. If so, pull up a chair; class is in session.
QlikView can and does handle terabyte (and larger) databases.
The most recent (this month) public example is here:
http://www.qlikview.com/us/company/press-room/press-releases/2010/us/0603-qlikview-helps-california-casualty-improve-sales
And then here’s another public example:
“was the only solution capable of analyzing the required 15 to 20 terabytes of data without the use of a data warehouse….”
http://findarticles.com/p/articles/mi_m0EIN/is_2005_May_9/ai_n13677502/
I personally know a globally known retailer with a point-of-sale application that has an 800-million-row fact table plus another 300 million rows of dimensions. All in QlikView.
Then there’s the pharmaceutical company with a 3 billion row QlikView application for drug product legal discovery.
I hear that you don’t want QlikView to scale, but alas, it does. Quite nicely.
Joe,
I’m not hiding the fact that I’m not a big fan, just as you’re not hiding the fact that you are one. If that de-legitimizes my argument, it does the same to yours.
Having said that, I believe my earlier reply is relevant here as well. Press releases commissioned by a company don’t impress me. I’ve sent out a few of my own; I know what they’re worth. Truth is very subjective, my friend.
Anyway, the fact that QV sends out these press releases just tells me I am not the first one to notice this.
In any case, thanks for the conversation. It was enlightening, and I sincerely hope you’re getting paid for the PR you’re doing 😉
Signing off.
I have to respectfully disagree with the QV detractors on this post.
In particular, the short-sightedness of the implementation implied a couple of posts up.
The example was given of a corporate database 1 TB in size that grows linearly at a rate of 100GB per year. Any BI application that blindly copies an entire enterprise database – regardless of whether its implementation is in memory or on disk – is not worth the money invested in it. Furthermore, it’s unreasonable to expect that any business would need 10+ years of row-level historical transactional data readily available in a BI application.
It was also suggested that any kind of service interruption on the server hosting QlikView would put the BI application out of commission for hours or days while the data reloads. Again, this is simply not true. A well-designed QlikView application loads high-volume data incrementally, and stores pre-processed data externally to the data source, in a format very similar to the one used in memory. A service outage takes BI down only for as long as the server itself is down.
It’s not really fair to blame the failings of a bad implementation on the product itself. You can’t blame the bricks if your house falls down in a storm.
I’ve played on both sides of the fence, and unequivocally the actionable insight users gain from a well-designed, well-executed QlikView implementation far exceeds the value they can derive from canned reporting, or even from OLAP solutions.
I’m not saying, by any stretch, that QlikView is the ‘right’ solution for everyone. Nor am I saying that it’s the ‘perfect’ or ‘ideal’ solution. What I am saying is it’s not the clunky contraption that other posters have made it out to be.
[…] The second way to use memory is to gain “processing flexibility” when doing analytics. The idea is to throw your data into memory (however much it fits, of course) without spending much time thinking how to do that or what queries you’ll need to do. Because memory is so fast, most simple queries will be executed at interactive times and also concurrency is handled well. European upstart QlikView exploits this fact to offer a memory-only BI solution which provides simple and fast BI reporting. The downside is its applicability to only 10s of GBs of data as Curt Monash notes. […]
“The Data Blog”:
It is not necessary to load all the data into memory to achieve this. We achieve the same performance and interactivity by storing data on the hard drive (column-oriented) and loading relevant data into memory on demand.
But the data technology behind a BI product is only half of it. To truly pull this off, you also need a front end that makes good use of that technology.
Elad
Curious – all this chatter about a tool that seems to be just another, however niche, player trying to sell what Microsoft provides for free if you own SQL Server 2008 R2, SharePoint, and Excel 2010. Has no one looked at Microsoft lately? I loaded the same data into both Excel 2010 with PowerPivot and QlikView, and I see no significant differentiator between them. That said, slick BI tools are great and needed, but without a solid data foundation it’s all a moot point – they merely allow customers to get inconsistent data faster.
Nicely put, Gary.
Although I would give credit where credit is due… It’s not QlikView that is trying to sell something Microsoft gives away for free, but rather Microsoft trying to compete with QlikView by giving PowerPivot away for free 🙂
The BI space is getting interesting!
Dave Roberts, are you the Oracle guy placing Oracle dudes into contracts?
Happy to hear such heated discussion about QlikView. There is no tool that does everything… As with any tool, you have to come up with a strategy for how you are going to use it. Whether you first build the DW and take care of data integration issues there before applying QlikView on top of it, or take care of data integration within QlikView, that work has to be done somewhere – there are pros and cons to each approach, and they should be carefully assessed.
Years ago, I did extensive research (as a DW/BI architect) on BI tools that are intuitive and easy to use. I tried various tools and looked at many more, and I did not come across one that was as easy and intuitive to use as QlikView. I think one of the most compelling aspects of the tool is that, in many cases, it enables people to bypass the typical time and effort required to get set up, and to “view” the data very quickly. Load the data, view the data, make the decision on the next step, and go – instead of having to submit a request to IT, wait for months, etc.
The technology is interesting, but what about the “out of memory” bug in QlikView?
http://www.sisense.com/out-of-memory-qlikview.aspx
I addressed memory limitations at length in my post. And it’s rather misleading to call those a “bug”.
A Conversation with QlikView Architect Håkan Wolgé…
I sat down and talked with Håkan Wolgé, the main architect behind QlikView, while I was in Sweden a few weeks ago. I had two main questions for him about his take on QlikView and the associative experience. Erica Driver: At QlikTech we use the word……
I’m actively looking at QlikView now. The use cases I see this working for are with mid-sized companies. (By the by, I have done BI in the Fortune 10, and I can see a lot of use cases that something like QlikView could address.)
What I’ve seen so far makes sense given my experience. It seems to be something a mid-sized company can handle and successfully implement.
Hello everyone!
I am not a specialist, and I am looking for answers!
QlikView presents its solution as “associative in-memory”; you say here that there is nothing associative.
From my research, I conclude that it is a MOLAP solution in memory, because the data are loaded into memory and, thanks to the table associations, cubes are pre-calculated.
Is that right?
I am looking for an expert explanation ^^
I’d say there are a lot of similarities between star-schema ROLAP on the one hand and MOLAP on the other. But I’m not aware of Qlik using a MOLAP-oriented data manipulation language.