Endeca topics
I visited my then-clients at Endeca in January. We focused more on underpinnings (and strategic counsel) than on the cool things the product actually does. But going over my notes, I think there’s enough to write up now.
Before saying much else about Endeca, there’s one confusion to dispose of: What’s the relationship between Endeca’s efforts in e-commerce (helping shoppers navigate websites) and business intelligence (helping people navigate their own data)? As Endeca tells it:
- Endeca’s e-commerce and business intelligence efforts are reflections of the same technical approach. Indeed, I’m pretty sure Endeca’s product lines still share much/most of the same technology.
- Endeca went after e-commerce first because that’s where the provable ROI was. As I pointed out a couple of times in 2007, Endeca became a market leader in that area.
- Endeca increased its BI efforts later.
- Circa 2009-10, Endeca differentiated its e-commerce and BI product lines from each other.
- An e-commerce line extension called Page Builder is what really got Endeca through the recent recession.
- The BI product line Latitude was launched in the fall of 2010.
Endeca’s positioning in the business intelligence market boils down to “investigative analytics for people who aren’t hardcore analysts.” Endeca’s technological support for that stresses:
- Faceted search and navigation …
- … against diverse sources of data.
Here “diverse sources of data” can mean two things:
- Tabular data with all sorts of schemas.
- Text and so on.
That said, the Endeca paradigm is really to help you make your way through a structured database, where different portions of the database have different structures. Thus, at various points in your journey, it automagically provides you a list of choices as to where you could go next.
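To make the "list of choices as to where you could go next" idea concrete, here is a toy Python sketch of faceted navigation over records that don't share a schema. It is only an illustration of the general technique, not Endeca's implementation; the sample records and the names `RECORDS`, `select` and `refinements` are made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical records; different record types carry different attributes.
RECORDS = [
    {"type": "car", "make": "Ford", "color": "red"},
    {"type": "car", "make": "Ford", "color": "blue"},
    {"type": "car", "make": "Honda", "color": "red"},
    {"type": "part", "make": "Ford", "material": "steel"},
    {"type": "part", "make": "Honda", "material": "rubber"},
]


def select(records, filters):
    """Return the records that match every attribute=value pair in filters."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


def refinements(records, filters):
    """List the next navigation choices for the current result set.

    For each attribute not already filtered on, count how many matching
    records carry each value. An attribute is dropped when a single value
    covers every matching record, since choosing it would not narrow anything.
    """
    matching = select(records, filters)
    counts = defaultdict(Counter)
    for record in matching:
        for attr, value in record.items():
            if attr not in filters:
                counts[attr][value] += 1
    return {
        attr: dict(counter)
        for attr, counter in counts.items()
        if not (len(counter) == 1 and sum(counter.values()) == len(matching))
    }
```

Calling `refinements(RECORDS, {"type": "car"})` offers `make` and `color` as the next choices, while `refinements(RECORDS, {"type": "part"})` offers `make` and `material`. The choices depend on which part of the data you are standing in, which is the point of the paradigm.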
Underneath Endeca’s visible products is an engine called MDEX, about which Endeca says:
Inside the MDEX Engine there is no overarching schema; each data record carries its own metadata. This enables the rapid combination of a wide range of structured and unstructured content into Latitude’s unified data model. Once inside, the MDEX Engine derives common dimensions and metrics from the available metadata, instantly exposing each for high-performance refinement and analysis in the Discovery Framework. Have a new data source? Simply add it and the MDEX Engine will create new relationships where possible. Changes in source data schema? No problem, adjustments on the fly are easy.
While that is rather QlikView-like in its goals, the details are different. Most notably, Endeca MDEX features a disk-based columnar DBMS, whose highlights include:
- Endeca MDEX stores each column in two sort orders — by value and by a universal record ID. As I noted previously, this is a nice approximation to column-store idealism.
- Endeca MDEX has a range of columnar compression options — dictionary/token encoding (I’m pretty sure), run-length encoding, prefix compression, etc.
- Every Endeca MDEX column has a small tree-structured index cached in RAM.
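As a rough illustration of the first two bullets, the following Python sketch keeps one column in two sort orders (by record ID and by value) and run-length encodes the value-sorted copy. It is a simplified in-memory model under my own assumptions, not MDEX's actual on-disk format; the `Column` class and `run_length_encode` function are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from itertools import groupby


class Column:
    """One column stored twice: sorted by record ID and sorted by value."""

    def __init__(self, pairs):
        # pairs: iterable of (record_id, value); records lacking this
        # attribute simply do not appear in the column.
        by_id = sorted(pairs)
        self.ids_by_id = [rid for rid, _ in by_id]
        self.vals_by_id = [val for _, val in by_id]

        by_value = sorted(by_id, key=lambda p: (p[1], p[0]))
        self.vals_by_value = [val for _, val in by_value]
        self.ids_by_value = [rid for rid, _ in by_value]

    def value_of(self, record_id):
        """Look up a record's value using the ID-sorted copy (binary search)."""
        i = bisect_left(self.ids_by_id, record_id)
        if i < len(self.ids_by_id) and self.ids_by_id[i] == record_id:
            return self.vals_by_id[i]
        return None

    def ids_with(self, value):
        """Find all record IDs holding a value using the value-sorted copy."""
        lo = bisect_left(self.vals_by_value, value)
        hi = bisect_right(self.vals_by_value, value)
        return self.ids_by_value[lo:hi]


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


color = Column([(3, "red"), (1, "red"), (2, "blue"), (5, "green"), (4, "red")])
```

Here `color.value_of(2)` answers "what is record 2's color?" from the ID-sorted copy, and `color.ids_with("red")` answers "which records are red?" from the value-sorted copy, each with a binary search rather than a scan. The value-sorted copy is also where run-length encoding pays off, because equal values sit next to each other: `run_length_encode(color.vals_by_value)` shrinks five values to three pairs.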
On the business intelligence market penetration side, Endeca talked mainly about:
- Automotive
- Other manufactured equipment
- Public sector/intelligence and law enforcement
- Consumer & packaged goods
- Financial services
Specific applications mentioned included:
- Part-finding (I think by engineers)
- Sales/promotion analysis (down on the dealer level)
- Demand planning variance analysis
- Direct spend analysis
- Quality
- Consultant staffing
Comments
11 Responses to “Endeca topics”
Their statement about “no overarching schema; each data record carries its own metadata”, “derives common dimensions and metrics from the available metadata”, and “[the] MDEX Engine will create new relationships where possible” makes this sound kind of like illuminate. Could you highlight some of the differences?
One difference is that Endeca seems to have a straightforward columnar DBMS and illuminate doesn’t.
Another obvious difference is that what’s secondary to Endeca — data management — is primary to illuminate.
Yet another difference is that Endeca is a company with heft, while illuminate doesn’t seem to have much mass outside Spain.
A couple of questions: (1) regarding the commentary on the relationship between the e-commerce and BI product lines, is there really any new technology associated with the Latitude launch? and (2) does all data need to be persisted (stored) in the MDEX rdbms to be available to the Discovery Framework, or can external data be referenced?
Alan,
1) I’m pretty sure yes, but I lack details. (The post focused on the lower part of the stack for a reason. :D)
2) I think they move the data into their own engine, but somebody from Endeca might prefer to (dis)confirm that directly.
Good post, Curt. Agreed on the column store idealism observation as a good characterization of our approach, where performance is critical for our use cases, but simplicity of deployment (i.e., not having to invest a great deal of effort configuring how storage and indexing will work to get good performance) is also essential given our emphasis on agile deployment.
Following up on Alan’s thread…
On (1), our two product lines are based on the same core MDEX engine technology, which is where the majority of our IP resides. And the products share some additional common components such as our Content Acquisition System for crawling unstructured data, extracting structure from it, and loading it into the MDEX engine. But the tooling around the core is different in each case. For example, the Latitude product comes with a complete component-based UI framework for analytic applications (the Discovery Framework mentioned in the post), which includes a suite of out-of-the-box MDEX-powered UI components that draw on many of the user experience lessons we’ve learned (and continue to learn) from powering so many big-name web sites. Meanwhile, our eBusiness product includes site management tools, e.g., for controlling site structure such as search-triggered landing pages, and for managing SEO on dynamic search and navigation pages.
On (2), we do pull into the MDEX any data needed to power the analytics, data navigation, and search experience in the Discovery Framework. This is essential to our approach for delivering interactive-speed exploration of semi-structured data, which relies on having the data stored in a compressed columnar structure with strong memory caching (I talk about this here: http://facets.endeca.com/2010/10/in-memory-but-not-memory-bound/). That said, we don’t pull in auxiliary data that isn’t needed for analytic computations. For example, if there are images associated with the data records in the MDEX, we’ll simply store a reference to the file (usually just a URL) rather than the actual image data.
Is there still the concept of “baseline” vs “partial” updates to the MDEX engine in Endeca? As I remember hearing, 6.0+ was going to make all updates “partial” (not fully re-indexing data). This was a critical need for a few projects I’ve dealt with in the past. Great article, btw!
[…] is buying Endeca. The official talking points for the deal aren’t a perfect match for Endeca’s actual technology, but so be […]
[…] good background reading on Endeca from Curt Monash, Forrester’s Leslie Owens and Boris Evelson, Derrick Harris and BI Scorecard’s Cindi […]
[…] Curt Monash on Endeca’s product and MDEX technology […]
[…] advance. They don’t even want fixed drilldown paths smartly calculated on the fly, ala’ Endeca (which, after all, ultimately didn’t succeed). Rather, they want to be able to truly choose […]
[…] Endeca was another BI vendor whose UI differentiation was based on a proprietary DBMS-like engine. […]