The cool aspects of Odiago WibiData
Christophe Bisciglia and Aaron Kimball have a new company.
- It’s called Odiago, and is one of my gratifyingly more numerous tiny clients.
- Odiago’s product line is called WibiData, after the justly popular We Be Sushi restaurants.
- We’ve agreed on a split exclusive de-stealthing launch. You can read about the company/founder/investor stuff on TechCrunch. But this is the place for — well, for the tech crunch.
WibiData is designed for management of, investigative analytics on, and operational analytics on consumer internet data, the main examples of which are web site traffic and personalization and their analogues for games and/or mobile devices. The core WibiData technology, built on HBase and Hadoop,* is a data management and analytic execution layer. That’s where the secret sauce resides. Also included are:
- REST APIs for interactive access.
- Import/export tools, including JDBC access.
- Management tools.
- Analytic libraries — data mining, predictive analytics, machine learning, and so on.
The whole thing is in beta, with about three (paying) beta customers.
*And Avro and so on.
The core ideas of WibiData include:
- ALL data pertaining to a single user (or mobile device) is kept in a single, possibly very long, HBase row.
- There are two primary operators in WibiData, Produce and Gather.
- Produce operates on single rows. It can operate on one row at HBase speed (milliseconds) if you need to inform an interactive user response. Or it can operate on the whole database in batch via Hadoop MapReduce.
- It is reasonable to think of Produce as mainly doing two things. One is the aforementioned serving of data out of WibiData into interactive applications. The other is scoring, classifying, recommending, etc. on individual users (i.e. rows), in line with an analytic model.
- Gather typically operates on all your rows at once, and emits suitable input for a MapReduce Reduce step. It is reasonable to think of Gather as being a key cog in the training of analytic models.
- HBase schema management is done at the WibiData system level, not directly in applications. There’s a WibiData HBase data dictionary, powered by a set of system tables, that specifies cell data types/record types and, in effect, primitive schemas.
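To make the Produce/Gather split concrete, here is a minimal sketch in plain Python. It is not WibiData's API, which Odiago has not published in this detail; the table is an ordinary dict standing in for HBase, and every name (`produce`, `gather`, `reduce_sum`, the `derived:score` column, the sample rows) is a hypothetical chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical stand-in for an HBase table: row key (user id) -> column -> value.
Row = Dict[str, object]
Table = Dict[str, Row]


def produce(row: Row, score_fn: Callable[[Row], float]) -> Row:
    """Produce-style step: look at one row and return it with a derived column."""
    updated = dict(row)  # copy, so the input row is left untouched
    updated["derived:score"] = score_fn(row)
    return updated


def produce_all(table: Table, score_fn: Callable[[Row], float]) -> Table:
    """Batch form: the same per-row function applied to every row."""
    return {key: produce(row, score_fn) for key, row in table.items()}


def gather(table: Table, emit_fn: Callable[[Row], Iterable[Tuple[str, int]]]) -> Iterator[Tuple[str, int]]:
    """Gather-style step: scan all rows and emit key/value pairs for a reducer."""
    for row in table.values():
        yield from emit_fn(row)


def reduce_sum(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """A simple Reduce step: sum the emitted values per key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Made-up sample data: one row per user.
table: Table = {
    "user1": {"info:country": "US", "clicks": ["home", "cart"]},
    "user2": {"info:country": "US", "clicks": ["home"]},
    "user3": {"info:country": "DE", "clicks": []},
}


def click_count(row: Row) -> float:
    return float(len(row["clicks"]))


def clicks_by_country(row: Row):
    return [(row["info:country"], len(row["clicks"]))]


one_user = produce(table["user1"], click_count)          # interactive, single row
everyone = produce_all(table, click_count)               # batch, whole table
totals = reduce_sum(gather(table, clicks_by_country))    # input for model training
```

The point of the sketch is only the shape of the two operators: Produce is row in, row out, so it works equally well for one user or for all of them, while Gather exists to feed an aggregation across users.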
WibiData-enhanced HBase differs from relational DBMS in most of the ways you would imagine, both good and bad. In particular:
- Depending on how you look at it, WibiData-enhanced HBase either has no DML (Data Manipulation Language) at all, or else has one that’s a lot less rich than SQL.
- WibiData-enhanced HBase schemas are much more dynamic than SQL schemas.
- WibiData-enhanced HBase schemas can have nested or recursive data structures, such as array-valued cells.
To expand on each of those points in turn:
WibiData’s underlying one-giant-table philosophy notwithstanding, there are times you manage multiple tables with it. (For example, you ingest data into WibiData however you can, and then run transformations, typically in batch, until the data is in the preferred structure.) While WibiData does have ways to simulate joins, foreign keys, and so on, there’s nothing resembling referential integrity or foreign key constraints.
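As a rough illustration of what a simulated join without referential integrity means in practice, consider the sketch below. It uses plain Python dicts rather than anything from WibiData, and the tables, column names, and `join_users_to_plans` helper are all hypothetical. The thing to notice is that nothing stops a row from pointing at a key that does not exist, so the application has to cope with the miss.

```python
from typing import Dict, List, Optional, Tuple

# Two hypothetical tables keyed by row key. "plan_id" plays the role of a
# foreign key, but nothing enforces that the referenced plan exists.
users: Dict[str, Dict[str, str]] = {
    "u1": {"name": "Ann", "plan_id": "p1"},
    "u2": {"name": "Bob", "plan_id": "p9"},  # dangling reference: no plan "p9"
}
plans: Dict[str, Dict[str, str]] = {
    "p1": {"label": "free"},
}


def join_users_to_plans(
    users: Dict[str, Dict[str, str]],
    plans: Dict[str, Dict[str, str]],
) -> List[Tuple[str, Optional[str]]]:
    """Application-side join: look up each user's plan, tolerating misses."""
    joined = []
    for user_id, user in sorted(users.items()):
        plan = plans.get(user.get("plan_id", ""))
        joined.append((user_id, plan["label"] if plan else None))
    return joined


result = join_users_to_plans(users, plans)
```

Here `result` pairs `u1` with its plan label and `u2` with `None`, which is the kind of outcome a relational foreign key constraint would have prevented at write time.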
WibiData takes single-table schema flexibility to an extreme. Not only can different rows in the same table have different associated columns — something that relational systems can in effect also do via NULL values — but schemas can even change over the life of a column. If you have an array-valued cell storing the results of a marketing campaign, and you start recording more data partway through the campaign, then different rows in the table will, in the same column, hold different-sized arrays.
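A small sketch of what that implies for code that reads such a column, again in plain Python with hypothetical names (`FIELDS`, `read_campaign`) and made-up values rather than any real WibiData interface: the reader has to accept both the short, older arrays and the longer, newer ones.

```python
from typing import Dict, List, Optional

# Hypothetical layout of the campaign array. Assume "conversions" only
# started being recorded partway through the campaign.
FIELDS = ["impressions", "clicks", "conversions"]


def read_campaign(cell: List[int]) -> Dict[str, Optional[int]]:
    """Interpret a campaign array of any length, filling absent fields with None."""
    padded: List[Optional[int]] = list(cell) + [None] * (len(FIELDS) - len(cell))
    return dict(zip(FIELDS, padded))


early_row = read_campaign([100, 7])      # written before the change
late_row = read_campaign([250, 19, 3])   # written after the change
```

Both rows live in the same column; only the reader's tolerance for shorter arrays keeps them usable side by side.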
That nesting can also get pretty serious; where you’d have a single value in a relational table, you might have the equivalent of a whole relational table (or at least selection/view) in WibiData-enhanced HBase. For example, if a user visits the same web page ten times, and each time 50 attributes are recorded (including a timestamp), all 500 data – to use the word “data” in its original “plural of datum” sense – would likely be stored in the same WibiData cell.
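The page-visit example can be pictured roughly as follows. This is only an illustrative sketch: the column name `pages:/pricing`, the attribute names, and the `make_visit` helper are invented, and a real system would serialize the cell (the post mentions Avro) rather than hold Python objects.

```python
from typing import Dict, List, Optional


def make_visit(timestamp: int, n_attrs: int = 50) -> Dict[str, Optional[int]]:
    """Build one visit record with n_attrs attributes, the timestamp included."""
    visit: Dict[str, Optional[int]] = {"timestamp": timestamp}
    for i in range(1, n_attrs):
        visit[f"attr_{i}"] = None  # placeholder values for the other 49 attributes
    return visit


# One user's row: a single cell holds the full history of visits to one page.
row: Dict[str, List[Dict[str, Optional[int]]]] = {
    "pages:/pricing": [make_visit(ts) for ts in range(10)],
}

cell = row["pages:/pricing"]
total_values = sum(len(visit) for visit in cell)          # 10 visits x 50 attributes
latest_visit = max(cell, key=lambda v: v["timestamp"])    # query inside the cell
```

In effect the cell behaves like a small table of its own, with one "row" per visit, which is why selecting the latest visit looks like a query over the cell rather than over the database.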
That’s about all Odiago is disclosing about WibiData right now. Christophe will also be talking at Hadoop World next week, and presumably can be hit up with any burning questions then.
Comments
14 Responses to “The cool aspects of Odiago WibiData”
Regarding your description of our system as designed for “consumer internet data:”
WibiData has enjoyed the most traction to date in the high-tech industry, but works well with any type of user- or customer-centric data: finance, retail, etc.
For more info, come see our Hadoop World talk!
Hadoop County, California, 2011. Gold Rush. Thinking about opening a liquor store and saloon over there.
[…] industry analyst Curt Monash delved into that issue on his DBMS2 blog, explaining how WibiData does what it does. Here’s how Monash describes the […]
[…] Odiago this morning, giving the business-side scoop to TechCrunch and the technical details to Curt Monash. The company is launching a product called Wibidata (“we be data”) specializing in data […]
I’d be interested to understand how the interactive store compares to Mongo for app development. I’ve been thinking integration between mongo apps and an analytic backend or replica would be attractive.
[…] Read the TechCrunch post here. A more technical description of Odiago’s WibiData offering appears in DBMS2 here. […]
[…] built on HBase and Hadoop,* is a data management and analytic execution layer. More details here. […]
[…] Curt Monash provides a terrific summary of the technical components of WibiData in his blog DBMS2. […]
[…] The cool aspects of Odiago WibiData […]
[…] Monash, author of DBMS2 also does a great job of summarizing some of the highlights of WibiData, as well as helping clarify how WibiData fits into the taxonomy of investigative, operational, and […]
[…] Hbase to analyze consumer web data. Database industry analyst Curt Monash describes WibiData on his DBMS2 blog: WibiData is designed for management of, investigative analytics on, and operational analytics on […]
[…] Spring — running over Hadoop/HBase. Except for some newfound modularity, it is much like what I described at the time of WibiData’s launch or what WibiData further disclosed a few months later. Key aspects […]