Parallelization

Analysis of issues in parallel computing, especially parallelized database management. Related subjects include:

May 12, 2010

The Clustrix story

After my recent post, the Clustrix guys raised their hands and briefed me. Takeaways included: Read more

Categories: Application areas, Clustrix, Emulation, transparency, portability, Games and virtual worlds, MySQL, NoSQL, OLTP, Parallelization, Solid-state memory

8 Comments

May 7, 2010

Clarifying the state of MPP in-database SAS

I routinely am briefed way in advance of products’ introductions. For that reason and others, it can be hard for me to keep straight what’s been officially announced, introduced for test, introduced for general availability, vaguely planned for the indefinite future, and so on. Perhaps nothing has confused me more in that regard than the SAS Institute’s multi-year effort to get SAS integrated into various MPP DBMS, specifically Teradata, Netezza Twinfin(i), and Aster Data nCluster.

However, I chatted briefly Thursday with Michelle Wilkie, who is the SAS product manager overseeing all this (and also some other stuff, like SAS running on grids without being integrated into a DBMS). As best I understood, the story is: Read more

Categories: Aster Data, Data warehouse appliances, MapReduce, Netezza, Parallelization, Predictive modeling and advanced analytics, SAS Institute, Specific users, Teradata

11 Comments

May 4, 2010

Clustrix may be doing something interesting

Clustrix launched without briefing me or, at least so far as I can tell, anybody else who knows much about database technology. But Clustrix did post a somewhat crunchy, no-registration-required, white paper. Based on that, I get the impression:

Clustrix is making OLTP DBMS.
The core problem Clustrix tries to solve is scale-out, without necessarily giving up SQL. (I couldn’t immediately tell whether Clustrix supports NoSQL-style key-value interfaces enthusiastically, grudgingly, or not at all.)
Unlike Akiban or VoltDB, Clustrix makes database appliances. The Clustrix software seems to assume a Clustrix appliance.
A key feature of Clustrix’s database appliances is that they rely on solid-state memory. I’m guessing that Clustrix appliances don’t even have disks, or that if they do the disks store some software or something, not actual data. (As previously noted, I agree with Oracle in thinking that much of the progress in database technology this decade will come from proper design for solid-state memory.)
Clustrix talks of things that sound like compiled queries and attempts to avoid locks. However, it doesn’t sound as extreme in these regards as VoltDB.
Clustrix also talks of things that sound like consistent hashing.
The brand name “Sierra” also shows up along with the brand name “Clustrix.”

Categories: Clustrix, Data warehouse appliances, DBMS product categories, NoSQL, Parallelization, Solid-state memory, Storage, Theory and architecture

2 Comments

May 4, 2010

Revolution Analytics seems very confused

Revolution Analytics is a relaunch of a company previously known as REvolution Computing, built around the open source R language. Last week they sent around email claiming they were a new company (false), and asking for briefings in connection with an embargo this morning. I talked to Revolution Analytics yesterday, and they told me the embargo had been moved to Thursday.* However, Revolution apparently neglected to tell the press the same thing, and there’s an article out today — quoting me, because I’d given quotes in line with the original embargo, before I’d had the briefing myself. And what’s all this botched timing about? Mainly, it seems to be for a “statement of direction” about software Revolution Analytics hasn’t actually developed yet.

*More precisely, they spoke as if the embargo had been Thursday all along.

Categories: Investment research and trading, Parallelization, Predictive modeling and advanced analytics, Revolution Analytics, SAS Institute

13 Comments

May 1, 2010

Read-your-writes (RYW), aka immediate, consistency

In which we reveal the fundamental inequality of NoSQL, and why NoSQL folks are so negative about joins.

Discussions of NoSQL design philosophies tend to quickly focus in on the matter of consistency. “Consistency”, however, turns out to be a rather overloaded concept, and confusion often ensues.

In this post I plan to address one essential subject, while ducking various related ones as hard as I can. It’s what Werner Vogel of Amazon called read-your-writes consistency (a term to which I was actually introduced by Justin Sheehy of Basho). It’s either identical or very similar to what is sometimes called immediate consistency, and presumably also to what Amazon has recently called the “read my last write” capability of SimpleDB.

This is something every database-savvy person should know about, but most so far still don’t. I didn’t myself until a few weeks ago.

Considering the many different kinds of consistency outlined in the Werner Vogel link above or in the Wikipedia consistency models article — whose names may not always be used in, er, a wholly consistent manner — I don’t think there’s much benefit to renaming read-your-writes consistency yet again. Rather, let’s just call it RYW consistency, come up with a way to pronounce “RYW”, and have done with it. (I suggest “ree-ooh”, which evokes two syllables from the original phrase. Thoughts?)

Definition: RYW (Read-Your-Writes) consistency is achieved when the system guarantees that, once a record has been updated, any attempt to read the record will return the updated value.

Categories: Amazon and its cloud, NoSQL, OLTP, Parallelization, Theory and architecture

26 Comments

April 27, 2010

Gear6 seems to have failed in the memcached market too

As previously noted, I’ve briefly cut back on blogging (and research) due to some family health issues. The first casualty was a post about memcached. One of the two companies to be featured were my new clients at Northscale. The other was Gear6. What they had in common was:

Both Northscale and Gear6 offered distributions of memcached.
Both Northscale and Gear6 also wanted to sell persistent versions of memcached — in essence, simple DBMS with the memcached API in place of a substantial DML (Data Manipulation Language).

Categories: Clustering, Couchbase, memcached, NoSQL

1 Comment

April 18, 2010

Aster Data’s mapreduce.org site

Aster Data has started a site mapreduce.org, which purports to compile “the best information about MapReduce.” At the moment, mapreduce.org highlights include:

A feed of MapReduce-related posts from several blogs, including this one.
A calendar of MapReduce-related events, not necessarily Aster-specific, integrated with a feed combining …
- … Aster MapReduce-related press releases and also …
- … not necessarily Aster-specific MapReduce-related press articles.
Links to a lot of Aster Data MapReduce-related collateral. Some of that stuff is quite good.*
A sycophantic introduction from Colin White praising the value of the mapreduce.org “independent forum.”

*I did a couple of MapReduce-related webinars for Aster late last year. 🙂 But seriously — Aster does a good job of writing clear and informative collateral.

Categories: Analytic technologies, Aster Data, MapReduce

3 Comments

April 16, 2010

Introduction to Datameer

Elder care issues have flared up with a vengeance, so I’m not going to be blogging much for a while, and surely not at any length. That said, my first post about Datameer was never going to be very long, so lets get right to it:

Datameer offers a business intelligence and analytics stack that runs on any distribution of Hadoop.
Datameer is still building a lot of features that it talks about, for target release in (I think) the fall.
Datameer’s pride and joy is its user interface. Very laudably for a software start-up, Datameer claims to have spent considerable time with professional user interface designers.
Datameer’s core user interface metaphor is formula definition via a spreadsheet.
Datameer includes 124 functions one can use in these formulae, ranging from math stuff to text tokenization.
Datameer does some straight BI, with 4 kinds of “visualization” headed for 20 kinds later. But if you want to do hard-core BI, use Datameer to dump data into an RDBMS and then use the BI tool of your choice. (Datameer’s messaging does tend to obscure or even contradict that point.)
Rather, Datameer seems to be designed for the classic MapReduce use cases of ETL and heavy data crunching.
Datameer’s messaging includes a bit about “Datameer is real-time, even though Hadoop is generally thought of as batch.” So far as I can tell, what that boils down to is …
… Datameer will let you examine sample and/or partial query results before a full Hadoop run is over. Apparently, there are three different ways Datameer lets you do this:
- You can truly query against a sample of the data set.
- You can query against intermediate results, when only some stages of the Hadoop process have already been run.
- You can drill down into a “distributed index,” whatever the heck that means when Datameer says it.
Datameer will let you import data from 15 or so different kinds of sources, SQL, NoSQL, and file system alike.

Categories: Analytic technologies, Business intelligence, Datameer, EAI, EII, ETL, ELT, ETLT, Hadoop, MapReduce

3 Comments

March 23, 2010

Three kinds of software innovation, and whether patents could possibly work for them

In connection with an attempt to articulate my views on software patents (more on those below), I was thinking about the different ways in which software development can be innovative. And it turns out that most forms of software innovation can, at their core, be assigned to one or more of three overlapping categories: Read more

Categories: Analytic technologies, Business intelligence, Cloud computing, Data warehousing, Parallelization, Software as a Service (SaaS), Theory and architecture

5 Comments

March 16, 2010

Memcached-based company NorthScale launches

NorthScale, a start-up based around memcached, has just launched, two weeks after the Todd Hoff’s post arguing the MySQL/memcached combo is passe’. NorthScale wouldn’t necessarily argue with Todd, arguing that what you really should use instead is NorthScale’s combo of memcached and Membase, a memcached-like DBMS …

… or something like that. I don’t intend to write seriously about NorthScale until I have a better idea of what Membase is.

In the mean time,

VentureBeat put up a solid post on NorthScale’s company history and so on
Om Malik bought into the NorthScale memcached pitch
TechCrunch has a low-quality post about NorthScale (although it wasn’t as error-riddled as the same author’s post about nStein, which Seth Grimes properly blasted)

Categories: Cache, Clustering, Couchbase, memcached, NoSQL, Parallelization

Search our blogs and white papers

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in