June 20, 2011

Columnar DBMS vendor customer metrics

Last April, I asked some columnar DBMS vendors to share customer metrics. They answered, but it took until now to iron out a couple of details. Overall, the answers are pretty impressive.

June 14, 2011

Infobright 4.0

Infobright is announcing its 4.0 release, with imminent availability. In marketing and product alike, Infobright is betting the farm on machine-generated data. This hasn’t been Infobright’s strategy from the get-go, but it is these days, with pretty good focus and commitment. While some fraction of Infobright’s customer base is in the Sybase-IQ-like data mart market (and indeed Infobright put out a customer-win press release in that market a few days ago), Infobright’s current customer targets seem to be mainly:

Key aspects of Infobright 4.0 include:

June 4, 2011

Dirty data, stored dirt cheap

A major driver of Hadoop adoption is the “big bit bucket” use case. Users take a whole lot of data, often machine-generated data in logs of different kinds, and dump it into one place, managed by Hadoop, at open-source pricing. Hadoop hardware doesn’t need to be that costly either. And once you get that data into Hadoop, there are a whole lot of things you can do with it.

Of course, there are various outfits who’d like to sell you not-so-cheap bit buckets. Contending technologies include Hadoop appliances (which I don’t believe in), Splunk (which in many use cases I do), and MarkLogic (ditto, but often the cases are different from Splunk’s). Cloudera and IBM, among other vendors, would also like to sell you some proprietary software to go with your standard Apache Hadoop code.

So the question arises: why would you want to spend serious money to look after your low-value data? The answer, of course, is that maybe your log data isn’t so low-value.

June 1, 2011

The essence of an application

Once upon a time, information technology was strictly about — well, information. And by “information” what was meant was “data”.* An application boiled down to a database design, plus a straightforward user interface, in whatever the best UI technology of the day happened to be. Things rarely worked quite as smoothly as the design-database/press-button/generate-UI propaganda would have one believe, but database design was clearly at the center of application invention.

*Not coincidentally, two of the oldest names for “IT” were data processing and management information systems.

Eventually, there came to be three views of the essence of IT:

Graphical user interfaces were a major enabling technology for that evolution. Equally important, relational databases made some difficult problems easy(ier), freeing application designers to pursue more advanced functionality.

Based on further technical evolution, specifically in analytic and consumer technologies, I think we should now take that list up to five. The new members I propose are:

April 21, 2011

Application areas for SAS HPA

When I talked with SAS about its forthcoming in-memory parallel SAS HPA offering, we talked briefly about application areas. The three SAS cited were:

Meanwhile, in another interview I heard about, SAS emphasized retailers. Indeed, that’s what spawned my recent post about logistic regression.

The mobile communications one is a bit scary. Your cell phone, and hence your cellular company, knows where you are, pretty much from moment to moment. Even without advanced analytic technology applied to it, that’s a pretty direct privacy threat. Throw in some analytics, and your cell company might know, for example, who you hang out with (in person), where you shop, and how those things predict your future behavior. And so the government, or just your employer, might know those things too.

April 19, 2011

Notes on short-request scale-out MySQL

A press person recently asked about:

… start-ups that are building technologies to enable MySQL and other SQL databases to get over some of the problems they have in scaling past a certain size. … I’d like to get a sense as to whether or not the problems are as severe and wide spread as these companies are telling me? If so, why wouldn’t a customer just move to a new database?

While that sounds as if he was asking about scale-out relational DBMS in general, MySQL or otherwise, short-request or analytic, it turned out that he was asking just about short-request scale-out MySQL. My thoughts and comments on that narrower subject include(d) but are not limited to:

April 8, 2011

Revolution Analytics update

I wasn’t too impressed when I spoke with Revolution Analytics at the time of its relaunch last year. But a conversation Thursday evening was much clearer. And I even learned some cool stuff about general predictive modeling trends (see the bottom of this post).

Revolution Analytics business and business model highlights include:

February 14, 2011

Some quick notes on HP-Vertica

HP is acquiring Vertica.

February 8, 2011

Membase and CouchOne merged to form Couchbase

Membase, the company whose product is Membase and whose former company name is Northscale, has merged with CouchOne, the company whose product is CouchDB and whose former name is Couch.io. The result (product and company) will be called Couchbase. CouchDB inventor Damien Katz will join the Membase (now Couchbase) management team as CTO. Couchbase can reasonably be regarded as a document-oriented NoSQL DBMS, a product category I not coincidentally posted about yesterday.

In essence, Couchbase will be CouchDB with scale-out. Alternatively, Couchbase will be Membase with a richer programming interface. The Couchbase sweet spot is likely to be:

February 1, 2011

Cassandra company DataStax (formerly Riptano) is on track

Riptano, the Cassandra company, has changed its name to DataStax. DataStax has opened headquarters in Burlingame and hired some database-experienced folks – notably Ben Werther from Greenplum and Michael Weir from ParAccel, with Zenobia Godschalk (who worked with Aster Data) somewhere in the outside PR mix. Other than that, what’s new at DataStax is pretty much what could have been expected based on what DataStax folks said last spring.

Most notably, DataStax is introducing a software offering, whose full name is DataStax OpsCenter for Apache Cassandra. DataStax OpsCenter for Apache Cassandra seems to be, in essence, a monitoring tool for Cassandra clusters, with a bit of capacity planning bundled in. (If there are any outright operations parts to DataStax OpsCenter, they got overlooked in our conversation.)*

