Application areas
Posts focusing on the use of database and analytic technologies in specific application domains. Related subjects include:
- Any subcategory
- (in Text Technologies) Specific application areas for text analytics
Cloudera Enterprise and Hadoop evolution
I talked with Cloudera a couple of weeks ago in connection with the impending release of Cloudera Enterprise. I’d say: Read more
The most important part of the “social graph” is neither social nor a graph
“Social graph” is a highly misleading term, and so is “social network analysis.” By this I mean:
There’s something akin to “social graphs” and “social network analysis” that is more or less worthy of all the current hype – but graphs and network analysis are only a minor part of the whole story.
In particular, the most important parts of the Facebook “social graph” are neither social nor a graph. Rather, what’s really important is an aggregate Profile of Revealed Preferences, of which person-to-person connections or other things best modeled by a graph play only a small part.
| Categories: Analytic technologies, Facebook, Games and virtual worlds, RDF and graphs, Surveillance and privacy, Web analytics | 13 Comments | 
VoltDB finally launches
VoltDB is finally launching today. As is common for companies in sectors I write about, VoltDB — or just “Volt” — has discovered the virtues of embargoes that end 12:01 am. Let’s go straight to the technical highlights:
- VoltDB is based on the H-Store technology, which I wrote about in February, 2009. Most of what I said about H-Store then applies to VoltDB today.
- VoltDB is a no-apologies ACID relational DBMS, which runs entirely in RAM.
- VoltDB has rather limited SQL. (One example: VoltDB can’t do SUMs in SQL.) However, VoltDB guy Tim Callaghan (Mark Callaghan’s lesser-known but nonetheless smart brother) asserts that if you code up the missing functionality, it’s almost as fast as if it were present in the DBMS to begin with, because there’s no added I/O from the handoff between the DBMS and the procedural code. (The data’s in RAM one way or the other.)
- VoltDB’s Big Conceptual Performance Story is that it does away with most locks, latches, logs, etc., and also most context switching.
- In particular, you’re supposed to partition your data and architect your application so that most transactions execute on a single core. When you can do that, you get VoltDB’s performance benefits. To the extent you can’t, you’re in two-phase-commit performance land. (More precisely, you’re doing 2PC for multi-core writes, which is surely a major reason that multi-core reads are a lot faster in VoltDB than multi-core writes.)
- VoltDB has a little less than one DBMS thread per core. When the data partitioning works as it should, you execute a complete transaction in that single thread. Poof. No context switching.
- A transaction in VoltDB is a Java stored procedure. (The early idea of Ruby on Rails in lieu of the Java/SQL combo didn’t hold up performance-wise.)
- Solid-state memory is not a viable alternative to RAM for VoltDB. Too slow.
- Instead, VoltDB lets you snapshot data to disk at tunable intervals. “Continuous” is one of the options, wherein a new snapshot starts being made as soon as the last one completes.
- In addition, VoltDB will also spool a kind of transaction log to the target of your choice. (Obvious choice: An analytic DBMS such as Vertica, but there’s no such connectivity partnership actually in place at this time.)
More on Sybase IQ, including Version 15.2
Back in March, Sybase was kind enough to give me permission to post a slide deck about Sybase IQ. Well, I’m finally getting around to doing so. Highlights include but are not limited to:
- Slide 2 has some market success figures and so on. (>3100 copies at >1800 users, >200 sales last year)
- Slides 6-11 give more detail on Sybase’s indexing and data access methods than I put into my recent technical basics of Sybase IQ post.
- Slide 16 reminds us that in-database data mining is quite competitive with what SAS has actually delivered with its DBMS partners, even if it doesn’t have the nice architectural approach of Aster or Netezza. (I.e., Sybase IQ’s more-than-SQL advanced analytics story relies on C++ UDFs — User Defined Functions — running in-process with the DBMS.) In particular, there’s a data mining/predictive analytics library — modeling and scoring both — licensed from a small third party.
- A number of the other later slides also have quite a bit of technical crunch. (More on some of those points below too.)
Sybase IQ may have a bit of a funky architecture (e.g., no MPP), but the age of the product and the substantial revenue it generates have allowed Sybase to put in a bunch of product features that newer vendors haven’t gotten around to yet.
More recently, Sybase volunteered permission for me to preannounce Sybase IQ Version 15.2 by a few days (it’s scheduled to come out this week). Read more
Notes on SciDB and scientific data management
I firmly believe that, as a community, we should look for ways to support scientific data management and related analytics. That’s why, for example, I went to XLDB3 in Lyon, France at my own expense. Eight months ago, I wrote about issues in scientific data management. Here’s some of what has transpired since then.
The main new activity I know of has been in the open source SciDB project. Read more
| Categories: Analytic technologies, Data warehousing, eBay, GIS and geospatial, Microsoft and SQL*Server, SciDB, Scientific research, Web analytics | 5 Comments | 
The Clustrix story
After my recent post, the Clustrix guys raised their hands and briefed me. Takeaways included: Read more
| Categories: Application areas, Clustrix, Emulation, transparency, portability, Games and virtual worlds, MySQL, NoSQL, OLTP, Parallelization, Solid-state memory | 8 Comments | 
Truviso evidently reinvents itself
When Aleri bought Coral8 last year, I wrote that the independent CEP (Complex Event Processing) vendors were floundering. Aleri quickly threw in the towel and sold out to Sybase, which hardly changed my opinion. StreamBase actually is persevering, but not with any kind of breakout success. Big vendors, such as Microsoft and IBM, have at least some aspirations of eventually filling the gap.
Meanwhile, Truviso — which never got much market traction in the first place — was in hiding; Roman Bukary never did keep his promise to brief me on the company’s new and improved strategy. Then Truviso had yet another management change, amidst rumors that it was repositioning away from CEP. As per a press release Truviso emailed today, that’s now official, with Truviso’s main business being something to do with web analytics.
Edit: It seems Truviso was at some point absorbed into Cisco.
Revolution Analytics seems very confused
Revolution Analytics is a relaunch of a company previously known as REvolution Computing, built around the open source R language. Last week they sent around email claiming they were a new company (false), and asking for briefings in connection with an embargo this morning. I talked to Revolution Analytics yesterday, and they told me the embargo had been moved to Thursday.* However, Revolution apparently neglected to tell the press the same thing, and there’s an article out today — quoting me, because I’d given quotes in line with the original embargo, before I’d had the briefing myself. And what’s all this botched timing about? Mainly, it seems to be for a “statement of direction” about software Revolution Analytics hasn’t actually developed yet.
*More precisely, they spoke as if the embargo had been Thursday all along.
| Categories: Investment research and trading, Parallelization, Predictive modeling and advanced analytics, Revolution Analytics, SAS Institute | 13 Comments | 
Vertica update
Last month, Vertica’s CEO Ralph Breslauer quit,* and Vertica made it sound like there would be a new CEO late in April. And indeed, as of April 29, there was. He’s a guy I’ve never heard of before named Chris Lynch, apparently quite the sales machine builder. The most substance I’ve found is a pair of Mass High Tech articles — the latter exceedingly typo-ridden — to the general effect that:
- Vertica plans to build a massive, world-conquering sales force.
- If Vertica dips back into negative cash flow to do that and has to raise more venture capital, so be it.
- “Triple-digit” revenue growth is expected for this year.
Greenplum Chorus and Greenplum 4.0
Greenplum is making two product announcements this morning. Greenplum 4.0 is a revision of the core Greenplum database technology. In addition, Greenplum is announcing Greenplum Chorus, which is the first product release instantiating last year’s EDC (Enterprise Data Cloud) vision statement and marketing campaign.
Greenplum 4.0 highlights and related observations include: Read more
