Parallelization

Analysis of issues in parallel computing, especially parallelized database management.

July 31, 2010

Teradata, Xkoto Gridscale (RIP), and active-active clustering

Having gotten a number of questions about Teradata’s acquisition of Xkoto, I leaned on Teradata for an update, and eventually connected with Scott Gnau. Takeaways included:

- Teradata is discontinuing Xkoto’s existing product Gridscale, which Scott characterized as being too OLTP-focused to be a good fit for Teradata. Teradata hopes and expects that existing Xkoto Gridscale customers won’t renew maintenance. (I’m not sure that they’ll even get the option to do so.)
- The point of Teradata’s technology + engineers acquisition of Xkoto is to enhance Teradata’s active-active or multi-active data warehousing capabilities, which it has had in some form for several years.
- In particular, Teradata wants to tie together different products in the Teradata product line. (Note: Those typically all run pretty much the same Teradata database management software, except insofar as they might be on different releases.)
- Scott rattled off all the plausible areas of enhancement, with multiple phrasings – performance, manageability, ease of use, tools, features, etc.
- Teradata plans to have one or two releases based on Xkoto technology in 2011.

Frankly, I’m disappointed at the struggles of clustering efforts such as Xkoto Gridscale or Continuent’s pre-Tungsten products, but if the DBMS vendors meet the same needs themselves, that’s OK too.

The logic behind active-active database implementations actually seems pretty compelling: Read more

Categories: Clustering, Continuent, Data warehousing, Solid-state memory, Teradata, Theory and architecture, Xkoto

9 Comments

July 28, 2010

dbShards — a lot like an MPP OLTP DBMS based on MySQL or PostgreSQL

I talked yesterday w/ Cory Isaacson, who runs CodeFutures, makers of dbShards. dbShards is a software layer that turns an ordinary DBMS (currently MySQL or PostgreSQL) into an MPP shared-nothing ACID-compliant OLTP DBMS. Technical highlights included: Read more

Categories: dbShards and CodeFutures, Facebook, MySQL, OLTP, Parallelization, PostgreSQL

3 Comments

July 23, 2010

Some interesting links

In no particular order: Read more

Categories: Business intelligence, EnterpriseDB and Postgres Plus, Fun stuff, Hadoop, Humor, In-memory DBMS, MapReduce, Memory-centric data management, Open source, Oracle, SAP AG

2 Comments

July 6, 2010

Riptano, and Cassandra adoption

Tonight’s Cassandra technology post got plenty long enough on its own, so I’m separating out business and adoption issues here. For starters, known Cassandra users include:

- Facebook, which has said it has 150 or so Cassandra nodes (but see below)
- Twitter, which has said it has 45 or so Cassandra nodes
- Rackspace, which used to be Jonathan Ellis’ employer, and now is backing Cassandra company Riptano
- Digg, which along with Twitter and Rackspace was one of the three major users helping advance the Cassandra project
- OpenX, Simple Geo, Digital Reasoning, who Jonathan cited as production users in March
- Cloudkick, as noted and linked in my other post
- Two customers Riptano named at launch (but I’ve forgotten who they were*)

Fetlife, Meebo, and others seem to at least have a healthy interest in Cassandra, based on their level of involvement in a forthcoming Cassandra Summit. That said, the @Fetlife tweetstream features numerous yelps of pain, and I don’t mean the recreational kind. Read more

Categories: Cassandra, DataStax, Facebook, Market share and customer counts, NoSQL, Open source, Parallelization, Pricing, Specific users

5 Comments

July 6, 2010

Cassandra technical overview

Back in March, I talked with Jonathan Ellis of Rackspace, who runs the Apache Cassandra project. I started drafting a blog post then, but never put it up. Then Jonathan cofounded Riptano, a company to commercialize Cassandra, and so I talked with him again in May. Well, I’m finally finding time to clear my Cassandra/Riptano backlog. I’ll cover the more technical parts below, and the more business- or usage-oriented ones in a companion Cassandra/Riptano post.

Jonathan’s core claims for Cassandra include:

- Cassandra is shared-nothing.
- Cassandra has good approaches to replication and partitioning, right out of the box.
- In particular, Cassandra is good for use cases that distribute a database around the world and want to access it at “local” latencies. (Indeed, Jonathan asserts that non-local replication is a significant non-big-data Cassandra use case.)
- Cassandra’s scale-out is application-transparent, unlike sharded MySQL’s.
- Cassandra is fast at both appends and range queries, which would be hard to accomplish in a pure key-value store.
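
To illustrate the last claim, here is a minimal sketch of why keeping keys in sorted order helps with both appends and range queries. It is not Cassandra's storage engine or API; the `OrderedStore` class and its method names are hypothetical, made up for this example. A plain hash map would have to scan every key to answer the same range query.

```python
import bisect


class OrderedStore:
    """Toy key-value store that keeps its keys sorted (illustrative only)."""

    def __init__(self):
        self._keys = []    # always kept in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            if not self._keys or key > self._keys[-1]:
                # Append fast path: a key larger than all existing keys
                # (e.g. a newer timestamp) goes straight onto the end.
                self._keys.append(key)
            else:
                # Out-of-order key: insert at its sorted position.
                bisect.insort(self._keys, key)
        self._values[key] = value

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]
```

With timestamp-like keys, new writes land on the append path, and a call such as `store.range("2010-07-02", "2010-07-04")` needs only two binary searches plus a slice, rather than a pass over the whole key set.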

In general, Jonathan positions Cassandra as being best-suited to handle a small number of operations at high volume, throughput, and speed. The rest of what you do, as far as he’s concerned, may well belong in a more traditional SQL DBMS. Read more

Categories: Amazon and its cloud, Cassandra, DataStax, Facebook, Google, Log analysis, NoSQL, Open source, Parallelization

4 Comments

June 30, 2010

Cloudera Enterprise and Hadoop evolution

I talked with Cloudera a couple of weeks ago in connection with the impending release of Cloudera Enterprise. I’d say: Read more

Categories: Cloudera, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, eBay, Hadoop, Investment research and trading, MapReduce, Market share and customer counts, Petabyte-scale data management, Pricing, Specific users, Web analytics

7 Comments

June 30, 2010

Details and analysis of the VoltDB argument

Todd Hoff (High Scalability blog) posted a lengthy examination of the case and use cases for VoltDB. That excellent post, in turn, is based on a Mike Stonebraker* webinar for VoltDB, for which the slide deck is happily available. It’s all nicely consistent with what I wrote about VoltDB last month, in connection with its launch. Read more

Categories: In-memory DBMS, Michael Stonebraker, OLTP, Parallelization, Theory and architecture, VoltDB and H-Store

3 Comments

May 25, 2010

VoltDB finally launches

VoltDB is finally launching today. As is common for companies in sectors I write about, VoltDB — or just “Volt” — has discovered the virtues of embargoes that end at 12:01 am. Let’s go straight to the technical highlights:

- VoltDB is based on the H-Store technology, which I wrote about in February, 2009. Most of what I said about H-Store then applies to VoltDB today.
- VoltDB is a no-apologies ACID relational DBMS, which runs entirely in RAM.
- VoltDB has rather limited SQL. (One example: VoltDB can’t do SUMs in SQL.) However, VoltDB guy Tim Callaghan (Mark Callaghan’s lesser-known but nonetheless smart brother) asserts that if you code up the missing functionality, it’s almost as fast as if it were present in the DBMS to begin with, because there’s no added I/O from the handoff between the DBMS and the procedural code. (The data’s in RAM one way or the other.)
- VoltDB’s Big Conceptual Performance Story is that it does away with most locks, latches, logs, etc., and also most context switching.
- In particular, you’re supposed to partition your data and architect your application so that most transactions execute on a single core. When you can do that, you get VoltDB’s performance benefits. To the extent you can’t, you’re in two-phase-commit performance land. (More precisely, you’re doing 2PC for multi-core writes, which is surely a major reason that multi-core reads are a lot faster in VoltDB than multi-core writes.)
- VoltDB has a little less than one DBMS thread per core. When the data partitioning works as it should, you execute a complete transaction in that single thread. Poof. No context switching.
- A transaction in VoltDB is a Java stored procedure. (The early idea of Ruby on Rails in lieu of the Java/SQL combo didn’t hold up performance-wise.)
- Solid-state memory is not a viable alternative to RAM for VoltDB. Too slow.
- Instead, VoltDB lets you snapshot data to disk at tunable intervals. “Continuous” is one of the options, wherein a new snapshot starts being made as soon as the last one completes.
- In addition, VoltDB will also spool a kind of transaction log to the target of your choice. (Obvious choice: An analytic DBMS such as Vertica, but there’s no such connectivity partnership actually in place at this time.)
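
The single-partition idea above can be sketched in a few lines. This is a toy model, not VoltDB code or its API: the `Partition` and `PartitionedStore` classes, the `add_to_balance` procedure, and the CRC-based routing are all assumptions made for illustration, and it is written in Python rather than the Java that VoltDB stored procedures use. Each partition's data is touched by exactly one thread, so a transaction that stays within one partition runs start to finish without taking any locks.

```python
import queue
import threading
import zlib
from concurrent.futures import Future


class Partition:
    """One slice of the data, owned by exactly one worker thread."""

    def __init__(self):
        self.data = {}              # only this partition's thread touches it
        self.inbox = queue.Queue()  # transactions wait here in arrival order
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            item = self.inbox.get()
            if item is None:        # shutdown signal
                break
            procedure, args, future = item
            try:
                # The whole transaction runs here, serially, with no locks.
                future.set_result(procedure(self.data, *args))
            except Exception as exc:
                future.set_exception(exc)

    def submit(self, procedure, *args):
        future = Future()
        self.inbox.put((procedure, args, future))
        return future

    def stop(self):
        self.inbox.put(None)
        self.thread.join()


class PartitionedStore:
    """Routes each transaction to the partition that owns its key."""

    def __init__(self, num_partitions):
        self.partitions = [Partition() for _ in range(num_partitions)]

    def call(self, key, procedure, *args):
        # Hypothetical routing rule: CRC32 of the key modulo partition count.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return self.partitions[index].submit(procedure, key, *args)

    def close(self):
        for partition in self.partitions:
            partition.stop()


def add_to_balance(data, key, amount):
    """Example single-partition 'stored procedure' (hypothetical)."""
    data[key] = data.get(key, 0) + amount
    return data[key]
```

A caller would do something like `store.call("alice", add_to_balance, 10).result()`. Anything that has to read or write keys on several partitions does not fit this model and would need cross-partition coordination, which is the two-phase-commit case described above.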

Categories: EAI, EII, ETL, ELT, ETLT, Games and virtual worlds, In-memory DBMS, Investment research and trading, Michael Stonebraker, NoSQL, OLTP, Parallelization, Solid-state memory, Telecommunications, Theory and architecture, VoltDB and H-Store

16 Comments

May 23, 2010

Various quick notes

As you might imagine, there are a lot of blog posts I’d like to write that I never seem to get around to, or things I’d like to comment on that I don’t want to bother ever writing a full post about. In some cases I just tweet a comment or link and leave it at that.

And it’s not going to get any better. Next week = the oft-postponed elder care trip. Then I’m back for a short week. Then I’m off on my quarterly visit to the SF area. Soon thereafter I’ll have a lot to do in connection with Enzee Universe. And at that point another month will have gone by.

Anyhow: Read more

Categories: Analytic technologies, Business intelligence, Data warehousing, Exadata, GIS and geospatial, Google, IBM and DB2, Netezza, Oracle, Parallelization, SAP AG, SAS Institute

3 Comments

May 15, 2010

Further clarifying in-database MPP SAS

My recent post about SAS’ MPP/in-database efforts was based on a discussion in a shared ride to the airport, and was correspondingly rough. SAS’ Shannon Heath was kind enough to write in with clarifications, and to allow me to post same. Read more

Categories: Aster Data, Netezza, Parallelization, Predictive modeling and advanced analytics, SAS Institute

4 Comments


