Clustering

Analysis of products and issues in database clustering. Relates subjects include:

Continuent

March 16, 2010

Memcached-based company NorthScale launches

NorthScale, a start-up based around memcached, has just launched, two weeks after the Todd Hoff’s post arguing the MySQL/memcached combo is passe’. NorthScale wouldn’t necessarily argue with Todd, arguing that what you really should use instead is NorthScale’s combo of memcached and Membase, a memcached-like DBMS …

… or something like that. I don’t intend to write seriously about NorthScale until I have a better idea of what Membase is.

In the mean time,

VentureBeat put up a solid post on NorthScale’s company history and so on
Om Malik bought into the NorthScale memcached pitch
TechCrunch has a low-quality post about NorthScale (although it wasn’t as error-riddled as the same author’s post about nStein, which Seth Grimes properly blasted)

Categories: Cache, Clustering, Couchbase, memcached, NoSQL, Parallelization

Boston Big Data Summit keynote outline

Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I’m posting them below.

Categories: Analytic technologies, Archiving and information preservation, Business intelligence, Cloud computing, Clustering, Columnar database management, Data warehouse appliances, Data warehousing, DBMS product categories, Humor, Investment research and trading, Log analysis, MapReduce, Market share and customer counts, NoSQL, OLTP, Open source, Parallelization, Presentations, Pricing, Solid-state memory, Storage, Telecommunications, Theory and architecture, Web analytics

6 Comments

October 3, 2009

Martin Kersten on issues in scientific data management

Martin Kersten emailed a response to my post on issues in scientific data management. With his permission, I’ve lightly edited it, and am posting it below. Read more

Categories: Analytic technologies, Clustering, Parallelization, SciDB, Scientific research

3 Comments

September 11, 2009

Xkoto Gridscale highlights

I talked yesterday with cofounders Albert Lee and Ariff Kassam of Xkoto. Highlights included: Read more

Categories: Clustering, IBM and DB2, Market share and customer counts, Microsoft and SQL*Server, Parallelization, Pricing, Xkoto

15 Comments

September 3, 2009

Continuent on clustering

Robert Hodges, CTO of my client Continuent, put up a blog post laying out his and Continuent’s views on database clustering. Continuent offers Tungsten, its third try at database clustering technology, targeted at MySQL, PostgreSQL, and perhaps Oracle. Unlike Continuent’s more ambitious. second-generation product, Tungsten offers single-master replication, which in Robert’s view allows for great ease of deployment and administration (he likes the phrase “bone-simple”).

The downside to Continuent Tungsten ‘s stripped down architecture is that it doesn’t solve the most extreme performance scale-out problems. Instead, Continuent focuses on the other big benefits of keeping your data in more than one place, namely high availability and data loss prevention (i.e., backup).

Continuent has been around for a number of years, starting out in Finland but now being based in Silicon Valley. For most purposes, however, it’s reasonable to think of Continuent and Tungsten as start-up efforts.

As you might guess from the references to Finland and MySQL, Continuent’s products are open source, or at least have open source versions. I’m still a little fuzzy as to which features are open sourced and which are not. For that matter, I’m still unclear as to Tungsten’s feature list overall …

Categories: Clustering, Continuent, MySQL, Open source, PostgreSQL

3 Comments

July 29, 2009

What are the best choices for scaling Postgres?

March, 2011 edit: In its quaintness, this post is a reminder of just how fast Short Request Processing DBMS technology has been moving ahead. If I had to do it all over again, I’d suggest they use one of the high-performance MySQL options like dbShards, Schooner, or both together. I actually don’t know what they finally decided on in that area. (I do know that for analytic DBMS they chose Vertica.)

I have a client who wants to build a new application with peak update volume of several million transactions per hour. (Their base business is data mart outsourcing, but now they’re building update-heavy technology as well. ) They have a small budget. They’ve been a MySQL shop in the past, but would prefer to contract (not eliminate) their use of MySQL rather than expand it.

My client actually signed a deal for EnterpriseDB’s Postgres Plus Advanced Server and GridSQL, but unwound the transaction quickly. (They say EnterpriseDB was very gracious about the reversal.) There seem to have been two main reasons for the flip-flop. First, it seems that EnterpriseDB’s version of Postgres isn’t up to PostgreSQL’s 8.4 feature set yet, although EnterpriseDB’s timetable for catching up might have tolerable. But GridSQL apparently is further behind yet, with no timetable for up-to-date PostgreSQL compatibility. That was the dealbreaker.

The current base-case plan is to use generic open source PostgreSQL, with scale-out achieved via hand sharding, Hibernate, or … ??? Experience and thoughts along those lines would be much appreciated.

Another option for OLTP performance and scale-out is of course memory-centric options such as VoltDB or the Groovy SQL Switch. But this client’s database is terabyte-scale, so hardware costs could be an issue, as of course could be product maturity.

By the way, a large fraction of these updates will be actual changes, as opposed to new records, in case that matters. I expect that the schema being updated will be very simple — i.e., clearly simpler than in a classic order entry scenario.

Categories: Cache, Clustering, Data mart outsourcing, EnterpriseDB and Postgres Plus, In-memory DBMS, Memory-centric data management, MySQL, OLTP, Open source, Parallelization, PostgreSQL, Software as a Service (SaaS), Vertica Systems

30 Comments

September 28, 2008

Exadata and Oracle Database Machine parallelization clarified

Some kind Oracle development managers have reached out and helped me better understand where Oracle does or doesn’t stand in query and analytic parallelization. This post supersedes prior discussions of the subject over the past week. Read more

Categories: Clustering, Data warehouse appliances, Data warehousing, Exadata, Oracle, Parallelization

10 Comments

June 28, 2008

Response to Rita Sallam of Oracle

In a comment thread on Seth Grimes’ blog, Rita Sallam of Oracle engaged in a passionate defense of her data warehousing software. I’d like to take it upon myself to respond to a few of here points here. Read more

Categories: Benchmarks and POCs, Clustering, Data warehousing, Oracle, Parallelization

10 Comments

← Previous Page

Search our blogs and white papers

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in