Open source
Discussion of relational database management systems that are offered through some version of open source licensing.
More on NoSQL and HVSP (or OLRP)
Since posting last Wednesday morning that I’m looking into NoSQL and HVSP, I’ve had a lot of conversations, including with (among others):
- Dwight Merriman of 10gen (MongoDB)
- Damien Katz of Couchio (CouchDB)
- Matt Pfeil of Riptano (Cassandra)
- Todd Lipcon of Cloudera (HBase committer)
- Tony Falco of Basho (Riak)
- John Busch of Schooner
- Ori Herrnstadt of Akiban
How should somebody teach themselves database and programming skills?
From time to time, I get in a conversation with somebody who is:
- Unemployed, underemployed, or otherwise desirous of having more commercial skills.
- Not a programmer, but desirous of having some technical skills.
- Astute enough to realize s/he will never be a serious techie.
I generally have two models in mind when guiding such a person:
- Analytics/business intelligence/stats.
- Website building.
Those are both useful skill sets for people who aren’t full-time techies, the first perhaps best for those who are more quantitative and big-company-friendly, the second perhaps better for the creative and/or rebellious types.
So what SPECIFICALLY should one guide them to do? My initial thoughts include: Read more
Categories: Business intelligence, MicroStrategy, MySQL, Open source | 35 Comments |
Some interesting links
In no particular order: Read more
Categories: Business intelligence, EnterpriseDB and Postgres Plus, Fun stuff, Hadoop, Humor, In-memory DBMS, MapReduce, Memory-centric data management, Open source, Oracle, SAP AG | 2 Comments |
Yet more on the GPL, WordPress themes, and the implications for MySQL storage engines
The debate I wrote about a few days ago over whether or not the WordPress theme called Thesis needed to be GPLed has been resolved in practice – it will be. More precisely, the parts that WordPress developers and the Free Software Foundation said need to be GPLed will be GPLed, while the rest, in essence the more “artistic” elements, won’t be.
A consensus seems to have emerged that Thesis had actually copied beyond-fair-use amounts of WordPress code, which if true was Game Over. Beyond that, however, both sides of the strongly-viral-GPL debate scored some points. Read more
Categories: MySQL, Open source | 6 Comments |
New insights into the GPL vs. MySQL storage engine debates
Around the time of Oracle’s acquisition of Sun and hence MySQL, there was a lot of discussion as to whether MySQL’s GPL license could inhibit MySQL storage engine vendors from selling their products without MySQL code (e.g., with MySQL-fork front-ends). I argued No. Most people, however, seemed to think “Yes, and even if the matter isn’t clear, the threat of nasty lawyers creates enough FUD to be a practical market problem for the storage engine vendors.” Based on those concerns, I eventually took the position that Oracle should be inhibited for antitrust reasons from invoking its real or alleged GPL rights to mess with the MySQL storage engine vendors. Oracle’s agreement with the EU alleviated that concern, except that there was an annoying time limit on the alleviation.
Now a related can of worms has been opened in a related technology area — WordPress and WordPress themes. Since many bloggers use WordPress, this has gotten a lot of attention, and some interesting new insights have emerged. Read more
Categories: MySQL, Open source, Oracle | 10 Comments |
Riptano, and Cassandra adoption
Tonight’s Cassandra technology post got plenty long enough on its own, so I’m separating out business and adoption issues here. For starters, known Cassandra users include:
- Facebook, which has said it has 150 or so Cassandra nodes (but see below)
- Twitter, which has said it has 45 or so Cassandra nodes
- Rackspace, which used to be Jonathan Ellis’ employer, and now is backing Cassandra company Riptano
- Digg, which along with Twitter and Rackspace was one of the three major users helping advance the Cassandra project
- OpenX, SimpleGeo, and Digital Reasoning, which Jonathan cited as production users in March
- Cloudkick, as noted and linked in my other post
- Two customers Riptano named at launch (but I’ve forgotten who they were*)
Fetlife, Meebo, and others seem to at least have a healthy interest in Cassandra, based on their level of involvement in a forthcoming Cassandra Summit. That said, the @Fetlife tweetstream features numerous yelps of pain, and I don’t mean the recreational kind. Read more
Categories: Cassandra, DataStax, Facebook, Market share and customer counts, NoSQL, Open source, Parallelization, Pricing, Specific users | 5 Comments |
Cassandra technical overview
Back in March, I talked with Jonathan Ellis of Rackspace, who runs the Apache Cassandra project. I started drafting a blog post then, but never put it up. Then Jonathan cofounded Riptano, a company to commercialize Cassandra, and so I talked with him again in May. Well, I’m finally finding time to clear my Cassandra/Riptano backlog. I’ll cover the more technical parts below, and the more business- or usage-oriented ones in a companion Cassandra/Riptano post.
Jonathan’s core claims for Cassandra include:
- Cassandra is shared-nothing.
- Cassandra has good approaches to replication and partitioning, right out of the box.
- In particular, Cassandra is good for use cases that distribute a database around the world and want to access it at “local” latencies. (Indeed, Jonathan asserts that non-local replication is a significant non-big-data Cassandra use case.)
- Cassandra’s scale-out is application-transparent, unlike sharded MySQL’s.
- Cassandra is fast at both appends and range queries, which would be hard to accomplish in a pure key-value store.
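To illustrate why the last claim is non-trivial, here is a minimal Python sketch. It is not Cassandra's actual storage design; the class names `HashStore` and `SortedStore` are hypothetical and exist only for this illustration. A pure key-value store keeps no key ordering, so a range query has to examine every key, while a store that keeps its keys sorted can locate both ends of the range by binary search and read only the slice in between.

```python
import bisect


class HashStore:
    """Pure key-value store (hypothetical): no ordering among keys."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def range(self, lo, hi):
        # With no key ordering, every key must be examined.
        return sorted((k, v) for k, v in self._data.items() if lo <= k < hi)


class SortedStore:
    """Store that keeps keys in sorted order (hypothetical)."""

    def __init__(self):
        self._keys = []   # always kept sorted
        self._data = {}

    def put(self, key, value):
        if key not in self._data:
            # Keys arriving in increasing order land at the end of the list.
            bisect.insort(self._keys, key)
        self._data[key] = value

    def range(self, lo, hi):
        # Binary search for both ends, then read only the matching slice.
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return [(k, self._data[k]) for k in self._keys[start:end]]


hash_store, sorted_store = HashStore(), SortedStore()
for ts in [5, 1, 9, 3, 7]:
    hash_store.put(ts, f"event-{ts}")
    sorted_store.put(ts, f"event-{ts}")

# Both return the same rows for keys in [3, 8); only the work done differs.
print(sorted_store.range(3, 8))
print(hash_store.range(3, 8) == sorted_store.range(3, 8))
```

Both stores give the same answer; the difference is that the hash-based one touches all keys per range query, which is the cost a pure key-value design would have to pay.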
In general, Jonathan positions Cassandra as being best-suited to handle a small number of operations at high volume, throughput, and speed. The rest of what you do, as far as he’s concerned, may well belong in a more traditional SQL DBMS. Read more
Categories: Amazon and its cloud, Cassandra, DataStax, Facebook, Google, Log analysis, NoSQL, Open source, Parallelization | 4 Comments |
Ingres VectorWise technical highlights
After working through problems with travel, cell phones, and so on, Peter Boncz of VectorWise finally caught up with me for a regrettably brief call. Peter gave me the strong impression that what I’d written in the past about VectorWise had been and remained accurate, so I focused on filling in the gaps. Highlights included: Read more
Categories: Actian and Ingres, Analytic technologies, Benchmarks and POCs, Columnar database management, Data warehousing, Database compression, Open source, VectorWise | 2 Comments |
Notes on the evolution of OLTP database management systems
The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part). OLTP (OnLine Transaction Processing) and general purpose DBMS startups, however, have not yet done as well, with such success as there has been (MySQL, InterSystems Caché, solidDB’s exit, etc.) generally accruing to products that originated in the 20th Century.
Nonetheless, OLTP/general-purpose data management startup activity has recently picked up, targeting what I see as some very real opportunities and needs. So as a jumping-off point for further writing, I thought it might be interesting to collect a few observations about the market in one place. These include:
- Big-brand OLTP/general-purpose DBMS have more “stickiness” than analytic DBMS.
- By number, most of an enterprise’s OLTP/general-purpose databases are low-volume and low-value.
- Most interesting new OLTP/general-purpose data management products are either MySQL-based or NoSQL.
- It’s not yet clear whether MySQL will prevail over MySQL forks, or vice-versa, or whether they will co-exist.
- The era of silicon-centric relational DBMS is coming.
- The emphasis on scale-out and reducing the cost of joins spans the NoSQL and SQL-based worlds.
- Users’ insistence on “free” could be a major problem for OLTP DBMS innovation.
I shall explain. Read more
Quick news, links, comments, etc.
Some notes based on what I’ve been reading recently: Read more