MySQL

Analysis of open source DBMS vendor MySQL (recently acquired by Sun Microsystems), its products, and other products in the MySQL ecosystem. Related subjects include:

September 19, 2011

Are there any remaining reasons to put new OLTP applications on disk?

Once again, I’m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include:

100s of gigabytes of data at first, growing to >1 terabyte over time.
High peak loads.
Public cloud portability (but they have private data centers they can use today).
Simple database design — not a lot of tables, not a lot of columns, not a lot of joins, and everything can be distributed on the same customer_ID key.
Stream the data to a data warehouse, that will grow to a few terabytes. (Keeping only one year of OLTP data online actually makes sense in this application, but of course everything should go into the DW.)

So I’m leaning to saying: Read more

Categories: Analytic technologies, Cloud computing, Clustering, Data warehousing, dbShards and CodeFutures, Facebook, Infobright, MySQL, OLTP, Open source, Parallelization, Software as a Service (SaaS), Solid-state memory

13 Comments

August 13, 2011

Couchbase technical update

My Couchbase business update with Bob Wiederhold was very interesting, but it didn’t answer much about the actual Couchbase product. For that, I talked with Dustin Sallings. We jumped around a lot, and some important parts of the Couchbase product haven’t had their designs locked down yet anyway. But here’s at least a partial explanation of what’s up.

memcached is a way to cache data in RAM across a cluster of servers and have it all look logically like a single memory pool, extremely popular among large internet companies. The Membase product — which is what Couchbase has been selling this year — adds persistence to memcached, an obvious improvement on requiring application developers to write both to memcached and to non-transparently-sharded MySQL. The main technical points in adding persistence seem to have been:

A persistent backing store (duh), namely SQLite.
A change to the hashing algorithm, to avoid losing data when the cluster configuration is changed.

Couchbase is essentially Membase improved by integrating CouchDB into it, with the main changes being:

Changing the backing store to CouchDB (duh). This will be in the first Couchbase release.
Adding cross data center replication on CouchDB’s consistency model. This will not, I believe, be in the first Couchbase release.
Offering CouchDB’s programming and query interfaces as an option. So far as I can tell, this will be implemented straightforwardly in the first Couchbase release, with elegance planned for later down the road.

Let’s drill down a bit into Membase/Couchbase clustering and consistency. Read more

Categories: Cache, Clustering, Couchbase, memcached, Memory-centric data management, MySQL, Parallelization, Solid-state memory

8 Comments

August 13, 2011

Couchbase business update

I decided I needed some Couchbase drilldown, on business and technology alike, so I had solid chats with both CEO Bob Wiederhold and Chief Architect Dustin Sallings. Pretty much everything I wrote at the time Membase and CouchOne merged to form Couchbase (the company) still holds up. But I have more detail now. 😉

Context for any comments on customer traction includes:

Membase went into limited production release in October, and full release in January. Similar things are true of CouchDB.
Hence, most sales of Couchbase’s products have been made over the past 6 months.
Couchbase (the merged product) is at this point only in a pre-production developer’s release.
Couchbase has both a direct sales force and a classic open-source “funnel”-based online selling model. Naturally, Couchbase’s understanding of what its customers are doing is more solid with respect to the direct sales base.
Most of Couchbase’s revenue to date seems to have come from a limited number of big-ticket “lighthouse” accounts (as opposed to, say, the larger number of smaller deals that come in through the online funnel).

That said,

Most Membase purchases are for new applications, as opposed to memcached migrations. However, customers are the kinds of companies that probably also are using memcached elsewhere.
Most other Membase purchases are replacements for the Membase/MySQL combination. Bob says those are easy sales with short sales cycles.
Pure memcached support is a small but non-zero business for Couchbase, and a fine source of upsell opportunities.
In the pipeline but not so much yet in the customer base are SaaS vendors and the like who use and may want to replace traditional DBMS such as Oracle. Other than among those, Couchbase doesn’t compete much yet with Oracle et al.
Pure CouchDB isn’t all that much of a business, at least relative to community size, as CouchDB is a single-server product commonly used by people who are content not to pay for support.

Membase sales are concentrated in five kinds of internet-centric companies, which in declining order are: Read more

Categories: Basho and Riak, Cassandra, Couchbase, CouchDB, Games and virtual worlds, HBase, Market share and customer counts, memcached, Mid-range, MySQL, NoSQL, Open source, Software as a Service (SaaS), Structured documents, Web analytics

2 Comments

July 26, 2011

Remote machine-generated data

I refer often to machine-generated data, which is commonly generated inexpensively and in log-like formats, and is often best aggregated in a big bit bucket before you try to do much analysis on it. The term has caught on, to the point that perhaps it’s time to distinguish more carefully among different kinds of machine-generated data. In particular, I think it may be useful to distinguish between:

Log-stream machine-generated data, when what you’re looking at — at least initially — is the entire output of verbose logging systems.
Remote machine-generated data.

Here’s what I’m thinking of for the second category. I rather frequently hear of cases in which data is generated by large numbers of remote machines, which occasionally send messages home. For example: Read more

Categories: Analytic technologies, Cloud computing, Log analysis, MySQL, Netezza, Splunk, Truviso

2 Comments

July 14, 2011

An odd claim attributed to Mike Stonebraker

This post has a sequel.

Last week, Mike Stonebraker insulted MySQL and Facebook’s use of it, by implication advocating VoltDB instead. Kerfuffle ensued. To the extent Mike was saying that non-transparently sharded MySQL isn’t an ideal way to do things, he’s surely right. That still leaves a lot of options for massive short-request databases, however, including transparently sharded RDBMS, scale-out in-memory DBMS (whether or not VoltDB*), and various NoSQL options. If nothing else, Couchbase would seem superior to memcached/non-transparent MySQL if you were starting a project today.

*The big problem with VoltDB, last I checked, was its reliance on Java stored procedures to get work done.

Pleasantries continued in The Register, which got an amazing-sounding quote from Mike. If The Reg is to be believed — something I wouldn’t necessarily take for granted — Mike claimed that he (i.e. VoltDB) knows how to solve the distributed join performance problem. Read more

Categories: Cache, Clustering, Couchbase, Games and virtual worlds, In-memory DBMS, memcached, Michael Stonebraker, MySQL, Parallelization, Theory and architecture, VoltDB and H-Store

20 Comments

July 5, 2011

Eight kinds of analytic database (Part 2)

In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I’ll cover four more kinds of analytic database — even newer, for the most part, with a use case/product short list match that is even less clear. Read more

Categories: Analytic technologies, Archiving and information preservation, Business intelligence, Buying processes, Cloud computing, Columnar database management, Data mart outsourcing, Data types, Data warehouse appliances, Data warehousing, Database compression, Database diversity, EAI, EII, ETL, ELT, ETLT, Greenplum, Hadoop, Investment research and trading, Log analysis, MapReduce, MOLAP, MySQL, Netezza, NoSQL, Open source, Petabyte-scale data management, Predictive modeling and advanced analytics, Rainstor, SAND Technology, Scientific research, SenSage, Software as a Service (SaaS), Streaming and complex event processing (CEP), Telecommunications, Vertica Systems, Web analytics

6 Comments

April 19, 2011

Notes on short-request scale-out MySQL

A press person recently asked about:

… start-ups that are building technologies to enable MySQL and other SQL databases to get over some of the problems they have in scaling past a certain size. … I’d like to get a sense as to whether or not the problems are as severe and wide spread as these companies are telling me? If so, why wouldn’t a customer just move to a new database?

While that sounds as if he was asking about scale-out relational DBMS in general, MySQL or otherwise, short-request or analytic, it turned out that he was asking just about short-request scale-out MySQL. My thoughts and comments on that narrower subject include(d) but are not limited to: Read more

Categories: Akiban, dbShards and CodeFutures, Investment research and trading, Kaminario, MySQL, NewSQL, Oracle, ScaleBase, ScaleDB, Schooner Information Technology, Solid-state memory, Tokutek and TokuDB, Web analytics

5 Comments

March 30, 2011

Short-request and analytic processing

A few years ago, I suggested that database workloads could be divided into two kinds — transactional and analytic. The advent of non-transactional NoSQL has suggested that we need a replacement term for “transactional” or “OLTP”, but finding one has been a bit difficult. Numerous tries, including high-volume simple processing, online request processing, internet request processing, network request processing, short request processing, and rapid request processing have turned out to be imperfect, as per discussion at each of those links. But then, no category name is ever perfect anyway. I’ve finally settled on short request processing, largely because I think it does a good job of preserving the analytic-vs-bang-bang-not-analytic workload distinction.

The easy part of the distinction goes roughly like this:

Anything transactional or “OLTP” is short-request.
Anything “OLAP” is analytic.
Updates of small amounts of data are probably short-request, be they transactional or not.
Retrievals of one or a few records in the ordinary course of update-intensive processing are probably short-request.
Queries that return or aggregate large amounts of data — even in intermediate result sets — are probably analytic.
Queries that would take a long time to run on badly-chosen or -configured DBMS are probably analytic (even if they run nice and fast on your actual system).
Analytic processes that go beyond querying or simple arithmetic are — you guessed it! — analytic.
Anything expressed in MDX is probably analytic.
Driving a dashboard is usually analytic.

Where the terminology gets more difficult is in a few areas of what one might call real-time or near-real-time analytics. My first takes are: Read more

Categories: Analytic technologies, Data warehousing, MySQL, NoSQL, OLTP

34 Comments

March 24, 2011

MySQL, hash joins and Infobright

Over a 24 hour or so period, Daniel Abadi, Dmitriy Ryaboy and Randolph Pullen all remarked on MySQL’s lack of hash joins. (It relies on nested loops instead, which were state-of-the-art technology around the time of the Boris Yeltsin administration.) This led me to wonder — why is this not a problem for Infobright?

Per Infobright chief scientist Dominik Slezak, the answer is

Infobright perform joins using its own optimization/execution layers (that actually include hash join algorithms and advanced knowledge-grid-based nested loop optimizations in particular).

Categories: Infobright, MySQL, Theory and architecture

4 Comments

March 23, 2011

Hadapt (commercialized HadoopDB)

The HadoopDB company Hadapt is finally launching, based on the HadoopDB project, albeit with code rewritten from scratch. As you may recall, the core idea of HadoopDB is to put a DBMS on every node, and use MapReduce to talk to the whole database. The idea is to get the same SQL/MapReduce integration as you get if you use Hive, but with much better performance* and perhaps somewhat better SQL functionality.** Advantages vs. a DBMS-based analytic platform that includes MapReduce — e.g. Aster Data — are less clear. Read more

Categories: Analytic technologies, Data warehousing, Hadapt, Hadoop, MapReduce, MySQL, Open source, Parallelization, PostgreSQL, SQL/Hadoop integration, Theory and architecture, VectorWise

12 Comments

← Previous Page — Next Page →

Search our blogs and white papers

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in