Are there any remaining reasons to put new OLTP applications on disk?
Once again, I’m working with an OLTP SaaS vendor client on the architecture for their next-generation system. Parameters include:
- 100s of gigabytes of data at first, growing to >1 terabyte over time.
- High peak loads.
- Public cloud portability (but they have private data centers they can use today).
- Simple database design — not a lot of tables, not a lot of columns, not a lot of joins, and everything can be distributed on the same customer_ID key (see the sketch after this list).
- Stream the data to a data warehouse, which will grow to a few terabytes. (Keeping only one year of OLTP data online actually makes sense in this application, but of course everything should go into the DW.)
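To illustrate what single-key distribution buys, here is a hypothetical sketch of essentially all the routing logic a sharding layer needs when every table carries customer_ID. The class and names are mine for illustration, not dbShards' or anyone else's.

```java
import java.util.List;

// Hypothetical shard router, assuming every table in the schema carries
// customer_ID: hashing that one key is enough to route any statement.
public class ShardRouter {
    private final List<String> shardJdbcUrls; // one JDBC URL per MySQL shard

    public ShardRouter(List<String> shardJdbcUrls) {
        this.shardJdbcUrls = shardJdbcUrls;
    }

    /** Map a customer to a shard; floorMod keeps the index non-negative. */
    public String shardFor(long customerId) {
        int shard = Math.floorMod(Long.hashCode(customerId), shardJdbcUrls.size());
        return shardJdbcUrls.get(shard);
    }
}
```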
So I’m leaning toward saying:
- They should go with a scalable, MySQL-based solution.
- Lots of third-party software works with MySQL, in case that’s helpful.
- Yes, any one vendor is small and not yet firmly established, but there are numerous vendors around with interesting MySQL scaling stories.
- In a vendor emergency, just going with Oracle’s MySQL stuff would probably work …
- … especially because there are these lovely things in the world called solid-state drives.
- There’s also good escapability if one wants to move away from MySQL, because everybody knows how to handle MySQL data.
- The first product to look at is dbShards, because it meets all the topology needs:
- Local scale-out (transparent sharding).
- Local high availability.
- Remote disaster recovery (work on the details is underway).
- The first analytic DBMS to look at is Infobright.
- Yes, I know Infobright is focused more on machine-generated data these days, but this client’s analytic needs are so straightforward Infobright should pass with flying colors.
- The MySQL-to-MySQL aspect should make ETL dead simple.
- Again, there’s escapability.
Mainly, this is all fine. But I’m getting pushback on the solid-state aspect, for fear that it will compromise public cloud portability.
Am I missing something here? As far as I’m concerned, if you’re planning an OLTP system with a many-year lifespan today, of course you should assume solid-state storage. Maybe you scale out just as far as you would with disk, striping indexes or entire databases across the RAM of multiple servers. In that case, having solid-state backing reduces the risk of bottlenecks. Maybe you don’t scale out as far as you would with disk. In that case, solid-state backing saves you money.
As for public-cloud support for solid-state storage, that’s coming fast, right? (Actually, I have data points in support of that theory, but they’re a bit tenuous.) A large fraction of web businesses with private data centers seem to be using solid-state storage — from Facebook on down — or so the NoSQL/NewSQL/short-request DBMS guys tell me. Surely a number of public cloud vendors are close behind.
13 Responses to “Are there any remaining reasons to put new OLTP applications on disk?”
Curt,
The cost of ETL is usually not a function of whether the source and destination databases are the same platform. It’s usually a cost of the logical transformation needed from the source schema to the destination schema. If your schemas are identical, ETL is free (or nearly so), independent of source and destination database types.
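For example, with identical schemas the "T" drops out and the load is a straight column-for-column copy. A hypothetical JDBC sketch (connection setup omitted; table and column names are made up):

```java
import java.sql.*;

// Minimal sketch of "identical schemas make ETL nearly free": the load is
// a plain column-for-column copy with no transformation logic at all.
public class IdenticalSchemaCopy {
    public static void copyOrders(Connection oltp, Connection dw) throws SQLException {
        try (Statement read = oltp.createStatement();
             ResultSet rs = read.executeQuery(
                 "SELECT customer_id, order_id, total FROM orders");
             PreparedStatement write = dw.prepareStatement(
                 "INSERT INTO orders (customer_id, order_id, total) VALUES (?, ?, ?)")) {
            while (rs.next()) {
                write.setLong(1, rs.getLong(1));
                write.setLong(2, rs.getLong(2));
                write.setBigDecimal(3, rs.getBigDecimal(3));
                write.addBatch();
            }
            write.executeBatch(); // one batched load, no transformation step
        }
    }
}
```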
As for sharding things by customer_ID: do all their large tables have this column? (Something like a product catalog, for example, may not.)
Finally, on disk vs. flash: I would really love to understand more. If SSDs were not an option, how would you change the above architecture?
@Curt
You’ve described a good use case for VoltDB.
Thanks.
Fred
Fred,
I don’t think VoltDB’s programming model would work well in this situation.
Anon,
On the ETL side, you are in essence right. But what I didn’t say is that there’s some doubt as to exactly what workload should go against the OLTP database and what should go against the analytic one. Using the “same” DBMS for both simplifies changing one’s mind.
We’ll see. As of yesterday, I’m not wholly convinced a true analytic DBMS has to be in the mix at all.
As for the SSD part — right now the app runs on a single SMP box, running a name-brand DBMS. That, we’re assuming, is unsustainable, and certainly doesn’t transition well to the public cloud. So scale-out is the obvious way to go.
But MySQL scale-out vendors are small — small enough to create non-trivial vendor risk. It’s already easier for me to recommend them because there are a number of them, so if one disappoints, you can port to another. But using vanilla MySQL with SSDs is another nice option to have.
Conversely, while the “scale-out” vendors have advantages even without splitting your database — at least dbShards and Schooner do — if you can get away from splitting and just use better storage, that might be more cost-effective. And, as always, cost matters.
As for the data distribution on a single customer_ID key — off the top of my head, I haven’t thought of ANY join that would be done other than on that key. If I’m missing something, I’m pretty confident it would involve tiny tables, easily replicated. Indeed, the SQL vs. NoSQL choice is not wholly obvious here — even if I am pretty sure that SQL is actually the way to go.
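For the record, here is the kind of query shape I mean, under the assumption that the large tables are all distributed on customer_ID and the tiny tables are replicated to every shard (names are hypothetical):

```java
// Hypothetical query shapes. With the shard key in the join and the
// predicate, the whole query resolves on a single shard.
String coLocatedJoin =
    "SELECT o.order_id, o.total, c.name " +
    "  FROM orders o JOIN customers c ON o.customer_id = c.customer_id " +
    " WHERE o.customer_id = ?"; // shard key present => single-shard query

// A tiny reference table is simply copied to every shard, so joining
// against it is also local.
String replicatedJoin =
    "SELECT o.order_id, s.label " +
    "  FROM orders o JOIN status_codes s ON o.status = s.code " +
    " WHERE o.customer_id = ?";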
Sounds like a good fit for ScaleDB. My sources say SSDs will soon be widely available at public clouds.
Drizzle (a fork of MySQL) would be a good option to look at, if your OLTP database doesn’t require Stored Procedures, Triggers or Views. It has advantages over MySQL in the area of concurrency.
Drizzle
@Curt
FYI, we have recently added JDBC support, so developers can now choose to use that or our Java stored procedure interface (or both in combination).
That said, my prior comment was more about the use case you described than the specific client opportunity. Apologies if that was unclear.
Fred
Zman,
I haven’t heard much lately about Drizzle, other than grumbling as to its momentum (technical, adoption, whatever).
I really would like there to be healthy MySQL front-end clones, given how I sometimes encourage people to use specialized MySQL engines and the like. But I haven’t heard much favorable rumbling from Drizzle or the Monty Project for a while.
Fred,
Is VoltDB’s performance the same (or very similar) using both interfaces?
Fred,
More to the point, have you relaxed the essential limitations of the VoltDB programming model, such as perfect commutativity? If I understand correctly, that’s what the Java-stored-procedure-only programming model was supposed to enforce.
@Curt
While there is some interesting academic research on commutativity and H-Store style systems, VoltDB has never made commutativity a requirement. Our transactions can perform an arbitrary mix of SQL and computation with few restrictions.
Fundamentally, we require transactions to have a single round trip between the client app and the database. This enables our serialized, no-waits execution model. It requires deterministic transactions and necessitates stored procedures. We leverage this determinism extensively to achieve the same state in parallel on all database nodes, as well as to allow for deterministic replay of transactions in the event of a global failure.
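To make that concrete, here is a minimal sketch of the shape of a VoltDB Java stored procedure: one client round trip, with all SQL queued and executed inside the transaction. The table and procedure names are hypothetical, not from any shipped schema.

```java
import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;

// Hypothetical single-round-trip transaction: the client makes one call,
// and all SQL runs deterministically inside the procedure.
public class GetCustomerOrders extends VoltProcedure {

    // Statements are declared up front so they can be planned ahead of time.
    public final SQLStmt getOrders = new SQLStmt(
        "SELECT order_id, total FROM orders " +
        "WHERE customer_id = ? ORDER BY order_id;");

    public VoltTable[] run(long customerId) throws VoltAbortException {
        voltQueueSQL(getOrders, customerId);
        return voltExecuteSQL(true); // final batch; results return to the client
    }
}
```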
To your original question, VoltDB’s core programming model has not changed in 2.0. While you can do ad hoc database operations via JDBC, the way to achieve uber-high throughput and scale is by executing Java stored procedures via your chosen client API (JDBC or a number of other available language interfaces). So, to clarify my prior comment, we added a JDBC interface in 2.0 to lower the bar for developers who are familiar with that API, as well as to support third party monitoring, reporting and developer tools.
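And a minimal sketch of invoking that procedure from the Java client API (host and procedure names are again hypothetical):

```java
import org.voltdb.VoltTable;
import org.voltdb.client.Client;
import org.voltdb.client.ClientFactory;
import org.voltdb.client.ClientResponse;

public class CallExample {
    public static void main(String[] args) throws Exception {
        Client client = ClientFactory.createClient();
        client.createConnection("localhost"); // hypothetical host

        // One synchronous round trip: the whole transaction runs server-side.
        ClientResponse resp = client.callProcedure("GetCustomerOrders", 42L);
        VoltTable orders = resp.getResults()[0];
        System.out.println(orders.toString());

        client.close();
    }
}
```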
I hope this information is helpful.
Fred
Thanks, Fred. That clears up a lot!