Storage
Analysis of storage technologies, especially in the context of database management.
Teradata’s future product strategy
I think Teradata’s future product strategy is coming into focus. I’ll start by outlining some particular aspects, and then show how I think it all ties together.
Read more
Categories: Business intelligence, Data warehouse appliances, Data warehousing, Kickfire, MicroStrategy, Solid-state memory, Storage, Teradata | 5 Comments |
Teradata, Xkoto Gridscale (RIP), and active-active clustering
Having gotten a number of questions about Teradata’s acquisition of Xkoto, I leaned on Teradata for an update, and eventually connected with Scott Gnau. Takeaways included:
- Teradata is discontinuing Xkoto’s existing product Gridscale, which Scott characterized as being too OLTP-focused to be a good fit for Teradata. Teradata hopes and expects that existing Xkoto Gridscale customers won’t renew maintenance. (I’m not sure that they’ll even get the option to do so.)
- The point of Teradata’s technology + engineers acquisition of Xkoto is to enhance Teradata’s active-active or multi-active data warehousing capabilities, which it has had in some form for several years.
- In particular, Teradata wants to tie together different products in the Teradata product line. (Note: Those typically all run pretty much the same Teradata database management software, except insofar as they might be on different releases.)
- Scott rattled off all the plausible areas of enhancement, with multiple phrasings – performance, manageability, ease of use, tools, features, etc.
- Teradata plans to have one or two releases based on Xkoto technology in 2011.
Frankly, I’m disappointed at the struggles of clustering efforts such as Xkoto Gridscale or Continuent’s pre-Tungsten products, but if the DBMS vendors meet the same needs themselves, that’s OK too.
The logic behind active-active database implementations actually seems pretty compelling: Read more
Categories: Clustering, Continuent, Data warehousing, Solid-state memory, Teradata, Theory and architecture, Xkoto | 9 Comments |
Why analytic DBMS increasingly need to be storage-aware
In my quick reactions to the EMC/Greenplum announcement, I opined
I think that even software-only analytic DBMS vendors should design their systems in an increasingly storage-aware manner
promising to explain what I meant later on. So here goes. Read more
Categories: Data warehouse appliances, Data warehousing, Solid-state memory, Storage, Theory and architecture | 6 Comments |
EMC is buying Greenplum
EMC is buying Greenplum. Most of the press release is a general recapitulation of Greenplum’s marketing messages, the main exceptions being (emphasis mine):
The acquisition of Greenplum will be an all-cash transaction and is expected to be completed in the third quarter of 2010, subject to customary closing conditions and regulatory approvals. The acquisition is not expected to have a material impact to EMC GAAP and non-GAAP EPS for the full 2010 fiscal year. Upon close, Bill Cook will lead the new data computing product division and report to Pat Gelsinger. EMC will continue to offer Greenplum’s full product portfolio to customers and plans to deliver new EMC Proven reference architectures as well as an integrated hardware and software offering designed to improve performance and drive down implementation costs.
Greenplum is one of my biggest vendor clients, and EMC is just becoming one, but of course neither side gave me a heads-up before the deal happened, nor have I yet been briefed subsequently. With those disclaimers out of the way, some of my early thoughts include:
- I wish my clients would never buy each other, but it’s inevitable.
- I don’t think anybody evaluating Greenplum should be much influenced by this deal one way or the other. (Whether they will be is of course a different matter.)
- EMC tends to run its bigger software acquisitions in a fairly hands-off manner. There’s no particular FUD (Fear/Uncertainty/Doubt) reason why this deal should stop anybody from buying Greenplum software.
- I also don’t think adding a rich parent adds much of a reason to buy from Greenplum. But if you’re the type who’s nervous about smaller vendors — well, Greenplum now isn’t so small.
- Greenplum Chorus could, in principle, work with non-Greenplum DBMS. That possibility suddenly looks a lot more realistic.
- The list of analytic DBMS vendors with an appliance orientation is pretty impressive, including:
- Oracle, with Exadata
- Microsoft, partially
- Teradata
- Netezza
- Now EMC/Greenplum, at least partially
- Weaker players such as:
- The ailing Kickfire, which a client (not Kickfire itself) tells me is being shopped around
- The reeling HP Neoview
- XtremeData, but I’m still waiting to hear of XtremeData’s first real sale
- Greenplum is something of a specialist in large databases. EMC has to love that.
- Greenplum’s weakness is concurrency.
- Greenplum’s “polymorphic storage” is a good fit for a storage vendor with appliance-y ideas.
- And finally — I think that even software-only analytic DBMS vendors should design their systems in an increasingly storage-aware manner, and have been advising my vendor clients of same. I’ll blog that line of reasoning separately when I get a chance, and edit in a link here after I do.
Related links
- Here’s the promised post as to why analytic DBMS need to be ever more storage-aware.
- Dave Kellogg crunched the EMC/Greenplum numbers, coming up with an estimated valuation range of $300-400 million, the high end of which is rumored to be correct.
- Merv Adrian suggests the big EMC/Greenplum loser is ParAccel, a viewpoint which presumably presupposes that the EMC/ParAccel partnership was significant in the first place.
- I talked with Ben Werther and posted more about Greenplum and EMC.
Categories: Data warehouse appliances, EMC, Greenplum, Storage | 13 Comments |
Flash is coming, well …
I really, really wanted to title this post “Flash is coming in a flash.” That seems a little exaggerated — but only a little.
- Netezza now intends to come out with a flash-based appliance earlier than it originally expected.
- Indeed, Netezza has suspended — by which I mean “scrapped” — prior plans for a RAM-heavy disk-based appliance. It will use a RAM/flash combo instead.*
- Tim Vincent of IBM told me that customers seem ready to adopt solid-state memory. One interesting comment he made is that Flash isn’t really all that much more expensive than high-end storage area networks.
Uptake of solid-state memory (i.e. flash) for analytic database processing will probably stay pretty low in 2010, but in 2011 it should be a notable (b)leading-edge technology, and it should get mainstreamed pretty quickly after that. Read more
Categories: Data integration and middleware, Data warehousing, IBM and DB2, Memory-centric data management, Netezza, Solid-state memory, Theory and architecture | 4 Comments |
VoltDB finally launches
VoltDB is finally launching today. As is common for companies in sectors I write about, VoltDB — or just “Volt” — has discovered the virtues of embargoes that end at 12:01 am. Let’s go straight to the technical highlights:
- VoltDB is based on the H-Store technology, which I wrote about in February, 2009. Most of what I said about H-Store then applies to VoltDB today.
- VoltDB is a no-apologies ACID relational DBMS, which runs entirely in RAM.
- VoltDB has rather limited SQL. (One example: VoltDB can’t do SUMs in SQL.) However, VoltDB guy Tim Callaghan (Mark Callaghan’s lesser-known but nonetheless smart brother) asserts that if you code up the missing functionality, it’s almost as fast as if it were present in the DBMS to begin with, because there’s no added I/O from the handoff between the DBMS and the procedural code. (The data’s in RAM one way or the other.)
- VoltDB’s Big Conceptual Performance Story is that it does away with most locks, latches, logs, etc., and also most context switching.
- In particular, you’re supposed to partition your data and architect your application so that most transactions execute on a single core. When you can do that, you get VoltDB’s performance benefits. To the extent you can’t, you’re in two-phase-commit performance land. (More precisely, you’re doing 2PC for multi-core writes, which is surely a major reason that multi-core reads are a lot faster in VoltDB than multi-core writes.)
- VoltDB has a little less than one DBMS thread per core. When the data partitioning works as it should, you execute a complete transaction in that single thread. Poof. No context switching.
- A transaction in VoltDB is a Java stored procedure. (The early idea of Ruby on Rails in lieu of the Java/SQL combo didn’t hold up performance-wise.)
- Solid-state memory is not a viable alternative to RAM for VoltDB. Too slow.
- Instead, VoltDB lets you snapshot data to disk at tunable intervals. “Continuous” is one of the options, wherein a new snapshot starts being made as soon as the last one completes.
- In addition, VoltDB will also spool a kind of transaction log to the target of your choice. (Obvious choice: An analytic DBMS such as Vertica, but there’s no such connectivity partnership actually in place at this time.)
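The single-partition fast path described in the bullets above can be sketched in a few lines. This is a toy illustration of the general idea — one worker owns each partition, so a transaction that touches only one partition runs serially with no locks or latches — and not VoltDB’s actual API; the `Partition` and `Router` names and the hash routing are mine.

```python
# Illustrative sketch of single-partition transaction routing, in the
# spirit of VoltDB's design (not VoltDB's actual API). Each partition is
# owned by exactly one worker, so a transaction confined to that
# partition executes serially -- no locks, latches, or context switches.

class Partition:
    def __init__(self):
        self.rows = {}  # key -> value; private to this partition's worker

    def execute(self, txn):
        # Serial execution: nothing else ever touches self.rows.
        return txn(self.rows)

class Router:
    def __init__(self, n_partitions):
        self.partitions = [Partition() for _ in range(n_partitions)]

    def run_single_partition(self, key, txn):
        # Route by hash of the partitioning key; the whole transaction
        # runs inside one partition. (Python's string hash is stable
        # within a process, which is all this sketch needs.)
        p = self.partitions[hash(key) % len(self.partitions)]
        return p.execute(txn)

router = Router(4)
router.run_single_partition("acct:1", lambda rows: rows.update({"acct:1": 100}))
balance = router.run_single_partition("acct:1", lambda rows: rows["acct:1"])
```

A transaction that spans partitions can’t take this path, which is where the two-phase-commit penalty in the bullets above comes from.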
The Clustrix story
After my recent post, the Clustrix guys raised their hands and briefed me. Takeaways included: Read more
Categories: Application areas, Clustrix, Emulation, transparency, portability, Games and virtual worlds, MySQL, NoSQL, OLTP, Parallelization, Solid-state memory | 8 Comments |
Revisiting disk vibration as a data warehouse performance problem
Last April, I wrote about the problems disk vibration can cause for data warehouse performance. Possible performance hits exceeded 10X, wild as that sounds.
Now Slashdot and ZDnet have weighed in, although for the most part they suggest only 50-100% performance hits. Read more
Categories: Data warehousing, Storage | 3 Comments |
Clustrix may be doing something interesting
Clustrix launched without briefing me or, at least so far as I can tell, anybody else who knows much about database technology. But Clustrix did post a somewhat crunchy, no-registration-required, white paper. Based on that, I get the impression:
- Clustrix is making an OLTP DBMS.
- The core problem Clustrix tries to solve is scale-out, without necessarily giving up SQL. (I couldn’t immediately tell whether Clustrix supports NoSQL-style key-value interfaces enthusiastically, grudgingly, or not at all.)
- Unlike Akiban or VoltDB, Clustrix makes database appliances. The Clustrix software seems to assume a Clustrix appliance.
- A key feature of Clustrix’s database appliances is that they rely on solid-state memory. I’m guessing that Clustrix appliances don’t even have disks, or that if they do the disks store some software or something, not actual data. (As previously noted, I agree with Oracle in thinking that much of the progress in database technology this decade will come from proper design for solid-state memory.)
- Clustrix talks of things that sound like compiled queries and attempts to avoid locks. However, it doesn’t sound as extreme in these regards as VoltDB.
- Clustrix also talks of things that sound like consistent hashing.
- The brand name “Sierra” also shows up along with the brand name “Clustrix.”
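For readers unfamiliar with the term in that bullet list: consistent hashing places keys and nodes on a shared hash ring, so that adding or removing a node remaps only a fraction of the keys. A minimal sketch, assuming virtual nodes for balance (this is the generic technique, not anything from Clustrix’s white paper; all names are illustrative):

```python
# Generic consistent-hashing sketch (illustrative; not Clustrix's actual
# implementation). Keys and nodes hash to points on a ring; each key is
# owned by the next node point clockwise. Virtual nodes smooth out the
# distribution across physical nodes.

import bisect
import hashlib

def _point(s):
    # Stable hash to a point on the ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=64):
        self.ring = []  # sorted list of (point, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((_point(f"{node}#{i}"), node))
        self.ring.sort()

    def owner(self, key):
        # First node point at or after the key's point, wrapping around.
        i = bisect.bisect(self.ring, (_point(key),)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.owner("row:42")
```

The payoff is incremental rebalancing: growing the ring from three nodes to four moves only the keys whose ring segments the new node’s points intercept, rather than rehashing everything.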
Categories: Clustrix, Data warehouse appliances, DBMS product categories, NoSQL, Parallelization, Solid-state memory, Storage, Theory and architecture | 2 Comments |
Thoughts on IBM’s anti-Oracle announcements
IBM is putting out a couple of press releases today that are obviously directed competitively at Oracle/Sun, and more specifically at Oracle’s Exadata-centric strategy. I haven’t been briefed, so I just have those to go on.
On the whole, the releases look pretty lame. Highlights seem to include:
- Maybe a claim of enhanced data compression.
- Otherwise, no obvious new technology except product packaging and bundling.
- Aggressive plans to throw capital at the Sun channel to convert it to selling IBM gear. (A figure of $1/2 billion is mentioned for financing.)
Disappointingly, IBM shows a lot of confusion between:
- Text data
- Machine-generated data such as that from sensors
While both are highly important, those are very different things. IBM has not in the past shown much impressive technology in either of those two areas, and based on these releases, I presume that trend is continuing.
Edits:
I see from press coverage that at least one new IBM model has some Fusion-io solid-state memory boards in it. Makes sense.
A Twitter hashtag has a number of observations from the event. Not much substance that I could detect, except various kinds of Oracle bashing.
Categories: Database compression, Exadata, IBM and DB2, Oracle, Solid-state memory | 14 Comments |