Storage
Analysis of storage technologies, especially in the context of database management.
Notes on the evolution of OLTP database management systems
The past few years have seen a spate of startups in the analytic DBMS business. Netezza, Vertica, Greenplum, Aster Data and others are all reasonably prosperous, alongside older specialty product vendors Teradata and Sybase (the Sybase IQ part). OLTP (OnLine Transaction Processing) and general-purpose DBMS startups, however, have not yet done as well, with such success as there has been (MySQL, InterSystems Caché, solidDB’s exit, etc.) generally accruing to products that originated in the 20th Century.
Nonetheless, OLTP/general-purpose data management startup activity has recently picked up, targeting what I see as some very real opportunities and needs. So as a jumping-off point for further writing, I thought it might be interesting to collect a few observations about the market in one place. These include:
- Big-brand OLTP/general-purpose DBMS have more “stickiness” than analytic DBMS.
- By number, most of an enterprise’s OLTP/general-purpose databases are low-volume and low-value.
- Most interesting new OLTP/general-purpose data management products are either MySQL-based or NoSQL.
- It’s not yet clear whether MySQL will prevail over MySQL forks, or vice-versa, or whether they will co-exist.
- The era of silicon-centric relational DBMS is coming.
- The emphasis on scale-out and reducing the cost of joins spans the NoSQL and SQL-based worlds.
- Users’ insistence on “free” could be a major problem for OLTP DBMS innovation.
I shall explain. Read more
Open issues in database and analytic technology
The last part of my New England Database Summit talk was on open issues in database and analytic technology. This was closely intertwined with the previous section, and also relied on a lot that I’ve posted here. So I’ll just put up a few notes on that part, with lots of linkage to prior discussion of the same points. Read more
Interesting trends in database and analytic technology
My project for the day is blogging based on my “Database and analytic technology: State of the union” talk of a few days ago. (I called it that because of when it was given, because it mixed prescriptive and descriptive elements, and because I wanted to call attention to the fact that I cover the union of database and analytic technologies – the intersection of those two sectors is an area of particular focus, but is far from the whole of my coverage.)
One section covered recent/ongoing/near-future trends that I thought were particularly interesting, including: Read more
Flash, other solid-state memory, and disk
If there’s one subject on which the New England Database Summit changed or at least clarified my thinking,* it’s future storage technologies. Here’s what I now think:
- Solid-state memory will soon be the right storage technology for a large fraction of databases, OLTP and analytic alike. I’m not sure whether the initial cutoff in database size is best thought of as terabytes or 10s of terabytes, but it’s in that range. And it will increase over time, for the usual cheaper-parts reasons.
- That doesn’t necessarily mean flash. PCM (Phase-Change Memory) is coming down the pike, with perhaps 100X the durability of flash, in terms of the total number of writes it can tolerate. (I sketch what that multiplier could mean in practice right after this list.) On the other hand, PCM has issues in the face of heat. More futuristically, IBM is also high on magnetic racetrack memory. IBM likes the term storage-class memory to cover all this — which I find regrettable, since the acronym SCM is way overloaded already. 🙂
- Putting a disk controller in front of solid-state memory is really wasteful. It wreaks havoc on I/O rates.
- Generic PCIe interfaces don’t suffice either, in many analytic use cases. Their I/O is better, but still not good enough. (Doing better yet is where Petascan – the stealth-mode company I keep teasing about – comes in.)
- Disk will long be useful for very large databases. Kryder’s Law, about disk capacity, has at least as high an annual improvement as Moore’s Law shows for chip capacity, the disk rotation speed bottleneck notwithstanding. Disk will long be much cheaper than silicon for data storage. And cheaper silicon in sensors will lead to ever more machine-generated data that fills up a lot of disks.
- Disk will long be useful for archiving. Disk is the new tape.
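To give the PCM durability claim some texture, here’s a back-of-envelope sketch in Python. The 100X multiplier is the figure cited above; the flash endurance baseline and the write workload are assumptions of mine, purely for illustration, not vendor specs.

```python
# Back-of-envelope: flash vs. PCM lifetime under a write-heavy OLTP workload.
# All figures are illustrative assumptions, not vendor specs.

FLASH_PE_CYCLES = 10_000               # assumed MLC-flash write endurance
PCM_PE_CYCLES = FLASH_PE_CYCLES * 100  # the "perhaps 100X" figure above
DEVICE_WRITES_PER_DAY = 10             # assumed: device fully rewritten 10x/day

def lifetime_years(pe_cycles: int) -> float:
    """Years until wear-out, assuming perfect wear leveling, under which each
    cell sees DEVICE_WRITES_PER_DAY program/erase cycles per day."""
    return pe_cycles / DEVICE_WRITES_PER_DAY / 365

print(f"Flash: ~{lifetime_years(FLASH_PE_CYCLES):.1f} years")  # ~2.7 years
print(f"PCM:   ~{lifetime_years(PCM_PE_CYCLES):.0f} years")    # ~274 years
```

The exact numbers don’t matter; the point is that a 100X endurance multiplier would move wear-out from a real operational concern to a non-issue.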
*When the first three people to the question microphone include both Mike Stonebraker and Dave DeWitt, your thinking tends to clarify in a hurry.
Related links
- A slide deck by Mohan of IBM similar to the one he presented at the NEDB Summit about storage-class memories.
- A much more detailed IBM presentation on storage-class memories.
- Oracle’s and Teradata’s beliefs about the importance of solid-state memory.
Other posts based on my January, 2010 New England Database Summit keynote address
- Data-based snooping — a huge threat to liberty that we’re all helping make worse
- Interesting trends in database and analytic technology
- Open issues in database and analytic technology
Categories: Data warehousing, Michael Stonebraker, Presentations, Solid-state memory, Storage, Theory and architecture | 3 Comments |
The disk rotation speed bottleneck
I’ve been referring to the disk (rotation) speed bottleneck for years, but I don’t really have a clean link for it. Let me fix that right now.
The first hard disks ever were introduced by IBM in 1956. They rotated 1,200 times per minute. Today’s state-of-the-art disk drives rotate 15,000 times per minute. That’s a 12.5-fold improvement since the first term of the Eisenhower Administration. (I understand that the reason for this slow improvement is aerodynamic — a disk that spins too fast literally flies off the spindle.)
Unfortunately, the time to access a random spot on disk is bounded below, on average, by 1/2 of a disk’s rotation time — the rotational latency.
15,000 RPM = 250 rotations/second, which works out to 4 milliseconds/rotation. Hence average random disk access times can never get below 2 milliseconds.
From that, much about modern analytic DBMS design follows.
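If you like your arithmetic executable, here’s the same calculation in a few lines of Python, just restating the numbers above:

```python
# The disk rotation speed bottleneck, as arithmetic.
# Average rotational latency = half a rotation; one rotation = 60,000/RPM ms.

def avg_rotational_latency_ms(rpm: float) -> float:
    return (60_000 / rpm) / 2

for year, rpm in ((1956, 1_200), (2010, 15_000)):
    print(f"{year}: {rpm:>6,} RPM -> {avg_rotational_latency_ms(rpm):.1f} ms")

# 1956:  1,200 RPM -> 25.0 ms average rotational latency
# 2010: 15,000 RPM ->  2.0 ms average rotational latency
# 15,000 / 1,200 = 12.5, the improvement factor cited above.
```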
Two cornerstones of Oracle’s database hardware strategy
After several months of careful optimization, Oracle managed to pick the most inconvenient* day possible for me to get an Exadata update from Juan Loaiza. But the call itself was long and fascinating, with the two main takeaways being:
- Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being “bumped off” if they don’t get it right.
- Juan believes the “bulk” of Oracle’s business will move over to Exadata-like technology over the next 5-10 years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.
And by the way, Oracle doesn’t make its storage-tier software available to run on anything other than Oracle-designed boxes. At the moment, that means Exadata Versions 1 and 2. Since Exadata is by far Oracle’s best DBMS offering (at least in theory), that means Oracle’s best database offering only runs on specific Oracle-sold hardware platforms. Read more
Research agenda for 2010
As you may have noticed, I’ve been posting less research/analysis in November and December than during some other periods. In no particular order, reasons have included: Read more
Ray Wang on SAP
Ray Wang made a terrific post based on SAP’s annual influencer love-in, an event which I no longer attend. Ray believes SAP has been in a “crisis”, and sums up his views as
The Bottom Line – SAP’s Turning The Corner
Credit must be given to SAP for charting a new course. A shift in the management philosophy and product direction will take years to realize; however, it’s not too late for change. SAP must remember its roots and become more German and less American. The renewed focus must put customer requests and priorities ahead of SAP’s bureaucracy. The emphasis must focus on the relationship. When that reemerges in how SAP works with customers, partners, influencers, and its own employees, SAP will be back in good graces. In the meantime, it’s time to get to work and deliver. Oracle’s Fusion Apps are coming soon and competitors such as IBM, Microsoft, Epicor, IFS, and SalesForce.com will not relent.
I recall the 1980s, when SAP’s main differentiator, at least in the English-speaking world, was a total commitment to customer success, and when it could be taken for granted that SAP would do business ethically. Things change, and not always for the better.
Anyhow, the reason I’m highlighting Ray’s post is that he makes reference to a number of interesting SAP-centric technology trends or initiatives. Read more
Categories: Analytic technologies, Business intelligence, Memory-centric data management, MOLAP, SAP AG, Solid-state memory | 1 Comment |
A framework for thinking about data warehouse growth
There are only three ways that the amount of data stored in data warehouses can grow:
- The same kinds of data are stored as before, with more being added over time.
- The same kinds of data are stored as before, but in more detail.
- New kinds of data are stored.
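To make the framework concrete, here’s a tiny hypothetical sketch in Python; every table and field in it is invented for illustration.

```python
# Three growth modes for a hypothetical retail data warehouse.
# All table and field names below are invented for this example.

# 1. Same kinds of data, more added over time:
#    one row per store per day, accumulating with the calendar.
daily_sales_row = ("2009-12-01", "store_17", 84_312.50)

# 2. Same kinds of data, in more detail:
#    the daily total replaced by one row per individual line item.
line_item_row = ("2009-12-01 09:14:02", "store_17", "sku_99183", 2, 19.99)

# 3. New kinds of data:
#    clickstream events, formerly discarded, now retained and queried.
clickstream_row = ("2009-12-01 09:13:55", "visitor_4471", "/product/sku_99183")
```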
Categories: Analytic technologies, Application areas, Data warehousing, Investment research and trading, Log analysis, Solid-state memory, Storage, Telecommunications, Text, Web analytics | 9 Comments |
Boston Big Data Summit keynote outline
Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I’m posting them below.