Why analytic DBMS increasingly need to be storage-aware
In my quick reactions to the EMC/Greenplum announcement, I opined
I think that even software-only analytic DBMS vendors should design their systems in an increasingly storage-aware manner
promising to explain what I meant later on. So here goes.
There always have been good technical reasons to tailor hardware to analytic database software. Data moves through disk controller, network, RAM, CPU and more, each with its own data rate. Getting different kinds of parts into the right balance doesn’t completely eliminate bottlenecks – the Wonderful One-Hoss Shay is poetic fiction – but it certainly can help. As a result, every analytic DBMS vendor of any size offers at least one of:
- A Type 0 appliance
- A Type 1 appliance
- A “recommended hardware configuration”
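To make the balance argument concrete, here is a minimal back-of-envelope sketch. The component names and throughput figures are purely illustrative assumptions, not measurements of any particular appliance; the point is simply that the slowest stage caps scan throughput, so money spent elsewhere is wasted until that stage is upgraded.

```python
# Back-of-envelope sketch: per-node component throughputs for a scan-heavy
# analytic workload. All figures are illustrative assumptions, not
# measurements of any real appliance.
component_throughput_mbps = {
    "disk array (sequential read)": 1200,
    "disk controller / HBA":        1000,
    "interconnect share":            800,
    "CPU decompress + filter":      1500,
}

def bottleneck(components):
    """End-to-end scan speed is capped by the slowest stage in the pipeline."""
    name = min(components, key=components.get)
    return name, components[name]

name, rate = bottleneck(component_throughput_mbps)
print(f"Scans top out around {rate} MB/s, limited by the {name}.")
# Under these made-up numbers, buying faster CPUs changes nothing until the
# interconnect and controller are brought back into balance.
```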
And beyond performance, appliances and pre-specified hardware configurations offer at least the possibility of easing installation, administration, and support.
There also are marketing reasons to offer an appliance or something appliance-like.
- To various extents, Oracle, Teradata, Microsoft, IBM, Netezza, and EMC are all telling the world that your hardware should be optimized for your analytic DBMS.
- Smaller vendors such as Vertica and Aster Data also tend to cobble together some sort of appliance, in part so they don’t have to say they disagree.
- Thus, a “We don’t see any point in special hardware assembly at all” story would leave an analytic DBMS vendor pretty far out on a limb.
Finally, there are three overlapping technical trends that increase the need for storage-awareness in analytic DBMS. First and foremost is the rise of solid-state memory. For starters, I believe:
- Flash will be important for analytic DBMS soon.
- There are good technical reasons for this.
- Oracle’s marketing will make a big deal out of the flash aspects of Exadata, so other analytic DBMS vendors will need a response. And of course, if Netezza or Teradata preemptively make a big deal of their flash-based offerings, that just adds to the pressure on everybody else to adopt flash.
- But it’s not just flash – flash, other solid-state memory, and disk will be combined in various ways.
But this move to flash will require analytic DBMS vendors to be increasingly storage-aware for at least three reasons:
- It just adds another level of complexity to their hardware-balancing challenges.
- Flash overturns some of the fundamental assumptions of modern analytic DBMS, in particular:
- Sequential reads are hugely better than random (a microbenchmark sketch of this appears after the list below)
- The worst bottleneck is at the point where data comes out of storage.
- The flash technology stack is still immature, and you have to pick your poison in how to deal with it. Vendors are making very different choices in this regard – and they do have to choose.
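To see how much the sequential-versus-random assumption matters, here is a minimal microbenchmark sketch, assuming a scratch file at a hypothetical path. On rotating disk it will typically show sequential 4KB reads running far faster than random ones; on flash the gap narrows sharply. (Because it does not bypass the OS page cache, the numbers are only indicative.)

```python
import os, random, time

PATH = "/tmp/scan_test.dat"   # hypothetical scratch file for the test
BLOCK = 4096                  # 4KB reads, a typical small database page
FILE_SIZE = 1 << 30           # 1GB test file
N_READS = 20_000

# Create the test file once, filled with real (non-sparse) data.
if not os.path.exists(PATH) or os.path.getsize(PATH) < FILE_SIZE:
    with open(PATH, "wb") as f:
        for _ in range(FILE_SIZE // (1 << 20)):
            f.write(os.urandom(1 << 20))

def timed_reads(offsets):
    """Read one BLOCK at each offset and return the elapsed seconds."""
    fd = os.open(PATH, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for off in offsets:
            os.pread(fd, BLOCK, off)
        return time.perf_counter() - start
    finally:
        os.close(fd)

sequential_offsets = [i * BLOCK for i in range(N_READS)]
random_offsets = [random.randrange(0, FILE_SIZE // BLOCK) * BLOCK
                  for _ in range(N_READS)]

print(f"sequential 4KB reads: {timed_reads(sequential_offsets):.2f}s")
print(f"random 4KB reads:     {timed_reads(random_offsets):.2f}s")
```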
Another trend that could naturally lead analytic DBMS vendors to be more storage-aware is their incorporation of what could be viewed as hierarchical storage/ILM technologies. Different data is stored in different ways and/or on different kinds of storage hardware. (Vendors pursuing – you guessed it – different approaches to this include Teradata, Greenplum, Vertica, and Sybase.) The more automatic that process is, the more storage-aware the DBMS will need to be.
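A minimal sketch of the ILM idea, assuming a date-partitioned fact table; the tier names and age thresholds below are invented for illustration and do not describe any specific vendor's product. Real policies are usually richer (and often based on measured access frequency rather than age alone), but the essential point stands: the DBMS has to know which class of device each partition sits on.

```python
from datetime import date, timedelta

# Illustrative tiers and age thresholds for a date-partitioned fact table;
# the names and cutoffs are invented for this sketch.
TIERS = [
    ("flash",     timedelta(days=30)),    # hot: recent, frequently queried
    ("fast_disk", timedelta(days=365)),   # warm
    ("slow_disk", None),                  # cold: everything older
]

def choose_tier(partition_max_date, today=None):
    """Pick a storage tier for a partition based on the age of its data."""
    age = (today or date.today()) - partition_max_date
    for tier, limit in TIERS:
        if limit is None or age <= limit:
            return tier

print(choose_tier(date.today() - timedelta(days=7)))     # -> flash
print(choose_tier(date.today() - timedelta(days=400)))   # -> slow_disk
```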
Finally, there are reasons to think that DBMS functionality should be split between conventional servers and smart storage. This is, of course, the Exadata strategy. Netezza’s two-processor approach, while rather different, also somewhat validates the idea.
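For flavor, here is a minimal sketch of that division of labor. The class names and interface are hypothetical, loosely inspired by the Exadata-style split described above: the storage tier scans, filters, and projects locally, so only qualifying rows cross the interconnect to the database server, which keeps joins and aggregation for itself.

```python
# Hypothetical sketch of the server/smart-storage split: the storage node
# applies the pushed-down filter and projection itself, so only qualifying
# rows and columns ever cross the interconnect.

class SmartStorageNode:
    def __init__(self, rows):
        self.rows = rows                       # data resident on this node

    def scan(self, predicate, columns):
        """Filter and project locally, at the storage tier."""
        return [{c: r[c] for c in columns} for r in self.rows if predicate(r)]

class DatabaseServer:
    def __init__(self, storage_nodes):
        self.storage_nodes = storage_nodes

    def run_query(self, predicate, columns):
        """Push the scan down; keep joins and aggregation on the server."""
        rows = []
        for node in self.storage_nodes:
            rows.extend(node.scan(predicate, columns))
        return rows

node = SmartStorageNode([{"region": "EU", "revenue": 120},
                         {"region": "US", "revenue": 95}])
server = DatabaseServer([node])
print(server.run_query(lambda r: r["region"] == "EU", ["revenue"]))
# -> [{'revenue': 120}]  (only filtered, projected data left the storage tier)
```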
Comments
The idea that flash helps remove the storage/disk access bottleneck (and that columnar and other analytic DB architectures are largely predicated on opening that bottleneck) isn't unlike the insight followed by VoltDB (and, before it, by kdb in the early '90s). The idea is that if you remove the need to support slow, blocking operations, then you can just do serial operations very fast, use all your resources on computation, and dramatically simplify your system by cutting out all the thread management, concurrency management, etc. required to support concurrent blocking operations. Do you think this speaks to a convergence toward simpler architectures that rely on in-memory and fast storage, and that can support TX as well as analytic workloads because disk access is removed as an issue?
Mark,
I think OLTP and OLAP will long call for different DBMS architectures. It still matters whether you're bringing back big blocks or single records. It still matters whether you have a big data redistribution issue. It still matters at what speed you're doing updates.
@Mark
“The idea is that if you remove the need to support slow, blocking operations then you can just do serial operations very fast, use all your resources on computation and dramatically simplifying your system by cutting out all the thread management, concurrency management etc. required by supporting concurrent blocking operations.”
This is the "one query – one thread – one CPU core" approach, and it has proved to be suboptimal (hey, Infobright). You will definitely need thread management and concurrency control in your system if you care about performance and resource utilization. I am not talking about extreme OLTP systems like VoltDB, though – mostly about OLAP and analytical DBMS.
Good points made here. Flash memory technology is indeed a game changer for the DW industry. As Curt has posted here in the past, there's already an appliance available that leverages an all-flash-memory approach, in the form of Solid State Drive arrays for data storage. This product opens up whole new applications for DW in very high performance analytics, such as near-real-time cyber security and capital markets portfolio risk assessment.
The ultimate best use of flash is to intelligently combine its capabilities with the other storage memory technologies, taking best advantage of each. To do this right, the DBMS has to be very much storage-aware. It has to be able to characterize the speed of the various types of storage hardware and, at the same time, characterize the usage patterns of each small data block – essentially taking its "temperature". Then the DBMS has to be able to intelligently migrate data to the storage type that best matches its usage "temperature" – and do all of this automatically. See another blog post (Teradata Virtual Storage) on this site for an available approach.
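A rough sketch of that temperature-driven migration loop, assuming invented block IDs, tier capacities, and a simple access counter as the "thermometer"; this is not Teradata Virtual Storage code, just an illustration of the idea.

```python
import collections

# Invented tiers, fastest first, with made-up per-tier capacities (in blocks).
TIER_ORDER = ["flash", "fast_disk", "slow_disk"]
CAPACITY_BLOCKS = {"flash": 2, "fast_disk": 4, "slow_disk": 10**9}

access_counts = collections.Counter()     # block_id -> recent access count
placement = {}                            # block_id -> current tier

def record_access(block_id):
    """Called on every read; this is how a block's 'temperature' is taken."""
    access_counts[block_id] += 1

def rebalance():
    """Periodically migrate the hottest blocks onto the fastest tiers."""
    ranked = [block for block, _ in access_counts.most_common()]
    i = 0
    for tier in TIER_ORDER:
        for _ in range(CAPACITY_BLOCKS[tier]):
            if i >= len(ranked):
                return
            block = ranked[i]
            if placement.get(block) != tier:
                placement[block] = tier    # in a real system: copy, then switch
            i += 1

for b in ["b1", "b1", "b1", "b2", "b2", "b3", "b4"]:
    record_access(b)
rebalance()
print(placement)   # the hottest blocks (b1, b2) land on flash
```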
On the last point, splitting the DBMS between conventional servers and smart storage makes a lot of sense but has to be done carefully to be sure you can efficiently accomplish the “hybrid” storage work discussed above. We decided long ago that virtualization of the database servers and smart storage intelligence in the same powerful node made best use of all the resources including the enormous power of the Intel multi-core processors.
I am impressed with most of the discussion here. However, one point should be made. A 'DBMS' has no intelligence, in the traditional sense. Further, those not practicing our discipline are done a disservice when anthropomorphic tags are used. Since computer systems execute what they are instructed to do by humans, humans need to optimize the performance of DBMSs in order to take advantage of flash technology. Designers and architects need to tell developers and users the "characteristic speed of the various types of storage hardware and at the same time characterize the usage patterns of each small data block" in order for the DBMS to reach its full performance potential. "Then the [project team]… has to be able to intelligently [prescribe the method to] migrate data to the storage type that best meets…" the characterized speed of the storage hardware and the usage of each small data block. Obviously, this technical work needs to be undertaken to enhance the performance of electronic, digitally based warehouse artifacts.
Using the shorthand of anthropomorphism is as wrong as the consequences of not properly executing due diligence in developing reference databases in the first place.
To me the term AI is an oxymoron.
Curt,
You said in your post:
“Flash overturns some of the fundamental assumptions of modern analytic DBMS, in particular:
* Sequential reads are hugely better than random
”
In my experience this might not necessarily be the case. A couple of months back I got my hands on two of the new OCZ P88 SSD drives (really PCI cards). They are top of the line in SSD technology, with an advertised 30,000 IO/s and a 1.4GB/s sequential read rate (and they cost $4,000 apiece). After playing with them I noticed the following:
1. With 4K pages, even sequential reads are slow: only 33MB/s. The "seek time" is indeed negligible, but that does not mean things are perfect.
2. With 4M pages, sequential reads run at 1.4GB/s. That is roughly 40 times better than with 4K pages, so mostly-sequential access using 4K pages is nowhere near as fast as large-page sequential access. In fact, I think most people do not realize just how large pages have to be to get peak performance from modern storage.
3. If you do the math, at 33MB/s with 4K pages you only get about 9,000 IO/s. To get the rest, you must have multiple pending requests. That means you need an engine that can "parallelize" index algorithms if you want good performance for a single query.
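As an aside, here is a minimal sketch of the kind of measurement described in points 1-3 above. The device path, block sizes, and thread count are placeholder assumptions, and because it does not use O_DIRECT the OS page cache can inflate the numbers; the point is only to show how page size and the number of outstanding requests both matter.

```python
import os, time, concurrent.futures

PATH = "/path/to/testfile"   # placeholder: a large file on the SSD under test
READ_BYTES = 1 << 30         # read 1GB per configuration

def sequential_mb_per_s(block_size, n_threads=1):
    """Sequentially read READ_BYTES in block_size chunks; with n_threads > 1,
    each thread reads a disjoint range so several requests stay pending."""
    per_thread = (READ_BYTES // block_size) // n_threads

    def worker(t):
        fd = os.open(PATH, os.O_RDONLY)
        try:
            base = t * per_thread * block_size
            for i in range(per_thread):
                os.pread(fd, block_size, base + i * block_size)
        finally:
            os.close(fd)

    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(n_threads) as pool:
        list(pool.map(worker, range(n_threads)))
    elapsed = time.perf_counter() - start
    total_bytes = per_thread * n_threads * block_size
    return total_bytes / (1 << 20) / elapsed

for block_size in (4 << 10, 4 << 20):        # 4KB "pages" vs 4MB "pages"
    for n_threads in (1, 8):                 # one vs several pending requests
        rate = sequential_mb_per_s(block_size, n_threads)
        print(f"block={block_size:>9} threads={n_threads}: {rate:8.0f} MB/s")
```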
What the above means to me is that you should still do pure sequential access for analytical queries, and focus on a design that supports large pages and can use multiple devices in parallel. Case in point: for TPC-H Q6 at 1TB scale, on the 2 P88 disks with DataPath (see this year's SIGMOD paper), I get running times of 62s. If you look at published TPC-H results, you will notice that systems with 500+ disks take 5-6s, which is 10-12 times better. The bulk of that comes from selecting 1 out of 7 years (a 7X improvement), so the indexing does not help that much. The 7X is easy to pick up in a scan-based system (you only have to start the scan where tuples with the correct date appear; this is a one-trick pony, since it works for a single attribute). The tremendous advantage with linear scans is that you can run queries concurrently. For example, on DataPath, 32 instances of Q6 take 70s (for all of them) and 64 instances take 92s. This suggests that whatever advantage you get from indexing (random I/O) is wiped out for concurrent queries. DataPath was using 4M pages and was reading data sequentially at 2GB/s (over both disks).
In the end, when it comes to analytical queries, even on SSDs you might be better off with large linear scans. The good news is that SSDs improved both sequential and random I/O. In the case of the OCZ Z-Drives, they allow tremendous I/O throughput in a very small package (an oversized PCI card). For $20,000 you can get a system that rivals much more expensive systems in terms of I/O capabilities.
Alin