Automatic redistribution of data warehouse data
In a recent Oracle Exadata FAQ, Kevin Closson writes:
Q. […] don’t some of the DW vendors split the data up in a shared nothing method. Thus when the data has to be repartitioned it gets expensive. Whereas here you just add another cell and ASM goes to work in the background. (depending upon the ASM power level you set.)
A. All the DW Appliance vendors implement shared-nothing so, yes, the data is chopped up into physical partitions. If you add hardware to increase performance of queries against your current dataset the data will have to be reloaded into the new partitioning scheme. As has always been the case with ASM, adding new disks, and therefore Exadata Storage Server cells, will cause the existing data to be redistributed automatically over all (including the new) drives. This ASM data redistribution is an online function.
Hmm. That sounds much like the story I’ve heard from various other data warehousing DBMS vendors as well.
Rather than try to speak for them, however, I’ll just post this and see whether they choose to add anything to the comment thread.
Comments
7 Responses to “Automatic redistribution of data warehouse data”
I think the key here is the automatic redistribution. If, as they claim, the data redistribution is an online function and is automatic, that is definitely different from, say, Teradata, where you have to take an outage to redistribute the data.
No expertise, but here’s what the docs say:
“Rebalancing a disk group moves data between disks to ensure that every file is evenly spread across all of the disks in a disk group. When all of the files are evenly dispersed, all of the disks are evenly filled to the same percentage; this ensures load balancing. Rebalancing does not relocate data based on I/O statistics nor is rebalancing started as a result of statistics. ASM rebalancing operations are controlled by the size of the disks in a disk group.
“ASM automatically initiates a rebalance after storage configuration changes, such as when you add, drop, or resize disks. The power setting parameter determines the speed with which rebalancing operations occur.
“You can manually start a rebalance to change the power setting of a running rebalance. A rebalance is automatically restarted if the instance on which the rebalancing is running stops; databases can remain operational during rebalancing operations. A rebalance has almost no effect on database performance because only one megabyte at a time is locked for relocation and only writes are blocked.”
http://download.oracle.com/docs/cd/B28359_01/server.111/b31107/asmcon.htm#CJHGGECE
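To make the "evenly filled" idea in that passage concrete, here is a toy Python sketch. It is not Oracle's algorithm: the `rebalance` function, the disk names, and the extent counts are all made up for illustration, and it ignores the per-file spreading and the power setting the docs describe. It only shows that adding an empty disk and moving extents until every disk holds the same amount touches a fraction of the data rather than all of it.

```python
def rebalance(disks):
    """Even out extent counts across disks (hypothetical toy model).

    disks: dict mapping a disk name to the list of extent ids it holds.
    Moves one extent at a time from the fullest disk to the emptiest disk
    until their counts differ by at most one. Mutates `disks` in place and
    returns the number of extents moved.
    """
    moves = 0
    while True:
        fullest = max(disks, key=lambda d: len(disks[d]))
        emptiest = min(disks, key=lambda d: len(disks[d]))
        if len(disks[fullest]) - len(disks[emptiest]) <= 1:
            return moves
        disks[emptiest].append(disks[fullest].pop())
        moves += 1


# Four disks with 300 extents each, then one empty disk is added.
disks = {f"disk{i}": list(range(i * 300, (i + 1) * 300)) for i in range(4)}
disks["disk4"] = []

moved = rebalance(disks)
sizes = {name: len(extents) for name, extents in disks.items()}
print(moved, sizes)
```

In this toy case each of the five disks ends up with 240 extents, and 240 of the 1,200 extents (one fifth) have moved. Whether a real system can do the equivalent online and in the background is exactly the question in the post.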
Hi Curt,
I would like to mention our online scalability capabilities in this context. Aster nCluster provides online scaling, not only for storage but also for the whole system. When adding a new server to the cluster, the administrator only needs to input the MAC address of the first network interface and power on the bare-metal machine. The system automatically gets the software (including the operating system), formats the drives, configures the network, and balances the existing data and workload. All this is done in the background and the system continues to be available to users during this process. Similarly, servers can be taken out of nCluster and repurposed for other use with a single-click on the Aster Management Console without incurring any system downtime.
http://www.asterdata.com/product/management.html has an overview of our manageability features.
Thanks,
Ajeet
Hi Ajeet,
I was guessing Aster might be the first vendor to respond to this thread. 😉
Best,
CAM
Hi Curt,
We’ll be the second vendor to respond to this thread.
EXASolution also has an automatic redistribution feature. No reloading is necessary. You integrate new servers by specifying the MAC address and booting the server. The system remains accessible at all times, and it redistributes the data in the background.
Since EXASolution is based on an SPMD architecture, the new servers will increase performance linearly.
Regards,
Stu Greenberg
Stu,
No shock there, either.
I’m not sure, however, that I buy the claim SPMD = “pure linear scalability with no exceptions”, absent further elucidation. 😉
Best,
CAM
I can confirm that Teradata has to take the system offline to do the redistribution. It is a critical operation that could lead to a lot of trouble if something crashes in the middle of it.