Load-Balancing Models for Scheduling Divisible Load on Large Scale Data Grids
Abduh Kaid, Monir Abdullah (2009) Load-Balancing Models for Scheduling Divisible Load on Large Scale Data Grids. PhD thesis, Universiti Putra Malaysia.
In many data grid applications, data can be decomposed into multiple independent sub-datasets and distributed for parallel execution. This property has been successfully exploited using Divisible Load Theory (DLT), which has proven to be a powerful tool for modeling divisible load problems in large scale data grids. Load balancing in such environments plays a critical role in achieving high resource utilization, so that applications are scheduled efficiently through joint consideration of communication and computation time. Several scheduling models have been studied, such as Constraint DLT (CDLT), Task Data Present (TDP) and Genetic Algorithm (GA). However, none of them has reached an optimal solution. At the same time, effective schedulers are required to minimize not only the maximum completion time (makespan) of the jobs, but also the execution time of the scheduler itself.

This thesis proposes several load balancing models for scheduling divisible load on large scale data grids in which both processor speeds and communication link speeds are heterogeneous. The proposed models are developed in three stages. The first stage develops new DLT based models for multiple source scheduling, for which closed form solutions for the load allocation are derived. The new models are called Adaptive DLT (ADLT) and A2DLT. In the second stage, an Iterative DLT (IDLT) model is proposed. Recursive numerical equations are derived to find the optimal workload assigned to each grid node, and closed form solutions are derived for the optimal load allocation. Although the IDLT model is proposed for a single source, it has also been applied to the case of multiple sources. The third stage integrates the proposed DLT based models with GA to address the time consumption problem. In addition, the integration of the proposed DLT model with the Simulated Annealing (SA) algorithm has also been developed.
The experimental results show that the proposed models yield better performance than previous models in terms of makespan and scheduler execution time. The ADLT and A2DLT models reduce the makespan by 21% and 37% respectively compared to the CDLT model. The IDLT model is capable of producing a near-optimal solution for single source scheduling with low time complexity. In addition, the integration of the proposed DLT model with the GA and SA algorithms also significantly improves the performance. The SA is 64.70% better than GA in terms of makespan. Thus, the proposed models can balance the processing loads efficiently, so they can be integrated into existing data grid schedulers to improve performance.
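To illustrate the kind of closed form load allocation that DLT provides, the following minimal sketch implements the classic textbook case of a single source distributing load sequentially to heterogeneous workers in a star network. It is not the ADLT, A2DLT or IDLT model from the thesis. The function names, the parameters `w` (time for a worker to process one unit of load) and `z` (time to send one unit of load over a worker's link), and the assumption that workers compute only after receiving their whole share are all assumptions made for this example.

```python
import math


def dlt_star_allocation(w, z):
    """Return load fractions for a single-source star network (illustrative only).

    w[i]: time for worker i to process one unit of load (hypothetical parameter)
    z[i]: time to send one unit of load to worker i (hypothetical parameter)
    Assumes sequential distribution in index order and that every worker
    finishes at the same instant, the classic DLT optimality condition.
    """
    if not w or len(w) != len(z):
        raise ValueError("w and z must be non-empty and of equal length")
    # Equal finish times give alpha[i] * w[i] = alpha[i+1] * (z[i+1] + w[i+1]).
    ratios = [1.0]
    for i in range(1, len(w)):
        ratios.append(ratios[-1] * w[i - 1] / (z[i] + w[i]))
    total = sum(ratios)
    # Normalize so the fractions sum to one.
    return [r / total for r in ratios]


def finish_times(alpha, w, z):
    """Finish time of each worker under sequential distribution."""
    times = []
    sent = 0.0  # time at which the source finishes sending to the current worker
    for a, wi, zi in zip(alpha, w, z):
        sent += a * zi
        times.append(sent + a * wi)
    return times


if __name__ == "__main__":
    w = [2.0, 3.0, 5.0]   # heterogeneous processor speeds (made-up values)
    z = [0.5, 0.2, 1.0]   # heterogeneous link speeds (made-up values)
    alpha = dlt_star_allocation(w, z)
    times = finish_times(alpha, w, z)
    print("fractions:", alpha)
    print("makespan:", max(times))
```

In this sketch the makespan is the common finish time of all workers; slower processors or links receive smaller fractions of the load.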