Citation
Mabni, Zulaile (2019) A cluster-based hybrid replica control protocol for high availability in data grid. Doctoral thesis, Universiti Putra Malaysia.
Abstract
Data Grid provides a scalable infrastructure for managing and storing large numbers of
data files in Grid computing systems. In Data Grid, data replication is a widely used
technique for managing data, where exact copies of data or replicas are created and
stored at many distributed sites. This technique provides high data availability and
increases the performance of distributed systems. In recent years, the number of
distributed nodes in Grid computing systems has become very large. The growing number
of nodes has raised several issues in data replication. The first issue is that nodes in Grid
systems are dynamic: they can join or leave the system at any time. Therefore,
a replica control protocol must consider the dynamic aspects of the Data Grid. The next
important issue is replica placement, which determines suitable nodes on which to place the
replicas. Previously, replica placement was not an issue because research focused only
on small-scale systems. However, in a larger system such as Data Grid,
existing replica control protocols require a larger number of replicas to construct read
and write quorums. As the number of replicas increases, the communication cost also
increases, which degrades the performance of the protocols. Another issue is replica
consistency, which must be ensured when copying data in a large-scale system. To
maintain replica consistency, when several replicas of the same file are updated
concurrently, all other replicas must receive the same updated contents. Thus, an
efficient mechanism is needed to improve performance of the system while ensuring
replica consistency in Data Grid. Therefore, in this thesis, we propose a new replica
control protocol, named the Cluster-Based Hybrid (CBH) protocol, for large-scale systems,
with the objectives of reducing communication cost, increasing data availability, and
maintaining replica consistency. CBH employs a hybrid replication strategy by combining
the advantages of two common replica control protocols to improve on the performance of
existing protocols. A clustering algorithm is proposed to group the large number of nodes
into clusters and organize these clusters into a tree structure. A second proposed algorithm,
the replica placement algorithm, selects and places only one replica in each cluster.
The performance of the CBH protocol is evaluated both theoretically and through simulation.
GridSim, a discrete-event simulator, and the Java programming language are used to simulate
the proposed protocol. The performance metrics, communication cost and data
availability, are evaluated and compared with those of two recent quorum-based protocols,
Dynamic Hybrid (DH) and Duplication on Grid (DDG). The results show that
by grouping the nodes into clusters and placing only one replica in each cluster,
CBH minimizes the number of replicas involved in constructing read and write quorums.
This research contributes a dynamic cluster-based hybrid replica control protocol
comprising a clustering algorithm to determine the number of clusters, a mechanism
for dynamic participation of nodes in the network, and a replica placement algorithm
that yields lower communication cost and higher data availability than the DH and
DDG protocols. CBH is shown to maintain replica consistency by satisfying the
Quorum Intersection Properties.
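
The Quorum Intersection Properties are commonly stated as two conditions: every read quorum overlaps every write quorum, and every pair of write quorums overlaps. The sketch below illustrates that check only; it is not the CBH quorum construction, which the abstract does not describe. The replica names, the five-cluster setup, and the fixed-size quorums are assumptions chosen for illustration.

```python
from itertools import combinations


def satisfies_quorum_intersection(read_quorums, write_quorums):
    """Return True if every read/write pair and every write/write pair overlap."""
    read_write_ok = all(r & w for r in read_quorums for w in write_quorums)
    write_write_ok = all(w1 & w2 for w1, w2 in combinations(write_quorums, 2))
    return read_write_ok and write_write_ok


def quorums_of_size(replicas, size):
    """All subsets of `replicas` with exactly `size` members, as frozensets."""
    return [frozenset(c) for c in combinations(sorted(replicas), size)]


# Hypothetical setup: one replica per cluster, five clusters in total.
replicas = {"c1", "c2", "c3", "c4", "c5"}

# Read size 2 and write size 4: 2 + 4 > 5 and 2 * 4 > 5, so quorums must overlap.
valid = satisfies_quorum_intersection(
    quorums_of_size(replicas, 2), quorums_of_size(replicas, 4)
)

# Read size 2 and write size 3: 2 + 3 = 5, so a read can miss the latest write.
invalid = satisfies_quorum_intersection(
    quorums_of_size(replicas, 2), quorums_of_size(replicas, 3)
)

print(valid, invalid)
```

With fewer replicas (for example, one per cluster instead of one per node), the quorums that satisfy these conditions are smaller, which is the intuition behind the lower communication cost claimed above.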