Abstract
The Partitioned Global Address Space (PGAS) model has been widely used on multi-core clusters as an alternative to MPI. A widely used PGAS language is Unified Parallel C (UPC). Previous research has shown that UPC performance is comparable with that of MPI; however, in certain cases UPC requires hand-tuning techniques such as prefetching and privatized pointers-to-shared to improve performance. In this paper we review, evaluate and analyze the performance patterns of naïve UPC, optimized UPC and MPI on two different multi-core cluster architectures. We use matrix multiplication as the benchmark and run our experiments on two distributed-memory machines: a Cray XE6 with the Gemini interconnect and a Sun cluster with an InfiniBand interconnect. We analyze the execution time of each core to understand the communication pattern on both machines. We also show that the gap between the naïve and optimized versions depends on the compiler and its associated distributed-memory machine. Finally, we observe that, depending on the HPC architecture and compiler, some optimizations are unnecessary for certain programs.
Full text not available from this repository.
Additional Metadata
| Field | Value |
|---|---|
| Item Type | Conference or Workshop Item (Paper) |
| Divisions | Faculty of Computer Science and Information Technology |
| DOI Number | https://doi.org/10.1109/HPCC.and.EUC.2013.250 |
| Publisher | IEEE (IEEE Xplore) |
| Keywords | Performance evaluation; PGAS; UPC; MPI; Gemini |
| Depositing User | Nursyafinaz Mohd Noh |
| Date Deposited | 03 Nov 2015 04:00 |
| Last Modified | 03 Nov 2015 04:00 |
| Altmetrics | http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1109/HPCC.and.EUC.2013.250 |
| URI | http://psasir.upm.edu.my/id/eprint/41309 |