ePubs
The open archive for STFC research publications
Full Record Details
Persistent URL
http://purl.org/net/epubs/work/12153135
Record Status
Checked
Record Id
12153135
Title
Benchmarking mixed-mode PETSc performance on high-performance architectures
Contributors
M Lange, G Gorman, M Weiland, L Mitchell, X Guo (STFC Daresbury Lab.), J Southern
Abstract
Highly parallel multi-processing is now ubiquitous across modern computer architectures, from handheld devices to large-scale HPC systems. Yet many applications struggle to fully utilise the multiple levels of parallelism that modern high-performance platforms expose. To realise the full potential of recent hardware advances, a mixed-mode approach can be adopted that combines shared-memory programming techniques with inter-node message passing. This approach provides high levels of parallelism with minimal overheads. For scientific applications, this means that the whole software stack needs to evolve, not only the simulation code itself. In this paper, we evaluate the mixed-mode performance of PETSc, a widely used scientific library for the scalable solution of partial differential equations. We describe the addition of OpenMP threaded functionality to the library, focussing on matrix multiplication in Compressed Sparse Row (CSR) and Block CSR format. We highlight key challenges in achieving good parallel performance of threaded applications on modern multi-core processors. We then demonstrate that the overall performance of the mixed-mode implementation is superior to that of the pure-MPI version. Fluidity is a CFD application code that uses the library as its linear solver engine. Using a set of matrices extracted from Fluidity, we benchmark the parallel performance of mixed-mode PETSc across multiple nodes on several modern HPC architectures. We evaluate parallel scalability on uniform memory access (UMA) platforms, such as the Fujitsu FX10 and IBM BlueGene/Q. We also evaluate it on non-uniform memory access (NUMA) systems, including the Cray XE6. In addition, we analyse the threaded performance of the Intel Xeon Phi coprocessor using the same benchmarks.
A detailed comparison highlights the characteristics of each architecture. We then discuss how suitable the provided benchmark suite is for drawing comparisons between HPC hardware vendors.
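The abstract mentions threaded matrix multiplication in CSR and Block CSR (BCSR) format. The sparse matrix-vector product, which PETSc exposes as `MatMult`, is the operation involved. The C code below is a minimal sketch of how that product can be threaded by distributing rows across OpenMP threads. It is not PETSc's implementation. The type names `csr_matrix` and `bcsr_matrix`, the functions `csr_spmv` and `bcsr_spmv`, and the row-major layout of each dense block are hypothetical choices made only for this illustration.

```c
#include <stddef.h>

/* Hypothetical CSR container: row_ptr has nrows+1 entries; col_idx and
   vals each hold row_ptr[nrows] entries (one per stored non-zero). */
typedef struct {
    size_t nrows;
    const size_t *row_ptr;
    const size_t *col_idx;
    const double *vals;
} csr_matrix;

/* Hypothetical BCSR container: every stored entry is a dense bs x bs block,
   assumed here to be stored row-major (PETSc's own layout may differ). */
typedef struct {
    size_t nblockrows;
    size_t bs;
    const size_t *row_ptr;   /* nblockrows+1 offsets into col_idx */
    const size_t *col_idx;   /* block-column index of each stored block */
    const double *vals;      /* bs*bs values per stored block */
} bcsr_matrix;

/* y = A*x for a CSR matrix. Rows are split statically across OpenMP threads;
   each row's result goes to a distinct y[i], so no synchronisation is needed. */
void csr_spmv(const csr_matrix *A, const double *x, double *y)
{
    long n = (long)A->nrows;
#pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        double sum = 0.0;
        for (size_t k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k)
            sum += A->vals[k] * x[A->col_idx[k]];
        y[i] = sum;
    }
}

/* y = A*x for a BCSR matrix. Threads are assigned whole block rows; each
   block row writes its own bs-long slice of y. */
void bcsr_spmv(const bcsr_matrix *A, const double *x, double *y)
{
    long nb = (long)A->nblockrows;
    size_t bs = A->bs;
#pragma omp parallel for schedule(static)
    for (long I = 0; I < nb; ++I) {
        double *yI = y + (size_t)I * bs;
        for (size_t r = 0; r < bs; ++r)
            yI[r] = 0.0;
        for (size_t k = A->row_ptr[I]; k < A->row_ptr[I + 1]; ++k) {
            const double *blk = A->vals + k * bs * bs;  /* k-th dense block */
            const double *xJ = x + A->col_idx[k] * bs;  /* matching x slice */
            for (size_t r = 0; r < bs; ++r)
                for (size_t c = 0; c < bs; ++c)
                    yI[r] += blk[r * bs + c] * xJ[c];
        }
    }
}
```

Static scheduling is the simplest partitioning. It gives each thread an equal number of rows rather than an equal number of non-zeros, so it can be unbalanced when row lengths vary widely. If the code is compiled without OpenMP support (for example, without `-fopenmp` in GCC), the pragmas are ignored and the loops run serially with the same results.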
Organisation
ASTeC, STFC, HC
Language
English (EN)
Type
Presentation
Details
Presented at Exascale Applications and Software Conference 2013 (EASC 2013), Edinburgh, Scotland, 9-11 Apr 2013.
URI(s)
https://arxiv.org/pdf/1307.4567.pdf
Year
2013