ePubs

The open archive for STFC research publications

Full Record Details

Persistent URL	http://purl.org/net/epubs/work/12153135
Record Status	Checked
Record Id	12153135
Title	Benchmarking mixed-mode PETSc performance on high-performance architectures
Contributors	M Lange, G Gorman, M Weiland, L Mitchell, X Guo (STFC Daresbury Lab.), J Southern
Abstract	The trend towards highly parallel multi-processing is ubiquitous in all modern computer architectures, ranging from handheld devices to large-scale HPC systems; yet many applications struggle to fully utilise the multiple levels of parallelism exposed in modern high-performance platforms. To realise the full potential of recent hardware advances, a mixed-mode approach that combines shared-memory programming techniques with inter-node message passing can be adopted, providing high levels of parallelism with minimal overheads. For scientific applications this means that not only the simulation code itself but the whole software stack needs to evolve. In this paper, we evaluate the mixed-mode performance of PETSc, a widely used scientific library for the scalable solution of partial differential equations. We describe the addition of OpenMP threaded functionality to the library, focussing on matrix multiplication in Compressed Sparse Row (CSR) and Block CSR format. We highlight key challenges in achieving good parallel performance of threaded applications on modern multi-core processors, before demonstrating that the overall performance of the mixed-mode implementation is superior to that of the pure-MPI version. Using a set of matrices extracted from Fluidity, a CFD application code which uses the library as its linear solver engine, we then benchmark the parallel performance of mixed-mode PETSc across multiple nodes on several modern HPC architectures. We evaluate the parallel scalability on uniform memory access (UMA) platforms, such as the Fujitsu FX10 and IBM BlueGene/Q, as well as non-uniform memory access (NUMA) systems including the Cray XE6. We furthermore analyse the threaded performance of the Intel Xeon Phi coprocessor using the devised benchmarks. A detailed comparison highlights the characteristics of each architecture, and we discuss the suitability of the provided benchmark suite for drawing comparisons between HPC hardware vendors.
Organisation	ASTeC, STFC, HC
Language	English (EN)

Type	Details	URI(s)	Local file(s)	Year
Presentation	Presented at Exascale Applications and Software Conference 2013 (EASC 2013), Edinburgh, Scotland, 9-11 Apr 2013.	https://arxiv.org/pdf/1307.4567.pdf		2013