The open archive for STFC research publications


Full Record Details

Persistent URL http://purl.org/net/epubs/work/53443
Record Status Checked
Record Id 53443
Title dCSE Fluidity-ICOM: high performance computing driven software development for next-generation modelling of the World's oceans
Abstract During the course of this project, dCSE Fluidity-ICOM was transformed from a code primarily used on institution-level clusters, with typically 64 tasks per simulation, into a highly scalable code that runs efficiently on 4096 cores of the current HECToR hardware (Cray XT4 Phase 2a). Fluidity-ICOM has been parallelised with MPI and optimised for HECToR, alongside continual in-depth performance analysis. The following list highlights the major developments:
- The matrix assembly code has been optimised, including blocking. Fluidity-ICOM now supports block-CSR for the assembly and solution of vector fields and DG fields.
- Interleaved I/O has been implemented for the vtu output. Performance analysis with the gyre test case has so far shown no improvement. The parallel I/O strategy has not yet been applied to the mesh file output, as the final file format has not yet been decided.
- An optimal renumbering method for parallel linear solver performance has been implemented (provided via the PETSc interface). In general, Reverse Cuthill-McKee is recommended for best performance.
- Fluidity-ICOM has relatively complex dependencies on third-party software; several modules were created so that HECToR users can easily set up the software environment and install Fluidity-ICOM on HECToR.
- The differentially heated rotating annulus benchmark was used to evaluate the scalability of mesh adaptivity. A scalability analysis of both the parallel mesh optimisation algorithm and the complete GFD model was performed, allowing the performance of the parallel mesh optimisation method to be evaluated in the context of a "real" application.
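The Reverse Cuthill-McKee renumbering recommended above reduces the bandwidth of the sparse matrix, improving locality during solves. As an illustrative sketch only (the project obtained this through PETSc's matrix ordering support, not hand-written code; the graph and helper names here are invented for the example), a minimal pure-Python version of the ordering and its effect on bandwidth:

```python
from collections import deque

def reverse_cuthill_mckee(adj):
    """Return a Reverse Cuthill-McKee ordering of an undirected graph.

    adj maps each node to a list of its neighbours. A breadth-first
    search is started from a minimum-degree node, neighbours are
    visited in order of increasing degree, and the resulting Cuthill-
    McKee ordering is reversed.
    """
    degree = {v: len(ns) for v, ns in adj.items()}
    visited = set()
    order = []
    # Restart from the lowest-degree unvisited node for disconnected graphs.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v], key=lambda w: (degree[w], w)):
                if w not in visited:
                    visited.add(w)
                    queue.append(w)
    return order[::-1]

def bandwidth(adj, order):
    """Matrix bandwidth of the graph's adjacency under the given ordering."""
    pos = {v: i for i, v in enumerate(order)}
    return max(abs(pos[u] - pos[v]) for u in adj for v in adj[u])

# A path graph with scrambled labels: 0-2-4-1-5-3.
adj = {0: [2], 2: [0, 4], 4: [2, 1], 1: [4, 5], 5: [1, 3], 3: [5]}
print(bandwidth(adj, sorted(adj)))                 # identity ordering: 4
print(bandwidth(adj, reverse_cuthill_mckee(adj)))  # RCM ordering: 1
```

In PETSc the equivalent effect is requested through the matrix ordering interface (ordering type "rcm"), which is presumably what the project's PETSc-provided renumbering used.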
Extensive profiling has been performed with several benchmark test cases using CrayPAT and VampirTrace:
- Automatic profiling proved not very useful for large test cases, but the MPI statistics it produces remain valuable even at that scale; they also helped identify the problems with surface labelling that caused large overheads for CrayPAT. There are still ongoing issues with PETSc instrumentation.
- The GNU build of VampirTrace proved useful for tracing the mesh adaptivity code, yielding several interesting results.
- Profiling real-world applications proved to be a major challenge. It required a considerable understanding of the profiling tools and extensive knowledge of the software itself. Manual instrumentation was introduced in order to focus on specific sections of the code. Determining a suitable way to reduce the size of the profiling data without losing fine-grained detail was critical to successful profiling. Inevitably, this involved much experimentation and large numbers of profiling runs.
Organisation CSE, CSE-HEC, STFC
Keywords Natural environment, Unstructured Mesh, Adaptivity, Profiling, Mesh Optimisation, Fluidity-ICOM, Parallel I/O
Funding Information
Related Research Object(s):
Licence Information:
Language English (EN)
Type Details URI(s) Local file(s) Year
Report 2010. http://www.hector…rts/fluidity-icom01/ fluidity-icom01(2).pdf 2010