The open archive for STFC research publications

Full Record Details

Persistent URL http://purl.org/net/epubs/work/62682
Record Status Checked
Record Id 62682
Title _TRSV: optimizing triangular solve in CUDA
Abstract The _trsv level 2 BLAS routine solves the linear system Lx = b for a lower triangular matrix L. It is an important kernel in the solution phase of a direct linear solver, and is often run repeatedly for iterative refinement. The current CUBLAS implementation of _trsv fails to beat host MKL performance, even on many large matrices. In this talk we describe how to improve performance by an order of magnitude by minimizing memory latency overheads and by using global memory rather than kernel launches for synchronization. These techniques may be of use in other problems where high performance is required but only small amounts of data are used in each iteration.
Organisation CSE-NAG, STFC, SCI-COMP
Keywords CUDA, GPU, Linear Algebra
Language English (EN)
Type Presentation
Details Presented at Oxford e-Research Centre Many-Core Seminar Series, Oxford, UK, 23 May 2012.
Local file(s) dtrsv.pdf
Year 2012
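
The abstract describes _trsv as solving Lx = b for a lower triangular L. As a point of reference only, the operation can be sketched as sequential forward substitution. This is a plain CPU illustration, not the CUDA implementation presented in the talk; the function name `forward_substitution` and the sample matrix are assumptions made for the example. Each x[i] depends on all earlier entries of x, which is the dependency chain that makes synchronization cost matter in a parallel version.

```python
def forward_substitution(L, b):
    """Solve L x = b for a lower triangular matrix L with a non-zero diagonal.

    L is a list of rows (only the lower triangle is read); b is a list.
    Hypothetical helper for illustration, not taken from the talk.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # Start from b[i] and subtract the contributions of x[0..i-1],
        # which have already been computed in earlier iterations.
        s = b[i]
        for j in range(i):
            s -= L[i][j] * x[j]
        if L[i][i] == 0.0:
            raise ZeroDivisionError("zero on the diagonal: L is singular")
        x[i] = s / L[i][i]
    return x


# Small made-up system, chosen so the solution is easy to check by hand.
L = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [4.0, 5.0, 6.0]]
b = [2.0, 7.0, 32.0]
x = forward_substitution(L, b)
print(x)  # → [1.0, 2.0, 3.0]
```

The inner loop touches only row i of L and the already-known part of x, so each step uses little data but cannot begin until the previous entries are available.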