SC²S Colloquium - November 10, 2009

From Sccswiki
Date: November 10
Room: 02.07.023
Time: 16:30, s.t.

Dheevatsa Mudigere: Data access optimized applications on the GPU using NVIDIA CUDA (MA)

GPUs offer tremendous computing power at very low cost and have therefore become a very attractive option for HPC applications. This work addresses the problem of bandwidth-limited performance in data-intensive GPGPU applications. To this end, methods and approaches for optimizing data rearrangement on GPU architectures in general have been identified and formulated. These are used to develop a library of near-optimal, generic GPU kernels for a set of data rearrangement operations. The target GPU architectures considered in this work are the NVIDIA Tesla C1060 and the NVIDIA Tesla C870, and the kernels have been developed using NVIDIA CUDA. Furthermore, as a case study, a simple Navier-Stokes-based CFD flow solver incorporating the optimal data rearrangement principles has been developed for the GPU. It has been tested on the 2D lid-driven cavity flow problem. The GPU implementation is comprehensively compared with optimized serial and parallel CPU implementations on an Intel Xeon X5550 (Nehalem) platform.
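As a rough illustration of what a bandwidth-aware data rearrangement operation looks like, here is a host-side C++ sketch of the tiled (blocked) matrix transpose pattern commonly used in CUDA. It is not the kernel library from this thesis. The function name `tiled_transpose`, the tile size of 32, and the choice of transpose as the example operation are all assumptions for illustration. Each tile is staged in a small buffer that stands in for CUDA shared memory, so both the reads from the input and the writes to the output walk along contiguous rows, which on a GPU corresponds to coalesced global memory access.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tile edge; 32 matches the warp width on NVIDIA GPUs.
constexpr std::size_t kTile = 32;

// Transpose the rows x cols row-major matrix `in` into the cols x rows
// row-major matrix `out`, one kTile x kTile tile at a time.
void tiled_transpose(const std::vector<float>& in, std::vector<float>& out,
                     std::size_t rows, std::size_t cols) {
    out.assign(rows * cols, 0.0f);
    // Local staging buffer standing in for CUDA shared memory. The extra
    // column mirrors the padding used on GPUs to avoid bank conflicts.
    float tile[kTile][kTile + 1];
    for (std::size_t r0 = 0; r0 < rows; r0 += kTile) {
        for (std::size_t c0 = 0; c0 < cols; c0 += kTile) {
            const std::size_t rEnd = (r0 + kTile < rows) ? r0 + kTile : rows;
            const std::size_t cEnd = (c0 + kTile < cols) ? c0 + kTile : cols;
            // Phase 1: read the tile row by row (contiguous in `in`).
            for (std::size_t r = r0; r < rEnd; ++r)
                for (std::size_t c = c0; c < cEnd; ++c)
                    tile[r - r0][c - c0] = in[r * cols + c];
            // Phase 2: output row c receives input column c, written
            // contiguously along the output row.
            for (std::size_t c = c0; c < cEnd; ++c)
                for (std::size_t r = r0; r < rEnd; ++r)
                    out[c * rows + r] = tile[r - r0][c - c0];
        }
    }
}
```

In a CUDA version of this pattern, each thread block would typically own one tile, with a `__syncthreads()` barrier separating the staging phase from the write-back phase.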