SC²S Colloquium - January 14, 2015
|Date:||January 14, 2015|
|Time:||3:00 pm, s.t.|
|Invited by:||Dipl.-Inf. Christoph Riesinger|
Thomas Hörmann: GPU-optimised implementation of high-dimensional tensor applications
Tensor trains are a sparse representation of high-dimensional tensors. They are used in quantum many-body physics to find the ground state of a system. The numerical computation requires many inner products. The inner product of two tensor trains can be computed on parallel machines such as multicore CPUs, CPU clusters, and GPUs. GPU implementations are possible in OpenCL as well as CUDA, and vector extensions such as SSE and AVX would speed up the computations on CPUs. We propose a fast and efficient implementation of tensor train contractions in CUDA. The algorithm reaches up to 88% of the theoretical peak performance on the latest Nvidia architecture and can also profit from multi-GPU systems.
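For readers unfamiliar with the operation, the inner product of two tensor trains can be sketched as a left-to-right sweep over the cores. The following is a minimal CPU reference in Python with NumPy, not the CUDA implementation presented in the talk. The core layout `(r_left, n, r_right)` and the names `random_tt`, `tt_inner`, and `tt_to_full` are assumptions made for illustration.

```python
import numpy as np

def random_tt(dims, rank, rng):
    """Build a random real tensor train as a list of cores.

    Assumed layout: core k has shape (r_left, n_k, r_right),
    with boundary ranks equal to 1.
    """
    ranks = [1] + [rank] * (len(dims) - 1) + [1]
    return [rng.standard_normal((ranks[k], n, ranks[k + 1]))
            for k, n in enumerate(dims)]

def tt_inner(a, b):
    """Inner product <a, b> of two real tensor trains with equal mode sizes."""
    # env[i, j] is the contraction of all cores left of the current site;
    # i indexes the rank of a, j indexes the rank of b.
    env = np.ones((1, 1))
    for core_a, core_b in zip(a, b):
        # Absorb the core of a: sum over its left rank index i.
        tmp = np.einsum("ij,ink->jnk", env, core_a)
        # Contract with the core of b over its left rank j and mode index n.
        env = np.einsum("jnk,jnl->kl", tmp, core_b)
    return float(env[0, 0])

def tt_to_full(tt):
    """Expand a tensor train into the full tensor (for small checks only)."""
    full = tt[0]
    for core in tt[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    # Drop the two boundary rank axes of size 1.
    return full.squeeze(axis=(0, -1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = random_tt([2, 3, 4, 2], rank=3, rng=rng)
    b = random_tt([2, 3, 4, 2], rank=3, rng=rng)
    # Both numbers should agree up to floating-point rounding.
    print(tt_inner(a, b), np.sum(tt_to_full(a) * tt_to_full(b)))
```

Each step of the sweep is a pair of small dense contractions, so the cost grows linearly with the number of cores instead of exponentially as it would for the full tensor. Dense contractions of this kind map naturally onto matrix-multiply kernels on a GPU.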