SC²S Colloquium - January 14, 2015

Date: January 14, 2015
Room: 02.07.023
Time: 3:00 pm, s.t.
Invited by: Dipl.-Inf. Christoph Riesinger

Thomas Hörmann: GPU-optimised implementation of high-dimensional tensor applications

Tensor trains are a sparse representation of high-dimensional tensors. They are used in quantum many-body physics to find the ground state of a system. The numerical computation requires a large number of inner products. The inner product of two tensor trains can be computed on parallel machines such as multicore CPUs, CPU clusters, and GPUs. GPU implementations are possible in OpenCL as well as CUDA, and vector extensions such as SSE and AVX can also speed up the computations on CPUs. We propose a fast and efficient CUDA implementation of tensor train contractions. The algorithm reaches up to 88% of the theoretical peak performance on the latest Nvidia architecture. The proposed algorithm can also benefit from multi-GPU systems.
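
For readers unfamiliar with the operation, the inner product of two tensor trains can be sketched in a few lines of plain Python. This is only an illustration and not the CUDA implementation presented in the talk. The function names `tt_inner` and `tt_full`, the nested-list core layout `core[left_rank][mode][right_rank]`, and the example cores `A` and `B` are all assumptions made for this sketch. The idea is to sweep over the cores from left to right while keeping a small matrix of partial contractions, so the full tensors are never formed.

```python
from itertools import product


def tt_inner(cores_a, cores_b):
    """Inner product of two tensor trains given as lists of cores.

    Assumed layout: core[left_rank][mode][right_rank], with the first
    left rank and the last right rank equal to 1.
    """
    env = [[1.0]]  # running partial contraction, shape (rank_a, rank_b)
    for ga, gb in zip(cores_a, cores_b):
        ra, n, ra2 = len(ga), len(ga[0]), len(ga[0][0])
        rb, rb2 = len(gb), len(gb[0][0])
        new_env = [[0.0] * rb2 for _ in range(ra2)]
        # Sum over both left ranks and the shared mode index.
        for a, b, i in product(range(ra), range(rb), range(n)):
            w = env[a][b]
            for a2, b2 in product(range(ra2), range(rb2)):
                new_env[a2][b2] += w * ga[a][i][a2] * gb[b][i][b2]
        env = new_env
    return env[0][0]


def tt_full(cores):
    """Expand a tensor train into {index tuple: value} (for checking only)."""
    dims = [len(g[0]) for g in cores]
    full = {}
    for idx in product(*(range(n) for n in dims)):
        vec = [1.0]
        for g, i in zip(cores, idx):
            vec = [sum(vec[a] * g[a][i][a2] for a in range(len(g)))
                   for a2 in range(len(g[0][0]))]
        full[idx] = vec[0]
    return full


# Hypothetical example: two 2 x 2 tensors with TT ranks (1, 2, 1) and (1, 1, 1).
A = [[[[1.0, 2.0], [0.5, -1.0]]], [[[1.0], [2.0]], [[0.0], [3.0]]]]
B = [[[[1.0], [1.0]]], [[[2.0], [-1.0]]]]

if __name__ == "__main__":
    print(tt_inner(A, B))  # prints -3.0
```

The five nested index loops make this sketch cost O(n r^4) per core for mode size n and rank r. Contracting the running matrix with one core first and with the other core second reduces this to O(n r^3), and each of those steps is a small dense matrix product, which is the kind of operation that maps well to GPUs.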