SC²S Colloquium - November 4, 2015

Date: November 4, 2015
Room: 02.07.023
Time: 3:00 pm, s.t.

Pablo Gómez: A Hardware-aware ADER-DG Method for Hyperbolic Partial Differential Equations

This thesis covers the implementation of a hardware-aware arbitrary high order derivative discontinuous Galerkin (ADER-DG) method for hyperbolic partial differential equations. The method has received particular interest recently, as it attains high-order accuracy in both space and time. A specific focus of this thesis is the optimization of such a method for a current architecture. While the implementation is only used to solve a three-dimensional advection problem, it is also applicable to other hyperbolic problems, e.g., the elastic wave equation. A brief introduction to the mathematics of the approach is given, followed by a thorough analysis of the method's characteristics. The implementation is optimized for Intel Xeon E5-2697 v3 CPUs. The matrix-matrix multiplications at the core of the method are performed using efficient library implementations, and a hybrid OpenMP/MPI parallelization is presented. Overall, it is shown that the ADER-DG method is well suited for performance optimization on supercomputers and can be used to attain a high order of accuracy. Results for single- and multi-node performance on this architecture are presented; in particular, convergence order, FLOPS, and scaling are investigated. On a single node, 50% of the theoretical peak performance is reached, and on 64 nodes up to 38%.
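To illustrate the model problem, here is a minimal sketch of linear advection in one dimension, solved with a first-order upwind scheme. This is an assumption-laden simplification: the thesis solves the three-dimensional problem with a high-order ADER-DG scheme, whereas this toy example only shows the equation class u_t + a u_x = 0 being advanced in time.

```python
def advect_upwind(u, a, dx, dt, steps):
    """Advance periodic 1-D advection u_t + a*u_x = 0 (a > 0) with upwind differences."""
    c = a * dt / dx          # CFL number; must satisfy c <= 1 for stability
    u = list(u)
    n = len(u)
    for _ in range(steps):
        # upwind stencil: take the difference against the left neighbor (flow from the left)
        u = [u[i] - c * (u[i] - u[(i - 1) % n]) for i in range(n)]
    return u

# With c == 1 the upwind scheme shifts the profile exactly one cell per step.
u0 = [0.0] * 8
u0[2] = 1.0
u1 = advect_upwind(u0, a=1.0, dx=1.0, dt=1.0, steps=1)  # peak moves from cell 2 to cell 3
```

A real ADER-DG implementation would instead evolve polynomial coefficients per element, with the element-local updates cast as the dense matrix-matrix multiplications mentioned in the abstract.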

Steffen Seckler: The fault tolerant combination technique in an iterative refinement framework

With petascale computing just around the corner, fault tolerance becomes more and more important. Two types of errors can occur: hard faults and soft faults. The former are faults that are noticed by the user, for example when a process of a parallel application terminates unexpectedly because of a hardware failure. In contrast to hard faults, soft faults are not detected explicitly. They arise from bit flips in memory, caches, or registers and corrupt individual floating-point values. While soft faults have to be handled algorithmically, since they cannot be detected, hard faults can be handled either by algorithmic approaches or by simple checkpoint schemes. The latter can, however, incur significant overhead, which algorithmic approaches can avoid. In this talk an iterative refinement scheme and a fault-tolerant combination technique are combined, making it possible to handle both hard and soft faults.
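The iterative refinement idea mentioned above can be sketched in its classical form: solve with a cheap, inexact solver, then repeatedly correct the solution using the residual. This is only the basic scheme, not the talk's fault-tolerant framework; the diagonal "solver" used here is a hypothetical stand-in for any inexact solve.

```python
def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def refine(A, b, approx_solve, iters=10):
    """Iterative refinement: x <- x + approx_solve(b - A x), repeated."""
    x = approx_solve(b)
    for _ in range(iters):
        r = [bi - ri for bi, ri in zip(b, matvec(A, x))]   # residual r = b - A x
        d = approx_solve(r)                                # correction from the inexact solver
        x = [xi + di for xi, di in zip(x, d)]
    return x

# Inexact solver: use only the diagonal of A (a Jacobi-style approximation).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solve_diag = lambda v: [v[0] / 4.0, v[1] / 3.0]
x = refine(A, b, solve_diag, iters=50)   # converges toward the exact solution [1/11, 7/11]
```

The scheme converges here because the diagonal approximation is close enough to A; in a fault-tolerance setting, the appeal is that a corrupted iterate is simply a worse starting point that subsequent refinement steps can recover from.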

Karthikeya Sampa Subbarao: Elastic Machine Learning with Multivariate Extrapolation

Large-scale machine learning (ML) often requires flexible specification of ML algorithms for dynamic scaling, depending on the availability of resources. In this paper, we introduce a grid-based technique for regression that allows dynamic resource utilization and provides the capability to compute the best possible result at a given point in time. We aim to lay the foundation for an approach to be implemented in a parallel framework with dynamic resource allocation.
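As a rough illustration of the grid-based idea (not the paper's actual algorithm), the toy sketch below fits a piecewise-constant regression model on a uniform grid. A coarse grid gives a cheap first answer; refining the grid when more resources or time are available improves it, which is the spirit of "the best possible result at a given point in time". All function names here are hypothetical.

```python
def fit_grid(xs, ys, cells):
    """Piecewise-constant regression on [0, 1): average the samples in each grid cell."""
    sums = [0.0] * cells
    counts = [0] * cells
    for x, y in zip(xs, ys):
        i = min(int(x * cells), cells - 1)
        sums[i] += y
        counts[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def predict(model, x):
    """Evaluate the fitted grid model at a point x in [0, 1)."""
    cells = len(model)
    return model[min(int(x * cells), cells - 1)]

# Target function y = x^2, sampled on [0, 1).
xs = [i / 100.0 for i in range(100)]
ys = [x * x for x in xs]
coarse = fit_grid(xs, ys, 2)    # cheap, low-resolution model available early
fine = fit_grid(xs, ys, 10)     # refined model once more resources are available
```

The refined model is strictly more accurate here; a dynamic framework could keep the coarse result as a fallback while finer levels are still being computed.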