# SC²S Colloquium - March 23, 2018

Date: March 23, 2018

Room: 00.13.008

Time: 15:00 - 16:15

## Martin Molzer: Implementation of a Parallel Sparse Grid Combination Technique

In high-dimensional problems, the sparse grid technique is often used in place of classical numerical schemes because it mitigates the so-called curse of dimensionality. The combination technique then decomposes the problem further into multiple partial problems that can be solved independently and in parallel on many computers. As the number of components participating in the computation grows, the algorithms have to be designed carefully to tolerate the failure of a few of them. This talk introduces a way to integrate variable process group sizes into the sparse grid framework SG++, allowing for a more dynamic and fault-tolerant combination technique.
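As a rough illustration of how the combination technique splits a problem into independent parts, the following sketch enumerates the component grids and their coefficients for the classical scheme. It is not SG++ code: the function name `combination_scheme` and the level convention (every level at least 1, diagonals with level sum `n - q` for `q = 0, ..., dim - 1`) are assumptions made for this example.

```python
from itertools import product
from math import comb


def combination_scheme(dim, n):
    """Return {level_vector: coefficient} for the classical combination technique.

    Assumed convention: every level l_i >= 1, and the scheme uses the
    diagonals |l|_1 = n - q for q = 0, ..., dim - 1, each weighted by
    (-1)^q * binomial(dim - 1, q).
    """
    scheme = {}
    for q in range(dim):
        coeff = (-1) ** q * comb(dim - 1, q)
        target = n - q
        # Enumerate all level vectors with entries >= 1 and the required sum.
        for levels in product(range(1, target + 1), repeat=dim):
            if sum(levels) == target:
                scheme[levels] = coeff
    return scheme


# Each entry is one partial problem that can be solved independently.
scheme = combination_scheme(dim=2, n=4)
for levels, coeff in sorted(scheme.items()):
    print(levels, coeff)
```

The coefficients always sum to 1, so a constant function is reproduced exactly by the combined solution. When a component grid is lost to a failure, a fault-tolerant variant has to choose a different set of grids and coefficients, which this sketch does not attempt.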

## Congyu Zou: Commonality optimization

In the car industry, companies want to use the smallest possible number of different kinds of components across a series of cars while retaining high design freedom for the shared components. This paper looks at this bivariate optimization problem from two perspectives: directly as a bi-objective optimization problem, and as a bi-level optimization problem. For the former, it uses the rook model to encode the set partition and then applies the Niched Genetic Algorithm and the Iterative Genetic Algorithm. For the latter, it solves the subproblem with two approaches: one is a genetic algorithm combined with simulated annealing; the other turns the subproblem into a search for the deepest node in a tree, using two tree structures that can represent the family of set partitions, and then applies Monte Carlo tree search.
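The abstract does not spell out the rook model. One common reading, assumed here, is the classical correspondence between set partitions of {1, ..., n} and non-attacking rook placements on a staircase board: a rook at (i, j) with i < j means that j directly follows i in the same block. The sketch below illustrates that correspondence only; the helper names are hypothetical and it may differ from the encoding used in the paper.

```python
def partition_to_rooks(blocks):
    """Encode a set partition as rook positions (i, j) with i < j.

    A rook at (i, j) states that j is the next larger element after i
    within the same block.
    """
    rooks = set()
    for block in blocks:
        ordered = sorted(block)
        rooks.update(zip(ordered, ordered[1:]))
    return rooks


def rooks_to_partition(n, rooks):
    """Decode rook positions back into the blocks of a partition of 1..n."""
    successor = dict(rooks)
    has_predecessor = {j for _, j in rooks}
    blocks = []
    for start in range(1, n + 1):
        if start in has_predecessor:
            continue  # not the smallest element of its block
        block = [start]
        while block[-1] in successor:
            block.append(successor[block[-1]])
        blocks.append(block)
    return blocks


partition = [[1, 3, 4], [2, 5], [6]]
rooks = partition_to_rooks(partition)
print(sorted(rooks))
print(rooks_to_partition(6, rooks))
```

Because every element has at most one successor and at most one predecessor, no two rooks share a row or a column, and a partition of n elements into k blocks uses exactly n - k rooks.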

## Peter Wauligmann: Parallel construction of unstructured mesh data structures using OpenMP and MPI

PUML is a library for reading large unstructured tetrahedral meshes. It takes a minimalistic mesh format as input, allows for partitioning, and then generates important mesh data such as faces, edges, and adjacencies. The library is already parallelized with MPI. This thesis presents its shared-memory parallelization with OpenMP. A secondary topic is the analysis of collective MPI communication in the current implementation of PUML. The main challenge of the parallelization is to design memory access patterns that are as efficient as possible, because the program consists almost exclusively of memory-demanding sections. The implementation requires concurrent hash maps, so two customized hash maps for mesh elements are developed in this thesis. The OpenMP-parallelized PUML is tested on CoolMUC3, a cluster of Intel Xeon Phi *Knights Landing* many-core processors. We achieve a speedup of 1.52 when comparing the sequential versions and an additional speedup of 26 when running the program on 128 threads. These figures exclude I/O and partitioning, which are not part of the optimization. The insert operation of the developed hash maps scales well: with 128 threads per node (hyperthreading), it reaches a speedup of up to 114 for faces and 69 for edges. Finally, the implementation is tested successfully on an extreme-scale mesh that contains more than 220 million elements.
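To show why generating faces calls for a hash map, here is a minimal serial sketch in Python. It is not PUML's API (PUML is a C++ library), and the function name `build_faces` is hypothetical. Each tetrahedron contributes four faces, and a face shared by two tetrahedra must be recognized as the same entry, which is done here by using the sorted vertex triple as the key.

```python
from collections import defaultdict


def build_faces(tets):
    """Map each face (sorted vertex triple) to the tetrahedra that contain it."""
    face_to_tets = defaultdict(list)
    for tet_id, (a, b, c, d) in enumerate(tets):
        # The four triangular faces of a tetrahedron with vertices a, b, c, d.
        for face in ((a, b, c), (a, b, d), (a, c, d), (b, c, d)):
            face_to_tets[tuple(sorted(face))].append(tet_id)
    return dict(face_to_tets)


# Two tetrahedra that share the face (1, 2, 3).
tets = [(0, 1, 2, 3), (1, 2, 3, 4)]
faces = build_faces(tets)
shared = [f for f, owners in faces.items() if len(owners) == 2]
print(len(faces), shared)
```

In a multithreaded version, several threads may try to insert the same face at the same time, which is where a concurrent hash map with a scalable insert operation becomes necessary.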