SCCS Colloquium - Oct 2, 2019
Date: October 2, 2019
Time: 15:00 - 16:00
Deniz Candas: Auto-Tuning via Machine Learning in AutoPas
AutoPas is a C++ library for running molecular dynamics simulations under different optimization schemes. Each scheme is a configuration that combines a data layout (array of structures or structure of arrays), a traversal strategy, a container type (e.g. linked cells or Verlet lists) and optional optimization techniques (e.g. enabling the Newton 3 optimization). The current tuning approach tests every configuration in the search space, which keeps growing. This thesis shows how to train a neural-network-based machine learning model to build an auto-tuner that suggests the best simulation configuration available to AutoPas. This strategy reduces search time by testing fewer options, and it chooses the optimal configuration with a likelihood of over 99%.
Keywords: Molecular Dynamics, Machine Learning, Neural Networks, Simulations, AutoPas
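To make the tuning problem concrete, the following is a minimal sketch of an exhaustive search over a small configuration space. It is not AutoPas code: the types `DataLayout`, `Container`, `Newton3` and `Configuration`, the functions `buildSearchSpace` and `fullSearch`, and the `measure` callback are all hypothetical names chosen for illustration, and traversal strategies are left out to keep the example short.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>
#include <limits>
#include <vector>

// Hypothetical option types for illustration; not the AutoPas API.
enum class DataLayout { AoS, SoA };
enum class Container { LinkedCells, VerletLists };
enum class Newton3 { Disabled, Enabled };

struct Configuration {
  DataLayout layout;
  Container container;
  Newton3 newton3;
};

// Build the Cartesian product of all options. Every additional option
// multiplies the number of configurations an exhaustive tuner must test.
std::vector<Configuration> buildSearchSpace() {
  std::vector<Configuration> space;
  for (auto layout : {DataLayout::AoS, DataLayout::SoA}) {
    for (auto container : {Container::LinkedCells, Container::VerletLists}) {
      for (auto n3 : {Newton3::Disabled, Newton3::Enabled}) {
        space.push_back({layout, container, n3});
      }
    }
  }
  return space;
}

// Exhaustive search: measure every configuration and keep the fastest.
// `measure` is an assumed callback returning a runtime in arbitrary units.
Configuration fullSearch(
    const std::vector<Configuration>& space,
    const std::function<double(const Configuration&)>& measure) {
  assert(!space.empty());
  Configuration best = space.front();
  double bestTime = std::numeric_limits<double>::infinity();
  for (const auto& candidate : space) {
    const double time = measure(candidate);
    if (time < bestTime) {
      bestTime = time;
      best = candidate;
    }
  }
  return best;
}
```

In this sketch, a learned tuner would replace the loop in `fullSearch` with a model that proposes one or a few candidates, so that only those need to be measured.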
Martin Bogusz: Exploring Modern Runtime Systems for the SWE-Framework
Processor development is moving towards an increasing number of logical cores per processing unit, which creates a growing need for concurrent execution to improve application performance. Because synchronization and communication are complex tasks in a multi-core environment, parallelization frameworks are needed. In this thesis, we explored MPI, UPC++, Charm++, OpenMP and HPX by applying their concepts to a tsunami approximation model, the SWE-Framework. The implementations were benchmarked on the CoolMUC-2 massively parallel processor with Intel "Haswell" nodes. We measured performance, computation time and communication time for strong and weak scaling scenarios on up to 896 processing elements. Overall, MPI performed best in terms of performance and scaling. UPC++ demonstrated stable communication time with an increasing number of ranks, but showed significantly higher reduction and synchronization costs. Overdecomposition of Charm++ chares did not improve performance in load-imbalanced scenarios, as the communication overhead exceeded the migration benefit. HPX performed best when using two concurrent tasks per processing core, but was slower overall than all other frameworks. In conclusion, the HPX implementation could be further improved by adopting a better-fitting parallel concept. The best performance could be achieved with a hybrid UPC++/MPI solution.
Keywords: HPX, Charm++, MPI, OpenMP, SWE
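For readers unfamiliar with the two scaling scenarios mentioned in the abstract, the sketch below uses the common textbook definitions of parallel efficiency. It is an assumption that the thesis uses these exact definitions; the function names `strongScalingEfficiency` and `weakScalingEfficiency` are hypothetical and not taken from the SWE-Framework.

```cpp
#include <cassert>

// Strong scaling: the total problem size is fixed while the number of
// processing elements p grows. Ideal runtime is t1 / p, so the
// efficiency is t1 / (p * tp), where t1 is the runtime on one element
// and tp the runtime on p elements.
double strongScalingEfficiency(double t1, double tp, int p) {
  assert(t1 > 0.0 && tp > 0.0 && p > 0);
  return t1 / (static_cast<double>(p) * tp);
}

// Weak scaling: the problem size per processing element is fixed, so
// the total problem grows with p. Ideal runtime stays constant, so the
// efficiency is simply t1 / tp.
double weakScalingEfficiency(double t1, double tp) {
  assert(t1 > 0.0 && tp > 0.0);
  return t1 / tp;
}
```

With these definitions, a value close to 1 indicates near-ideal scaling, and growing communication or synchronization cost shows up as a value that drops as the number of processing elements increases.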