SC²S Colloquium - July 15, 2015

Date: July 15, 2015
Room: 02.07.023
Time: 3:00 pm, s.t.

Alexander Shukaev: A Fully Parallel Process-to-Process Intercommunication Technique for preCICE

Large-scale numerical simulations of multiphysics scenarios have become a common daily task in the field of scientific computing. During the past decade, partitioned coupling of existing (application-specific) solvers through an intermediate piece of software, which treats those solvers as black boxes, has emerged as a well-established trend.

The Precise Code Interaction Coupling Environment (preCICE) is a modern code coupling library for partitioned simulations of multiphysics scenarios. It is already used successfully to couple and drive (partitioned) simulations of fluid–fluid interaction (FFI), fluid–structure interaction (FSI), and some other scenarios. Previously, preCICE offered only a centralized intercommunication model (CICM) between the processes of the coupled (intraparallel) solvers; that is, they had to forward all (data) intercommunication either through an intermediate server process or through their respective master processes. Performance testing has revealed that this approach results in a severe data throughput bottleneck, which leads to a dramatic loss of scalability for medium- and large-sized production runs.

The major goal of this work was to eliminate the intercommunication bottleneck completely and to achieve nearly ideal strong and weak scaling of the preCICE intercommunication routines. This was accomplished by introducing a distributed intercommunication model (DICM) into preCICE, which is based on a fully parallel process-to-process (P2P) intercommunication technique between the coupled solvers. A further prime objective was that the implementation of the P2P intercommunication technique transparently support both the TCP/IP and the MPI network communication standards. Ideal strong scaling has been verified up to at least 32768 processes overall by profiling various scalability benchmarks and application scenarios on the SuperMUC massively parallel system.
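
To illustrate why the distributed model removes the bottleneck, the following is a minimal sketch and not the preCICE implementation. It assumes that each interface vertex is owned by exactly one process of each solver. The names `owner_a`, `owner_b`, `build_p2p_routes`, `max_load_centralized`, and `max_load_p2p` are hypothetical and chosen for illustration only. The sketch builds a process-to-process routing table and compares the largest number of vertex values that any single process must handle under the two models.

```python
from collections import defaultdict


def build_p2p_routes(owner_a, owner_b):
    """Map each rank of solver A to the ranks of solver B it must talk to.

    owner_a and owner_b map a vertex id to the rank that owns the vertex
    in solver A and solver B respectively (an assumption of this sketch).
    """
    routes = defaultdict(lambda: defaultdict(list))
    for vertex, rank_a in owner_a.items():
        rank_b = owner_b[vertex]
        routes[rank_a][rank_b].append(vertex)
    # Convert to plain dicts with sorted vertex lists for readability.
    return {
        rank_a: {rank_b: sorted(vertices) for rank_b, vertices in peers.items()}
        for rank_a, peers in routes.items()
    }


def max_load_centralized(owner_a):
    """Centralized model: every vertex value passes through one master."""
    return len(owner_a)


def max_load_p2p(routes):
    """Distributed model: each rank sends only the vertices it owns."""
    return max(
        sum(len(vertices) for vertices in peers.values())
        for peers in routes.values()
    )


# Hypothetical interface with 8 vertices:
# solver A runs on 4 ranks (2 vertices each), solver B on 2 ranks (4 each).
owner_a = {vertex: vertex // 2 for vertex in range(8)}
owner_b = {vertex: vertex // 4 for vertex in range(8)}

routes = build_p2p_routes(owner_a, owner_b)
print(routes)
print("centralized max load:", max_load_centralized(owner_a))
print("P2P max load:", max_load_p2p(routes))
```

In this toy setup the master in the centralized model handles all 8 vertex values, while no process in the P2P model handles more than 2. The per-process load in the distributed model depends on the local partition size rather than on the size of the whole interface, which is the property that strong scaling relies on.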