Difference between revisions of "SCCS Colloquium - Nov 21, 2019"

From Sccswiki
== Bruno Miguel: A Distributed Actor Library for HPC Applications ==
''Master's thesis submission talk. Bruno is advised by [[Alexander Pöppl]].''
The diversity of high-performance computing systems and application requirements makes it hard to develop software that generalizes across them. For these systems, where performance is critical, the cost of abstraction is often unacceptable. On the other hand, handcrafting code for every single requirement is complex, time-consuming and costly. In this thesis, we present an actor programming model library for high-performance applications running on distributed and shared-memory systems. The library aims to decouple computation and communication logic within finite-state machine entities and to help avoid the pitfalls of parallel designs. We use well-known standards such as MPI and OpenMP to provide broad cross-compatibility and close-to-native performance. The library's usability and performance are explored on complex structured and unstructured mathematical models, and its results are compared with other well-known programming models for HPC, such as BSP and PGAS. Using a tsunami simulation model as a proxy application, we achieve notable scalability in both weak and strong scaling tests. Our library allows programmers to focus on algorithms and data-dependency specifications rather than low-level parallelization constructs and system load balancing.
'''Keywords:''' [[HPC]], NPC, MPI, OpenMP, BSP, SWE

Latest revision as of 16:35, 14 November 2019

Date: November 21, 2019
Room: 00.08.053
Time: 15:00 - 16:00

Vincent Bennet Bautista Anguiano: Integration and Visualization of Sparse-Grid based Clustering Methods in the SG++ DataMining Pipeline

Master's thesis introduction talk. Vincent is advised by Paul Sarbu and Kilian Röhner.

The SG++ Datamining Pipeline is a component of the SG++ Toolbox whose main purpose is to provide an interface for generating machine learning models based on sparse grid methods. So far, the pipeline supports density estimation, classification and regression models. Sparse grid methods can significantly mitigate the curse of dimensionality while processing large amounts of data, which makes them an attractive option for generating machine learning models.

Clustering is another common machine learning task, but it is not yet supported by the pipeline. The objective of this Master's thesis is therefore to implement a density-based clustering model based on the algorithm developed by Peherstorfer [Model Order Reduction of Parametrized Systems with Sparse Grids Techniques, TUM, (2013)], which makes use of sparse grid numerical methods.

Additionally, different dimensionality reduction algorithms for visualizing high-dimensional clustering models will be explored and compared to those already implemented in the pipeline.

Keywords: Clustering, Machine Learning, Sparse Grids, Density-based Clustering