# SC²S Colloquium - November 18, 2015


## Latest revision as of 08:15, 14 November 2015

Date: November 18, 2015

Room: 02.07.023

Time: 3:00 pm, s.t.

## Denys Sobchyshak: Online Graph Clustering with Sparse Grids Density Estimation

Over the past century, fast-paced developments in technology have substantially increased the flow of information and the need to understand it. In such a data-centric environment, the ability to make quick strategic decisions based on a generalized understanding of immense amounts of rapidly changing data has become highly important and a major deciding factor between success and missed opportunity. Today such problems are tackled with efficient data mining techniques, which extract useful information from cluttered and often obfuscated data sets and place it into clusters or densities of a certain shape.

While the task itself is not novel, the methods and approaches for solving it have come a long way, and recent developments have opened new opportunities. We focus on graph stream clustering in mini-batches of constantly flowing data, with the goal of capturing topological information and adapting to its change as more recent data supersedes older data, a phenomenon known as concept drift.

We propose dimensionality reduction techniques as a means of transforming graph topological information into a feature space, which can be done by applying a weighting scheme based on dissimilarities between graph nodes and the corresponding Laplacian matrix. Furthermore, we use density estimation to identify regions of high and low node coupling, which subsequently serve as the basis for clustering.

While kernel methods are one of the most popular ways to tackle the unsupervised learning problem of density estimation, we argue that the sparse grid approach is a better option here. Kernel evaluation depends on all data points at any given time, whereas the sparse grid estimate depends on grid points and can therefore be used at any point of stream processing. It also offers an efficient way of decreasing the grid size, so that fewer points need to be stored to perform the estimation.

We demonstrate how our algorithm works on real-world data and discuss some unexpected findings that arise when clustering in a high-dimensional feature space.
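
As a rough illustration of the Laplacian matrix mentioned in the abstract, the sketch below builds the unnormalized graph Laplacian L = D - W from a weighted edge list. The function name `graph_laplacian`, the edge-list format, and the example weights are assumptions made for this example; the abstract does not specify the weighting scheme or which Laplacian variant the talk uses.

```python
def graph_laplacian(num_nodes, weighted_edges):
    """Return the unnormalized Laplacian L = D - W as a list of rows.

    weighted_edges is a list of (u, v, weight) tuples for an undirected
    graph. This format is an assumption made for illustration.
    """
    # Symmetric weight matrix W of the undirected graph.
    weights = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v, w in weighted_edges:
        weights[u][v] += w
        weights[v][u] += w

    # Off-diagonal entries of L are the negated weights.
    laplacian = [[-weights[i][j] for j in range(num_nodes)]
                 for i in range(num_nodes)]

    # Diagonal entries are the weighted degrees (self-loops excluded),
    # so that every row of L sums to zero.
    for i in range(num_nodes):
        laplacian[i][i] = sum(weights[i]) - weights[i][i]
    return laplacian


# Hypothetical path graph 0 - 1 - 2 with made-up edge weights.
example_edges = [(0, 1, 1.0), (1, 2, 2.0)]
example_laplacian = graph_laplacian(3, example_edges)
```

In spectral approaches, eigenvectors of such a matrix are commonly used as low-dimensional node coordinates; whether the talk follows exactly this route is not stated in the abstract.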

## Marat Faizov: Evaluation of the Performance of Vectorized Force Calculations for Molecular Dynamics

The simulation of particle interactions in cells as a part of molecular dynamics has gained a lot of popularity in recent years, because it allows scientists to compute the motion of molecules at the micro level and to study the macroscopic physical characteristics of certain substances. The movement of gas molecules in molecular dynamics is described by a system of ordinary differential equations, which can be solved analytically only for two molecules. With the increase in computing power over the last decade, it has become tempting to simulate particle interactions at close to real-life scale. In this work we consider the molecules in the molecular simulation code ls1 mardyn. We built a new class for vectorization tuning, which allowed us to evaluate the code's performance while isolating some effects. In addition, we introduce a new vectorization option for ls1 mardyn, based on the OpenMP Application Program Interface 4.0.
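
To make concrete what a pairwise force calculation looks like, here is a minimal scalar sketch. It assumes a Lennard-Jones potential, which the abstract does not name, and the function name `lennard_jones_forces` and the parameters `epsilon` and `sigma` are chosen for this example only. It is plain Python for readability and is not the ls1 mardyn implementation; the talk is about vectorizing this kind of inner loop.

```python
def lennard_jones_forces(positions, epsilon=1.0, sigma=1.0):
    """Return the force on each particle from all pairwise interactions.

    Assumes U(r) = 4*epsilon*((sigma/r)**12 - (sigma/r)**6), with no
    cutoff radius and no periodic boundaries (both are simplifications).
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Distance vector from particle j to particle i.
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2]
            sr2 = sigma * sigma / r2
            sr6 = sr2 * sr2 * sr2
            # F_i = 24*eps/r^2 * (2*(sigma/r)^12 - (sigma/r)^6) * d
            scale = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                f = scale * d[k]
                forces[i][k] += f
                forces[j][k] -= f  # Newton's third law
    return forces


# Two particles at distance sigma repel each other.
pair_forces = lennard_jones_forces([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
```

The innermost arithmetic is identical for every pair, which is what makes such loops candidates for SIMD vectorization, for example through the `simd` directives introduced in OpenMP 4.0.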