# SC²S Colloquium - February 23, 2018


## Latest revision as of 16:39, 19 February 2018

Date: February 23, 2018

Room: 02.07.023

Time: 3:00 pm, s.t.

## Nathan Brei: Generating small sparse matrix multiplication kernels for Knights Landing

High-performance seismic wave simulators based on ADER-DG, such as SeisSol, have an inner compute kernel consisting of a chain of small sparse and dense matrix multiplications. Because these methods scale so well, even small improvements to single-core performance pay off. In the past, two code generators have been employed to produce routines optimized for each matrix product. The sparse generator unrolls the sparsity pattern into the instruction stream, while the dense generator fills in the matrix and makes optimal use of vectorization and register blocking.

This work combines ideas from both to design a family of generators that can outperform their predecessors, focusing on the dense-by-sparse case. One generator combines sparsity pattern unrolling with register blocking, while another bypasses a number-of-nonzeros limit inherent in earlier sparse kernels. Other generators are developed to take advantage of regularities in the sparsity pattern. We demonstrate a speedup of 1.83× over the current dense kernel for SeisSol's 'star' matrix product. Scaling studies show that the speedup of UnrolledSparse is linear in the number of nonzeros in the sparse matrix. To implement these generators, tools for manipulating syntax trees of assembly code and traversing abstract matrix blocks were developed.
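To make "unrolling the sparsity pattern into the instruction stream" concrete, here is a minimal toy sketch in Python. It is not the generator presented in the talk: it emits scalar C statements rather than vectorized Knights Landing assembly, and it does no register blocking. The function names (`unroll_pattern`, `emit_c`, `run_ops`), the `B_vals` array of nonzeros, and the column-major layout are all assumptions made for illustration.

```python
def unroll_pattern(pattern, m):
    """Unroll C += A @ B into a flat list of multiply-add operations.

    A is dense (m x k), B is sparse (k x n) with a pattern known at
    generation time, C is dense (m x n). All matrices are assumed to be
    column-major; the nonzeros of B are assumed to be packed column by
    column into a flat array (hypothetical layout for this sketch).
    Each operation is a tuple (c_index, a_index, b_index).
    """
    k, n = len(pattern), len(pattern[0])
    ops = []
    nz = 0  # position of the current nonzero in the packed B array
    for col in range(n):
        for row in range(k):
            if pattern[row][col]:
                # One nonzero B[row][col] scales column `row` of A
                # and accumulates into column `col` of C.
                for i in range(m):
                    ops.append((col * m + i, row * m + i, nz))
                nz += 1
    return ops


def emit_c(ops, name="dense_by_sparse"):
    """Render the operation list as straight-line C with no loops or index lookups."""
    body = "\n".join(f"  C[{c}] += A[{a}] * B_vals[{b}];" for c, a, b in ops)
    return (
        f"void {name}(const double *A, const double *B_vals, double *C) {{\n"
        f"{body}\n}}"
    )


def run_ops(ops, A, B_vals, C):
    """Interpret the operation list directly, to check it against a reference."""
    for c, a, b in ops:
        C[c] += A[a] * B_vals[b]
    return C


# 3 x 2 sparsity pattern with three nonzeros; A has m = 2 rows.
pattern = [[1, 0],
           [0, 1],
           [1, 0]]
ops = unroll_pattern(pattern, m=2)
print(emit_c(ops))
```

In this toy the emitted kernel contains exactly `m` multiply-adds per nonzero of the sparse matrix, so the amount of generated code and arithmetic grows with the number of nonzeros instead of with the full `k * n` size. That describes only the operation count of the sketch; it says nothing about the measured speedups reported in the abstract.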