Term: Winter 15/16
Lecturer: Prof. Dr. Michael Bader
Time and Place: Mon, 14.00-15.30, and Wed, 12.15-13.45, in room MI 02.07.023 (first lecture on Oct 19);
lectures and tutorials will alternate on Wednesdays; as Michael Bader is on a research semester, some slots on Mon/Wed will be skipped
Audience: elective topic in the Informatics Bachelor/Master programs; students in mathematics or in any science or engineering discipline are welcome!
Tutorials: Alexander Pöppl, M.Sc.
Exam: repeat exam on April 7th, 15:30-17:15 in MI 02.07.023
Semesterwochenstunden / ECTS Credits: 3 SWS (2V + 1Ü) / 4 ECTS
TUMonline: https://campus.tum.de/tumonline/lv.detail?clvnr=950200026

Announcements

added a new article, PageRank Beyond the Web by David Gleich, which recently appeared in SIAM Review 57(3)

Remaining Exercises

Exercise on Dec 9th
No exercise on Dec 16th
No exercise on Dec 23rd
Exercise on Jan 13th
No exercise on Jan 20th
Q&A session on Jan 27th

Content

The lecture will focus on parallel algorithms and implementation techniques in the field of numerical simulation and high performance computing, such as:

linear algebra problems on dense and sparse matrices
simulation on structured and unstructured meshes
particle-based simulations (with long-range and short-range interactions)
spectral methods (parallel FFT and related algorithms)
Monte Carlo and statistical methods

(a.k.a. the seven dwarfs of HPC).

The accompanying tutorials will include practical assignments and will concentrate on programming GPU and accelerator platforms.

Lecture Material

Lecture slides will be published here after the lectures; see also the lecture from winter term 2014/15.

Oct 19: Intro
Oct 26, Nov 2 & 4: Fundamentals - Parallel Architectures, Models, and Languages
- read the related paper: Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures (technical report by Williams et al.; published in Communications of the ACM 52 (4), 2009, p.65-76)
- further info: Cannon's Algorithm as MPI examples - mpi_cannon.c (unsafe send/receive), mpi_cannon_sr.c (using MPI_Sendrecv), mpi_cannon_nbl.c (non-blocking communication)
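A serial sketch of the communication pattern behind Cannon's algorithm: a GRID x GRID process grid is simulated in one address space, the blocks of A and B are skewed once at the start, and in each of the GRID rounds every simulated process multiplies its current blocks, after which A blocks shift one position left and B blocks one position up. This is our own illustration, not the code from mpi_cannon.c; the names cannon_matmul, GRID and BLK are hypothetical, and in the MPI versions each shift would be a send/receive (or MPI_Sendrecv) between neighboring ranks.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes: a GRID x GRID grid of simulated processes, each owning a BLK x BLK block. */
#define GRID 3
#define BLK 2
#define NDIM (GRID * BLK)

typedef double Block[BLK][BLK];

/* Simulated distributed storage: a_loc[r][s] is the A block currently held by process (r, s). */
static Block a_loc[GRID][GRID], b_loc[GRID][GRID], c_loc[GRID][GRID];
static Block a_tmp[GRID][GRID], b_tmp[GRID][GRID];

/* Local work of one process in one round: c += a * b. */
static void block_mult_add(Block c, Block a, Block b) {
    for (int i = 0; i < BLK; ++i)
        for (int k = 0; k < BLK; ++k)
            for (int j = 0; j < BLK; ++j)
                c[i][j] += a[i][k] * b[k][j];
}

/* C = A * Bm for NDIM x NDIM row-major matrices, following Cannon's algorithm. */
void cannon_matmul(const double *A, const double *Bm, double *C) {
    /* Initial skew: process (r, s) starts with A block (r, r+s) and B block (r+s, s), indices mod GRID. */
    for (int r = 0; r < GRID; ++r)
        for (int s = 0; s < GRID; ++s) {
            int k = (r + s) % GRID;
            for (int i = 0; i < BLK; ++i)
                for (int j = 0; j < BLK; ++j) {
                    a_loc[r][s][i][j] = A[(r * BLK + i) * NDIM + k * BLK + j];
                    b_loc[r][s][i][j] = Bm[(k * BLK + i) * NDIM + s * BLK + j];
                    c_loc[r][s][i][j] = 0.0;
                }
        }
    for (int step = 0; step < GRID; ++step) {
        /* Every process multiplies the blocks it currently holds ... */
        for (int r = 0; r < GRID; ++r)
            for (int s = 0; s < GRID; ++s)
                block_mult_add(c_loc[r][s], a_loc[r][s], b_loc[r][s]);
        /* ... then A blocks move one process to the left, B blocks one process up (cyclically). */
        for (int r = 0; r < GRID; ++r)
            for (int s = 0; s < GRID; ++s) {
                memcpy(a_tmp[r][s], a_loc[r][(s + 1) % GRID], sizeof(Block));
                memcpy(b_tmp[r][s], b_loc[(r + 1) % GRID][s], sizeof(Block));
            }
        memcpy(a_loc, a_tmp, sizeof a_loc);
        memcpy(b_loc, b_tmp, sizeof b_loc);
    }
    /* Gather the C blocks back into the global matrix. */
    for (int r = 0; r < GRID; ++r)
        for (int s = 0; s < GRID; ++s)
            for (int i = 0; i < BLK; ++i)
                for (int j = 0; j < BLK; ++j)
                    C[(r * BLK + i) * NDIM + s * BLK + j] = c_loc[r][s][i][j];
}
```

After the skew, process (r, s) holds A block (r, k) and B block (k, s) with k = (r + s + step) mod GRID in every round, so over GRID rounds it accumulates all GRID terms of its C block.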
Oct 26, Nov 2, 4, 9: Dwarf No. 1 - Dense Linear Algebra
- further info: The GotoBLAS/BLIS Approach to Optimizing Matrix-Matrix Multiplication - Step-by-Step (by R. van de Geijn)
- read the related paper: article by Elmroth et al. in SIAM Review (access via TUM eAccess, if necessary)
- chapter In-core dense matrices of the ScaLAPACK User's Guide
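As a first step towards the blocking ideas in the GotoBLAS/BLIS material, a minimal loop-tiling sketch: C += A*B is computed tile by tile, so that the tiles currently in use can stay in cache. This is deliberately much simpler than the actual GotoBLAS/BLIS scheme (no packing of panels, no register-level micro-kernel); the tile size TILE and the name matmul_tiled are hypothetical.

```c
#include <assert.h>

/* Hypothetical tile size; in practice it would be tuned to the cache sizes of the target machine. */
#define TILE 32

static int min_int(int a, int b) { return a < b ? a : b; }

/* C += A * B for n x n row-major matrices, computed tile by tile (loop tiling).
   Inside a tile, the i-k-j loop order accesses rows of B and C with unit stride. */
void matmul_tiled(int n, const double *A, const double *B, double *C) {
    assert(n >= 0);
    for (int ii = 0; ii < n; ii += TILE)
        for (int kk = 0; kk < n; kk += TILE)
            for (int jj = 0; jj < n; jj += TILE)
                for (int i = ii; i < min_int(ii + TILE, n); ++i)
                    for (int k = kk; k < min_int(kk + TILE, n); ++k) {
                        double aik = A[i * n + k];
                        for (int j = jj; j < min_int(jj + TILE, n); ++j)
                            C[i * n + j] += aik * B[k * n + j];
                    }
}
```

Since matmul_tiled accumulates into C, C has to be zeroed beforehand to obtain a plain product; the min_int bounds handle matrix sizes that are not multiples of TILE.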
Nov 9, 11: Dwarf No. 2 - Sparse Linear Algebra: Application example (page rank) and data structures
- further info: The PageRank Citation Ranking: Bringing Order to the Web (technical report by L. Page, S. Brin, R. Motwani, T. Winograd, 1999)
- further info: The Anatomy of a Large-Scale Hypertextual Web Search Engine by S. Brin and L. Page (preprint; paper appeared in Computer Networks and ISDN Systems 30 (1-7), 1998)
- further info: PageRank Beyond the Web - article in SIAM Review 57(3) by David Gleich
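A minimal power-iteration sketch of PageRank, with the link graph stored in compressed sparse row (CSR) form so that each iteration is essentially one sparse matrix-vector product. Assumptions (not taken from the papers above): the function name pagerank and its parameters are hypothetical, ranks start uniform, pages without out-links distribute their rank uniformly over all pages, and the iteration stops once the 1-norm of the change drops below tol. A damping factor of alpha = 0.85 is a common choice.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* PageRank by power iteration. The out-links of page i are col[row_ptr[i] .. row_ptr[i+1]-1]
   (CSR storage of the link graph); alpha is the damping factor. Pages without out-links
   ("dangling" pages) spread their rank uniformly over all pages (an assumed convention).
   On return, x holds the ranks (summing to 1). Returns the number of iterations
   performed, or -1 if memory allocation fails. */
int pagerank(int n, const int *row_ptr, const int *col, double alpha,
             double tol, int max_iter, double *x) {
    double *y = malloc((size_t)n * sizeof *y);
    if (y == NULL) return -1;
    for (int i = 0; i < n; ++i) x[i] = 1.0 / n;

    int it = 0;
    while (it < max_iter) {
        ++it;
        double dangling = 0.0;
        for (int i = 0; i < n; ++i) y[i] = 0.0;
        /* Each page passes its rank in equal shares to the pages it links to. */
        for (int i = 0; i < n; ++i) {
            int deg = row_ptr[i + 1] - row_ptr[i];
            if (deg == 0) {
                dangling += x[i];
                continue;
            }
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
                y[col[k]] += x[i] / deg;
        }
        /* x_new = alpha * (y + dangling / n) + (1 - alpha) / n; track the 1-norm of the change. */
        double diff = 0.0;
        for (int i = 0; i < n; ++i) {
            double xi = alpha * (y[i] + dangling / n) + (1.0 - alpha) / n;
            diff += fabs(xi - x[i]);
            x[i] = xi;
        }
        if (diff < tol) break;
    }
    free(y);
    return it;
}
```

On a directed cycle, where every page has exactly one in-link and one out-link, all pages keep equal rank, as expected by symmetry.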
Nov 23, 25: Parallel Sparse Matrix-Vector Multiplication - Parallel SpMV, Cartesian Distribution
- lecture material accompanying the book by R. Bisseling (compare the slides psc4_3.pdf, psc4_4.pdf and psc4_6.pdf)
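To make the Cartesian distribution concrete: nonzero a_ij is assigned to the process in process row phi0(i) and process column phi1(j) of a 2D process grid, each process multiplies only its own nonzeros, and the partial sums of each row are then added up. The serial C sketch below simulates this on a PR x PC grid; the cyclic mappings i mod PR and j mod PC, the coordinate (COO) input format and the name spmv_cartesian are assumptions for illustration, and the communication phases (fan-out of v, fan-in of partial sums) appear only as comments.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical process grid size. */
#define PR 2
#define PC 2

/* u = A v for an n x n matrix given as nnz coordinate triples (ia[k], ja[k], a[k]),
   simulating a Cartesian distribution: a_ij belongs to process (i % PR, j % PC).
   Returns 0 on success, -1 if the work array cannot be allocated. */
int spmv_cartesian(int n, int nnz, const int *ia, const int *ja, const double *a,
                   const double *v, double *u) {
    /* partial[(s * PC + t) * n + i]: partial sum of row i computed by process (s, t). */
    double *partial = calloc((size_t)PR * PC * (size_t)n, sizeof *partial);
    if (partial == NULL) return -1;

    /* Fan-out (not simulated): v[j] would be sent to all processes in process column j % PC.
       Local multiplication: each process handles only the nonzeros it owns. */
    for (int k = 0; k < nnz; ++k) {
        int s = ia[k] % PR, t = ja[k] % PC;
        partial[((size_t)s * PC + t) * n + ia[k]] += a[k] * v[ja[k]];
    }
    /* Fan-in and summation: the partial sums of row i, held by the processes in
       process row i % PR, are added up to give u[i]. */
    for (int i = 0; i < n; ++i) {
        int s = i % PR;
        u[i] = 0.0;
        for (int t = 0; t < PC; ++t)
            u[i] += partial[((size_t)s * PC + t) * n + i];
    }
    free(partial);
    return 0;
}
```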
Nov 30, Dec 7, 14: Dwarf No. 5 - Structured Grids
- articles by M. Frigo and V. Strumpen:
  Cache oblivious stencil operations (preprint);
  The memory behavior of cache oblivious stencil operations (preprint can be found via Google)
- article by K. Datta et al. in SIAM Review (preprint)
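A compact C sketch of a cache-oblivious traversal for a 1D three-point stencil (here an explicit heat-equation update), following the trapezoid-cutting recursion described by Frigo and Strumpen: a space-time trapezoid is cut in space along a line of slope -1 when it is wide enough, otherwise in time, and it is traversed point by point only when it spans a single time step. The grid size NX, the diffusion number DIFF_NUM, the fixed (Dirichlet) boundary values and all function names are assumptions for illustration; naive_sweep is included as a reference ordering.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

#define NX 64                        /* hypothetical number of grid points */
static double u[2][NX];              /* two time levels, selected by t % 2 */
static const double DIFF_NUM = 0.25; /* hypothetical kappa*dt/dx^2 (explicit scheme stable for <= 0.5) */

/* Update grid point x from time step t to t+1; boundary values stay fixed. */
static void kernel(int t, int x) {
    const double *cur = u[t & 1];
    double *next = u[(t + 1) & 1];
    if (x == 0 || x == NX - 1)
        next[x] = cur[x];
    else
        next[x] = cur[x] + DIFF_NUM * (cur[x - 1] - 2.0 * cur[x] + cur[x + 1]);
}

/* Traverse the space-time trapezoid with time steps t0 <= t < t1 whose spatial
   range at time t is [x0 + dx0*(t-t0), x1 + dx1*(t-t0)). */
static void walk1(int t0, int t1, int x0, int dx0, int x1, int dx1) {
    int dt = t1 - t0;
    if (dt == 1) {
        for (int x = x0; x < x1; ++x) kernel(t0, x);
    } else if (dt > 1) {
        if (2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * dt) {
            /* Space cut along a line of slope -1 through the trapezoid's center;
               the left part has no dependencies on the right part, so it goes first. */
            int xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * dt) / 4;
            walk1(t0, t1, x0, dx0, xm, -1);
            walk1(t0, t1, xm, -1, x1, dx1);
        } else {
            /* Time cut: lower half first, then upper half. */
            int s = dt / 2;
            walk1(t0, t0 + s, x0, dx0, x1, dx1);
            walk1(t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1);
        }
    }
}

/* Reference: plain loop over time steps and grid points. */
void naive_sweep(int steps) {
    for (int t = 0; t < steps; ++t)
        for (int x = 0; x < NX; ++x) kernel(t, x);
}

/* Same computation in cache-oblivious order; the result ends up in u[steps % 2]. */
void cache_oblivious_sweep(int steps) {
    assert(steps >= 0);
    walk1(0, steps, 0, 0, NX, 0);
}
```

Because each point is computed from the same inputs by the same expression, both orderings give bitwise identical results; only the order of memory accesses differs.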
Dec 14, 21: Dwarf No. 5 (extended) - Structured Grids and Space-filling Curves (contents that were only presented in these slides will not be part of the exam)
- IPython Notebook worksheets: Hilbert_Plotter.ipynb, sfc_hilbert_plotter_adp.ipynb
- Maple worksheets: hilbert_adap.mw (also as PDF)
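The worksheets above plot the Hilbert curve; the underlying index computations can be sketched in C with the widely used iterative bit-manipulation formulation, converting between the position d along the curve and the grid coordinates (x, y) on an n x n grid (n a power of two). This is not taken from the worksheets; the function names are hypothetical, and the orientation used here (the curve starts at (0, 0)) is one convention among several.

```c
#include <assert.h>
#include <stdlib.h>

/* Rotate/flip a quadrant so that the sub-curve has the standard orientation. */
static void hilbert_rot(int n, int *x, int *y, int rx, int ry) {
    if (ry == 0) {
        if (rx == 1) {
            *x = n - 1 - *x;
            *y = n - 1 - *y;
        }
        int tmp = *x;
        *x = *y;
        *y = tmp;
    }
}

/* Position d (0 <= d < n*n) along the Hilbert curve -> grid coordinates (x, y). */
void hilbert_d2xy(int n, int d, int *x, int *y) {
    int t = d;
    *x = *y = 0;
    for (int s = 1; s < n; s *= 2) {
        int rx = 1 & (t / 2);
        int ry = 1 & (t ^ rx);
        hilbert_rot(s, x, y, rx, ry);
        *x += s * rx;
        *y += s * ry;
        t /= 4;
    }
}

/* Grid coordinates (x, y) -> position d along the Hilbert curve (inverse of hilbert_d2xy). */
int hilbert_xy2d(int n, int x, int y) {
    int d = 0;
    for (int s = n / 2; s > 0; s /= 2) {
        int rx = (x & s) > 0;
        int ry = (y & s) > 0;
        d += s * s * ((3 * rx) ^ ry);
        hilbert_rot(n, &x, &y, rx, ry);
    }
    return d;
}
```

Consecutive positions d and d+1 map to grid cells that share an edge, which is the locality property that makes the curve attractive for ordering and partitioning grid cells.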
Jan 18, 25: Dwarf No. 6 - Unstructured Grids and Partitioning
- additional material: article by Karypis and Kumar: A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs
- additional material: article by Hendrickson and Leland: A Multilevel Algorithm for Partitioning Graphs (this article was awarded the "Test of Time Award" at the 2014 Supercomputing Conference, SC14)
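Both multilevel schemes first coarsen the graph by collapsing matched pairs of vertices; the heavy-edge matching used by Karypis and Kumar prefers the heaviest incident edge, so that heavy edges end up inside coarse vertices and cannot be cut later. A minimal C sketch of this matching step and of the edge-cut objective for an undirected weighted graph in CSR form (the array names follow the xadj/adjncy/adjwgt convention of METIS; the function names are hypothetical). For simplicity, vertices are visited in index order here, whereas Karypis and Kumar visit them in random order.

```c
#include <assert.h>

/* Heavy-edge matching: match[v] receives v's partner, or v itself if all of its
   neighbors were already matched. Each undirected edge appears twice in CSR. */
void heavy_edge_matching(int n, const int *xadj, const int *adjncy,
                         const int *adjwgt, int *match) {
    for (int v = 0; v < n; ++v) match[v] = -1;
    for (int v = 0; v < n; ++v) {
        if (match[v] != -1) continue;
        int best = v, best_w = -1;
        for (int k = xadj[v]; k < xadj[v + 1]; ++k) {
            int w = adjncy[k];
            if (w != v && match[w] == -1 && adjwgt[k] > best_w) {
                best = w;
                best_w = adjwgt[k];
            }
        }
        match[v] = best;
        match[best] = v;
    }
}

/* Edge cut of a partition: total weight of edges whose endpoints lie in different parts. */
int edge_cut(int n, const int *xadj, const int *adjncy, const int *adjwgt,
             const int *part) {
    int cut = 0;
    for (int v = 0; v < n; ++v)
        for (int k = xadj[v]; k < xadj[v + 1]; ++k)
            if (part[v] != part[adjncy[k]]) cut += adjwgt[k];
    return cut / 2; /* every cut edge was counted from both endpoints */
}
```

On a path 0-1-2-3 with edge weights 1, 5, 1, visiting in index order matches 0-1 and 2-3, leaving the heavy edge 1-2 exposed; this is one reason why the visiting order and refinement during uncoarsening matter.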

Tutorials

Roughly every second week, a two-hour tutorial will take place (details at the top of this page; days and times will be announced in TUMonline and in the lectures). The assignments and their solutions will be posted here gradually.

Date	Slides	Worksheet	Source	Source (solution)
Oct 21st	Organizational remarks	-	-	-
Oct 28th	Introduction to CUDA	Worksheet 1	Exercise 1	included in Exercise 2
Nov 11th	Further details on Dense LA in CUDA	Worksheet 2	Exercise 2	Solution 2
Nov 18th	Sparse LA in CUDA	Worksheet 3	Template for Lecture, Template for Homework	Memory accesses, Code included in Exercise 4
Dec 2nd	Solving the heat equation with CUDA	Worksheet 4	Homework template (revised)	Solution 4
Dec 9th	The Shallow Water Equations and CUDA	Worksheet 5	Exercise 5	included in Exercise 6
Jan 13th	Further topics on SWE and CUDA	Worksheet 6	Exercise 6	-

Exam

Repeat Exam

the same rules as for the written exam will apply (see below)
if fewer than 10 participants register for the exam, it will be held as a series of oral exams (an announcement with details will follow by email)
written exam on Thu, Apr 7, 2016, from 15.30 (room MI 02.07.023)
In case of a written exam, please be in front of the room by 15:15, as the working time will start at 15:30; announcements will be made before 15:30.
Permitted aids: one sheet of A4 paper (two-sided) with hand-written notes on it.
the exam will cover all topics discussed in the lectures and tutorials:
- approx. 30% of the questions will relate to the tutorials; basic knowledge of GPU programming with CUDA is therefore required

Exam Preparation

in the last lecture and in the last tutorial, there will be the opportunity to ask questions on lecture topics and exercises, respectively
the following worksheet contains some example questions (with solutions) from previous exams:
- exam questions with example solutions
- note that this collection of exercises does not reflect the extent of the assignments in the exam
- note that the contents of the lecture may have changed slightly compared to previous years, so some exercises may have a slightly different focus (e.g., this year we did not cover the topics of exercises 2a and 2b)

Literature and Online Material

R.H. Bisseling: Parallel Scientific Computing - A structured approach using BSP and MPI, Oxford University Press, 2004.
- Course notes on Rob Bisseling's lecture on Parallel Algorithms (based on the text book)
V. Eijkhout: Introduction to High-Performance Scientific Computing (textbook, available as PDF on the website)
T.G. Mattson, B.A. Sanders, B.L. Massingill: Patterns for Parallel Programming, Addison-Wesley, 2005
G. Hager, G. Wellein: Introduction to High Performance Computing for Scientists and Engineers, Chapman & Hall/CRC Computational Science, 2010

(all available as ebooks from TUM library)

Books on CUDA

D.B. Kirk, W.W. Hwu: Programming Massively Parallel Processors - A Hands-on Approach, Morgan Kaufmann, 2nd edition, 2013
J. Sanders, E. Kandrot: CUDA by Example, Addison-Wesley, 2011

(both available as ebooks from TUM library)

Prerequisites

Helpful, but not strictly required, is knowledge of:

basics of numerical methods (e.g.: lecture IN0019 Numerical Programming or similar)
basics of parallel programming (lecture Parallel Programming, HPC - Programming Paradigms and Scalability, or similar)

Most important is a certain interest in problems from scientific computing and numerical simulation!

HPC - Algorithms and Applications - Winter 15