HPC - Algorithms and Applications - Winter 17
- Term
- Winter 17/18
- Lecturer
- Prof. Dr. Michael Bader
- Time and Place
- Mon, 14-16 in room MI 02.07.023 (first lecture: Wednesday, October 18th, 12:15-13:45, MI 02.07.023)
- Wed, 12-14, in room MI 02.07.023 (first exercise: Monday, October 23rd, 14:15-15:45)
- not all slots will be used; please refer to TUMonline for detailed schedule
- Audience
- Elective topic in Informatics Bachelor/Master: students in mathematics or in any science or engineering discipline are welcome!
- Tutorials
- Alexander Pöppl
- Exam
- Thu, Mar 1, 2018; from 13:15 in Interim 2 (see details below)
- Semesterwochenstunden / ECTS Credits
- 3 SWS (2V + 1Ü) / 4 ECTS
- TUMonline
- TUMOnline
Contents
Announcements
- there will be an extra session on Monday, Feb 19 (14-16) for questions concerning exam preparation
- there will be no lecture on Monday, Oct 30; we will add a lecture on a Wednesday later in the term
- there will be no lectures on the first day of the semester, i.e., on Monday Oct 16
- hence, the first lecture will be held on Wednesday, Oct 18, 12-14 (in the tutorial slot)
Content
The lecture will have a focus on parallel algorithms and implementation techniques in the field of numerical simulation and high performance computing, such as:
- linear algebra problems on dense and sparse matrices
- simulation on structured and unstructured meshes
- particle-based simulations (with long-range and short-range interactions)
- spectral methods (parallel FFT and related algorithms)
- Monte Carlo and statistical methods
(a.k.a. the seven dwarfs of HPC).
The accompanying tutorials will include practical assignments, and will concentrate on the programming of GPU and accelerator platforms.
Lecture Material
Lecture slides will be published/updated here after the lessons. See also the lecture from winter term 2016/17.
- Oct 18: Intro
- Oct 18, 23; Nov 8,20: Fundamentals - Parallel Architectures, Models, and Languages
- read the related paper: Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures (technical report by Williams et al.; published in Communications of the ACM 52 (4), 2009, p.65-76)
- further info: Cannon's Algorithm as MPI examples - mpi_cannon.c (unsafe send/receive), mpi_cannon_sr.c (using MPI_Sendrecv), mpi_cannon_nbl.c (non-blocking communication)
- Oct 23, Nov 8, 20: Dwarf No. 1 - Dense Linear Algebra
- further info: The GotoBLAS/BLIS Approach to Optimizing Matrix-Matrix Multiplication - Step-by-Step (by R. van de GeijnI)
- read the related paper: article by Elmroth et al. in SIAM Review (access via TUM eAccess, if necessary)
- chapter In-core dense matrices of the ScaLAPACK User's Guide
- Nov 13: Dwarf No. 2 - Sparse Linear Algebra: Application example (page rank) and data structures
- further info: The PageRank Citation Ranking: Bringing Order to the Web (technical report by L. Page, L. Brin, R. Motwani, T. Winograd, 1999)
- further info: The Anatomy of a Large-Scale Hypertextual Web Search Engine by S. Brin and L. Page (preprint; paper appeared in Computer Networks and ISDN Systems 30 (1-7), 1998)
- further info: PageRank Beyond the Web - article in SIAM Review 57(3) by David Gleich
- Nov 27, Dec 6: Parallel Sparse Matrix-Vector Multiplication - Parallel SpMV, Cartesian Distribution
- lecture material accompanying the book by R. Bisseling (compare the slides psc4_3.pdf, psc4_4.pdf
- Dec 6, 11, 18: Dwarf No. 5 - Structured Grids
- articles by M. Frigo and V. Strumpen:
Cache oblivious stencil operations (preprint);
The memory behavior of cache oblivious stencil operations (preprint can be found via Google) - article by K. Datta et al. in SIAM Review (preprint)
- articles by M. Frigo and V. Strumpen:
- Dec 18: Dwarf No. 5 extended: Structured Grids and Space-filling Curves (contents that were only presented in these slides will presumably not be part of the exam)
- IPython Notebook worksheets: Hilbert_Plotter.ipynb, sfc_hilbert_plotter_adp.ipynb
- Maple worksheets: hilbert_adap.mw (also as PDF);
- Jan 8, 15: Dwarf No. 6 - Unstructured Grids and Partitioning
- additional material: article by Karypis and Kumar: A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs
- additional material: article by Hendrickson and Leland: A Multilevel Algorithm for Partitioning Graphs (this article was awarded the "Test of Time Award" at the 2014 Supercomputing Conference, SC14)
- Jan 22, 29: Dwarf No. 4: N-body methods and Barnes-Hut and Fast Multipole algorithm
- article by Barnes & Hut in Nature (article can be accessed via TUM ebib-access)
- article by C.R. Anderson (Fast Multipole without multipoles (article can be accessed via TUM ebib-access)
- article on Fast Multipole methods by C.R. Anderson
- Maple worksheet: twobody.mw (also as PDF)
- Feb 5: Parallel Sparse Matrix-Vector Multiplication revisited - Parallel SpMV and Graph Partitioning
Tutorials
Roughly every second week a two hour tutorial will take place (details at page top; days and time will be announced in TUMonline and in the lectures). The assignments and their solutions will be gradually posted here.
Date | Slides | Worksheet | Source | Source (solution) |
Oct 23th | Organizational remarks | |||
Oct 25th | Introduction to CUDA | Worksheet 1 | T1.1 T1.2/HW | included in Exercise 2 |
Nov 6th | Dense Linear Algebra in CUDA | Worksheet 2 | T2.2 H2.1 | Solution 2 |
Nov 22nd | Sparse Linear Algebra in CUDA | Worksheet 3 | T3.2 H3.1 | included in Exercise 4 |
Dec 4th | Solving the heat equation with CUDA | Worksheet 4 | T1.1a T1.1b H4.1 | Solution 4 |
Dec 20th | Performance Modelling | Worksheet 5 | - | Solution 5 |
Jan 17th | The Shallow Water Equations and CUDA | Worksheet 6 | Exercise 6 | |
Jan 31st | Further topics on SWE and CUDA | Worksheet 7 | Exercise 7 | - |
Exam
- written exam (regular exam and repeat exam will be offered; see below)
- if only few students register for the exam or repeat exam, the respective exam may be executed as a 30min oral exam for each student
- Helping material (for written exam only): One sheet of A4 paper (two-sided) with hand-written notes on it.
- The exam will extend over all topics discussed in the lectures and tutorials:
- approx. 30% of the questions will deal with questions related to the tutorials; basic knowledge about GPU programming with CUDA is thus necessary
- Practice exam questions with example solutions
- Questions concerning exam topics: there will be a respective "all questions answered" session on Mon, Feb 19 (14-16)
End-Term Exam
- Written Exam on Thursday, Mar 1, 2018 - 13:30-15:00 in Interim 2 (black building in front of Informatics building)
- please be in the lecture room in time (by 13:15); the exam will start on 13:30, at the latest, and there will be announcements before the start!
Repeat Exam
- currently scheduled on Apr 3, 2018 (date finalized at registration for the exam)
Literature and Online Material
- R.H. Bisseling: Parallel Scientific Computing - A structured approach using BSP and MPI, Oxford University Press, 2004.
- Course notes on Rob Bisseling's lecture on Parallel Algorithms (based on the text book)
- V. Eijkhout: Introduction to High-Performance Scientific Computing (textbook, available as PDF on the website)
- T.G. Mattson, B.A. Sanders, B.L. Massingill: Patterns for Parallel Programming, Addison-Wesley, 2005
- G. Hager, G. Wellein: Introduction to High Performance Computing for Scientists and Engineers, Chapman & Hall/CRC Computational Science, 2010
(all available as ebooks from TUM library)
Books on CUDA
- D.B. Kirk, W.W. Hwu: Programming Massively Parallel Processors - A Hands-on Approach, Morgan-Kaufman, 2nd edition, 2013
- J. Sanders, E. Kandrot: CUDA by Example, Addison-Wesley, 2011
(both available as ebooks from TUM library)
Prerequisites
Helpful, but not strictly required is knowledge in:
- basics of numerical methods (e.g.: lecture IN0019 Numerical Programming or similar)
- basics of parallel programming (lecture Parallel Programming, HPC - Programming Paradigms and Scalability, or similar)
Most important is a certain interest in problems from scientific computing and numerical simulation!