> Lecture 1: 17.09.2018 (Miguel Areias)

Course Introduction
Syllabus, bibliography and assessment rules.

> Lecture 2: 17.09.2018 (Miguel Areias)

Foundations of Parallel Computing
Motivation to go parallel, grand challenges, concurrency and parallelism, implicit vs explicit parallelism, parallel architectures - Flynn's taxonomy. Multiprocessors vs. Multicomputers. Shared memory architectures. Distributed memory architectures. Recent multicore architectures. Parallel programming.

> Lecture 3: 24.09.2018 (Miguel Areias)

Foundations of Parallel Computing
Foster's methodology for designing parallel algorithms: decomposition (partitioning), communication, agglomeration, and mapping. Granularity. Parallel overheads, latency, bandwidth.
Main parallel programming models: shared memory, distributed memory and hybrid. An example of a program in MPI (numerical integration). Parallel programming paradigms: master-slave, single-program-multiple-data, data pipelining, tree-search and divide-and-conquer.
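
A minimal sketch of an MPI numerical-integration program in the spirit of the lecture's example, approximating pi by the midpoint rule (the interval count and variable names are illustrative):

```c
#include <stdio.h>
#include <mpi.h>

/* Midpoint-rule approximation of pi = integral of 4/(1+x^2) on [0,1].
   Each process sums a strided subset of the subintervals. */
int main(int argc, char **argv) {
    int rank, nprocs;
    const long n = 1000000;        /* number of subintervals (illustrative) */
    const double h = 1.0 / n;      /* subinterval width */
    double local = 0.0, pi = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    for (long i = rank; i < n; i += nprocs) {
        double x = (i + 0.5) * h;  /* midpoint of subinterval i */
        local += 4.0 / (1.0 + x * x);
    }
    local *= h;

    /* Combine the partial sums on rank 0. */
    MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi ~= %.12f\n", pi);

    MPI_Finalize();
    return 0;
}
```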

> Lecture 4: 24.09.2018 (Miguel Areias)

Programming for Distributed Memory with MPI.
The MPI (Message-Passing Interface): history, main concepts and goals. The SPMD model (Single Program Multiple Data). Basic structure of an MPI program. The universal communicator (MPI_COMM_WORLD). Basic data types. An example of a program using MPI.
Sending and receiving messages in standard, synchronous or buffered modes.
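
A minimal sketch of the basic structure of an MPI program, with a standard-mode blocking send/receive pair (the message content is illustrative):

```c
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, nprocs;
    char msg[64];

    MPI_Init(&argc, &argv);                      /* enter the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);        /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);      /* total number of processes */

    if (rank != 0) {
        snprintf(msg, sizeof msg, "greetings from rank %d", rank);
        /* Standard-mode blocking send to rank 0, tag 0. */
        MPI_Send(msg, strlen(msg) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    } else {
        for (int src = 1; src < nprocs; src++) {
            MPI_Recv(msg, sizeof msg, MPI_CHAR, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s\n", msg);
        }
    }

    MPI_Finalize();
    return 0;
}
```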

> Lecture 5: 01.10.2018 (Miguel Areias)

Programming for Distributed Memory with MPI.
Types of communication: blocking versus non-blocking. Defining new derived data types (vector and struct types). Example programs. Packing and unpacking data in communications. Examples. Collective communication functions: broadcast, reduce, scatter, gather, and allreduce. Examples.
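
A minimal sketch of a derived datatype, assuming an MPI_Type_vector used to send one matrix column (the matrix sizes are illustrative; run with at least two processes):

```c
#include <stdio.h>
#include <mpi.h>

#define ROWS 4
#define COLS 5

/* Send column 2 of a ROWSxCOLS matrix from rank 0 to rank 1
   using an MPI_Type_vector derived datatype. */
int main(int argc, char **argv) {
    int rank;
    double a[ROWS][COLS];
    double col[ROWS];
    MPI_Datatype column_t;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* ROWS blocks of 1 element each, stride COLS elements between blocks. */
    MPI_Type_vector(ROWS, 1, COLS, MPI_DOUBLE, &column_t);
    MPI_Type_commit(&column_t);

    if (rank == 0) {
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                a[i][j] = 10 * i + j;
        MPI_Send(&a[0][2], 1, column_t, 1, 0, MPI_COMM_WORLD); /* one column */
    } else if (rank == 1) {
        /* The column arrives as ROWS contiguous doubles. */
        MPI_Recv(col, ROWS, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int i = 0; i < ROWS; i++)
            printf("col[%d] = %.0f\n", i, col[i]);
    }

    MPI_Type_free(&column_t);
    MPI_Finalize();
    return 0;
}
```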

> Lecture 6: 01.10.2018 (Miguel Areias)

Programming for Distributed Memory with MPI.
Laboratory: setting up OpenMPI and exercises.

> Lecture 7: 08.10.2018 (Miguel Areias)

Programming for Distributed Memory with MPI.
Defining MPI groups and communicators of processes. Examples. Measuring execution time.
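
A minimal sketch of splitting MPI_COMM_WORLD into sub-communicators with MPI_Comm_split and timing with MPI_Wtime (the even/odd split is illustrative):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int world_rank, sub_rank, sub_size;
    MPI_Comm subcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    int color = world_rank % 2;                    /* 0 = even ranks, 1 = odd */
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subcomm);
    MPI_Comm_rank(subcomm, &sub_rank);
    MPI_Comm_size(subcomm, &sub_size);

    double t0 = MPI_Wtime();
    MPI_Barrier(subcomm);                          /* the work being timed */
    double t1 = MPI_Wtime();

    printf("world %d -> group %d, rank %d/%d (%.6fs)\n",
           world_rank, color, sub_rank, sub_size, t1 - t0);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}
```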

> Lecture 8: 08.10.2018 (Miguel Areias)

Programming for Distributed Memory with MPI.
MPI functions for specialized topologies, such as grids. An example on the parallelization of an algorithm to calculate "pi" using a Monte Carlo method (sketched below). Standard I/O. Compilation and execution of MPI programs using OpenMPI.
Introduction to Practical Assignment-1.
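
A minimal sketch of the Monte Carlo estimate of pi, assuming one rand_r stream per process (counts and seeds are illustrative):

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* Each process throws darts at the unit square; the fraction landing
   inside the quarter circle estimates pi/4. */
int main(int argc, char **argv) {
    int rank, nprocs;
    const long n = 1000000;            /* darts per process (illustrative) */
    long local_hits = 0, hits = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    unsigned seed = 1234u + rank;      /* distinct stream per process */
    for (long i = 0; i < n; i++) {
        double x = (double)rand_r(&seed) / RAND_MAX;
        double y = (double)rand_r(&seed) / RAND_MAX;
        if (x * x + y * y <= 1.0)
            local_hits++;
    }

    MPI_Reduce(&local_hits, &hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi ~= %f\n", 4.0 * hits / ((double)n * nprocs));

    MPI_Finalize();
    return 0;
}
```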

> Lecture 9: 15.10.2018 (Miguel Areias)

Programming for shared memory with processes.
The rank sort algorithm. Process creation. Shared memory: shm and mmap. Implementation of rank sort with shm and mmap.
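
A minimal sketch of shared memory between parent and child via an anonymous mmap mapping, with the work split in the style of the rank-sort exercise (array size and contents are illustrative):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define N 8

/* Parent and child communicate through an anonymous shared mapping;
   each fills half of the shared array. */
int main(void) {
    int *a = mmap(NULL, N * sizeof(int), PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED) { perror("mmap"); exit(1); }

    pid_t pid = fork();
    if (pid == 0) {                                /* child: second half */
        for (int i = N / 2; i < N; i++) a[i] = i * i;
        _exit(0);
    }
    for (int i = 0; i < N / 2; i++) a[i] = i * i;  /* parent: first half */
    wait(NULL);                                    /* child's writes now visible */

    for (int i = 0; i < N; i++) printf("%d ", a[i]);
    printf("\n");
    munmap(a, N * sizeof(int));
    return 0;
}
```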

> Lecture 10: 15.10.2018 (Miguel Areias)

Programming for Distributed Memory with MPI.
Laboratory: exercises in MPI.

> Lecture 11: 22.10.2018 (Miguel Areias)

Programming for shared memory with processes.
Synchronization mechanisms for shared memory. Test-and-set lock. Read-write spinlocks. Semaphores. Examples: multiple readers and writers with a bounded buffer, sleeping barber.
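
A minimal sketch of a test-and-set spinlock, assuming the GCC/Clang __sync builtins rather than any particular code from the lecture:

```c
#include <stdio.h>

typedef struct { volatile int flag; } spinlock_t;

static void lock(spinlock_t *l) {
    /* Atomically set flag to 1 and get its old value; spin while it was 1.
       The inner read-only loop (test-and-test-and-set) reduces bus traffic. */
    while (__sync_lock_test_and_set(&l->flag, 1)) {
        while (l->flag)
            ;  /* spin on the local cache copy until the lock looks free */
    }
}

static void unlock(spinlock_t *l) {
    __sync_lock_release(&l->flag);   /* store 0 with release semantics */
}

int main(void) {
    spinlock_t l = {0};
    lock(&l);
    printf("in critical section\n");
    unlock(&l);
    return 0;
}
```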

> Lecture 12: 22.10.2018 (Miguel Areias)

Programming for shared memory with processes.
Laboratory: exercises.

> Lecture 13: 29.10.2018 (Eduardo R. B. Marques)

Parallel programming using threads.
Threads and processes, multi-threaded processes, the pthreads library. Thread creation and joining ("Hello pthreads" example). Dot product example: work sharing, prevention of races using mutexes. Heat diffusion example: using barriers for synchronized execution.
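
A minimal sketch of the dot-product pattern from the lecture: per-thread partial sums, with a mutex protecting the single update of the shared accumulator (sizes and values are illustrative):

```c
#include <stdio.h>
#include <pthread.h>

#define N 1000000
#define NTHREADS 4

static double x[N], y[N];
static double dot = 0.0;                       /* shared accumulator */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

/* Each thread sums its block locally, then adds once under the mutex,
   avoiding a race on the shared accumulator. */
static void *worker(void *arg) {
    long id = (long)arg;
    long chunk = N / NTHREADS;
    long lo = id * chunk, hi = (id == NTHREADS - 1) ? N : lo + chunk;
    double local = 0.0;
    for (long i = lo; i < hi; i++)
        local += x[i] * y[i];
    pthread_mutex_lock(&m);
    dot += local;
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("dot = %f\n", dot);                 /* expected: 2000000 */
    return 0;
}
```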

> Lecture 14: 29.10.2018 (Eduardo R. B. Marques)

Parallel programming using threads - part 2.
Heat diffusion example (cont.). Condition variables. Example uses of condition variables: fixed-capacity blocking queue, barrier implementation.
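
A minimal sketch of a fixed-capacity blocking queue built on two condition variables (capacity, element type, and the demo producer are illustrative):

```c
#include <stdio.h>
#include <pthread.h>

#define CAP 16

/* put() blocks while the queue is full; get() blocks while it is empty. */
typedef struct {
    int buf[CAP];
    int head, tail, count;
    pthread_mutex_t m;
    pthread_cond_t not_full, not_empty;
} queue_t;

void queue_init(queue_t *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->m, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

void put(queue_t *q, int v) {
    pthread_mutex_lock(&q->m);
    while (q->count == CAP)                  /* re-check: wakeups may be spurious */
        pthread_cond_wait(&q->not_full, &q->m);
    q->buf[q->tail] = v;
    q->tail = (q->tail + 1) % CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->m);
}

int get(queue_t *q) {
    pthread_mutex_lock(&q->m);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->m);
    int v = q->buf[q->head];
    q->head = (q->head + 1) % CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->m);
    return v;
}

static void *producer(void *arg) {
    for (int i = 0; i < 100; i++) put((queue_t *)arg, i);
    return NULL;
}

int main(void) {
    queue_t q;
    pthread_t p;
    queue_init(&q);
    pthread_create(&p, NULL, producer, &q);
    long sum = 0;
    for (int i = 0; i < 100; i++) sum += get(&q);   /* consumer */
    pthread_join(p, NULL);
    printf("sum = %ld\n", sum);                     /* expected: 4950 */
    return 0;
}
```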

> Lecture 15: 05.11.2018 (Eduardo R. B. Marques)

Parallel programming using OpenMP (part 1).
Introduction to OpenMP. The 'omp parallel' directive, basic OpenMP functions, private and shared variables. Simple data parallelism: the 'omp for' directive and the 'reduction' clause. Mutual exclusion in OpenMP: 'omp atomic', 'omp critical', and the use of locks. Other synchronization constructs: 'omp master', 'omp single', 'omp barrier', 'nowait' and 'ordered' clauses.
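
A minimal sketch of data parallelism with 'omp parallel for' and a 'reduction' clause (array size and contents are illustrative):

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

/* Each thread accumulates into a private copy of sum; the copies are
   combined automatically when the loop ends. */
int main(void) {
    static double a[N];
    double sum = 0.0;

    for (long i = 0; i < N; i++)
        a[i] = 0.5;

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f (max threads: %d)\n", sum, omp_get_max_threads());
    return 0;
}
```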

> Lecture 16: 05.11.2018 (Eduardo R. B. Marques)

Pthreads and OpenMP - lab exercises.

> Lecture 17: 12.11.2018 (Eduardo R. B. Marques)

Parallel programming using OpenMP (cont.)
Functional parallelism using 'omp sections'. The 'schedule' clause for 'omp for' loops. Static vs dynamic schedules with irregular computation. Loop-carried dependencies: detection, classification, and removal.
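
A minimal sketch contrasting schedules on an irregular loop, assuming a synthetic work function whose cost grows with the iteration index:

```c
#include <stdio.h>
#include <omp.h>

/* Per-iteration cost grows with i, so a default static schedule would
   leave the thread owning the last chunk with most of the work;
   schedule(dynamic) hands out small chunks on demand instead. */
static long work(int i) {
    long s = 0;
    for (int k = 0; k < i * 1000; k++)
        s += k;
    return s;
}

int main(void) {
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < 1000; i++)
        total += work(i);
    printf("total = %ld\n", total);
    return 0;
}
```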

> Lecture 18: 12.11.2018 (Eduardo R. B. Marques)

OpenMP - lab exercises.

> Lecture 19: 19.11.2018 (Eduardo R. B. Marques)

Parallel programming using OpenMP (cont.)
Loop-carried dependencies (cont.). OpenMP orphaned constructs. Data sharing in the context of function calls.
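
A minimal sketch of an orphaned construct: the 'omp for' sits in a function with no enclosing 'parallel' and binds at run time to the team of the caller:

```c
#include <stdio.h>
#include <omp.h>

#define N 100

/* Orphaned worksharing: no 'parallel' in sight here, yet the loop is
   shared among the threads of whichever team calls this function. */
void scale(double *a, double s) {
    #pragma omp for
    for (int i = 0; i < N; i++)
        a[i] *= s;
}

int main(void) {
    double a[N];
    for (int i = 0; i < N; i++) a[i] = i;

    #pragma omp parallel       /* team created here... */
    scale(a, 2.0);             /* ...shares the loop inside scale() */

    printf("a[10] = %f\n", a[10]);   /* expected: 20 */
    return 0;
}
```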

Introduction to project assignment II.

> Lecture 20: 19.11.2018 (Eduardo R. B. Marques)

Project II - familiarisation with provided code. OpenMP - lab exercises (cont.).

> Lecture 21: 26.11.2018 (Miguel Areias)

Lock-Free Hash Tries.
Introduction to the key concepts of progress in concurrent systems and to lock-free progress guarantees. Why lock-freedom matters in highly concurrent environments. Linearizability. The internals of a lock-free hash map. Performance comparison of state-of-the-art concurrent hash map designs.
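
Not the lecture's hash trie, but a minimal sketch of the underlying lock-free pattern it builds on: a compare-and-swap retry loop in which some thread always succeeds, so the system as a whole makes progress without locks:

```c
#include <stdatomic.h>
#include <stdio.h>

static _Atomic long counter = 0;

/* Lock-free increment: read, compute the update locally, and publish it
   only if no other thread changed the value in between; otherwise retry. */
void lf_increment(void) {
    long seen, desired;
    do {
        seen = atomic_load(&counter);
        desired = seen + 1;
        /* On failure, 'seen' is refreshed and the loop retries. */
    } while (!atomic_compare_exchange_weak(&counter, &seen, desired));
}

int main(void) {
    lf_increment();
    printf("counter = %ld\n", (long)atomic_load(&counter));
    return 0;
}
```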

> Lecture 22: 27.11.2018 (Eduardo R. B. Marques)

Lab exercises (project 2 + OpenMP).

> Lecture 23: 03.12.2018 (Eduardo R. B. Marques)

Cache memories and multi-core/multi-processor architectures.
Moore's Law and the "memory wall". The memory hierarchy and cache memories. Basic cache characteristics. Spatial and temporal locality in programs. Locality and synchronization in shared-memory multi-core programming: loop scheduling, false sharing, inconsistent parallelisation. The cost of synchronization: barriers, critical sections, and locks. Loop tiling for maximising cache hit ratios.
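
A minimal sketch of loop tiling, applied here to a matrix transpose (matrix and tile sizes are illustrative and should be tuned to the cache):

```c
#include <stdio.h>

#define N 1024
#define B 32                      /* tile (block) size */

static double a[N][N], t[N][N];

/* The naive transpose strides through one matrix column-wise and misses
   in cache on nearly every access; working on BxB tiles keeps both the
   source and destination tiles resident while they are used. */
int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = i + j;

    for (int ii = 0; ii < N; ii += B)
        for (int jj = 0; jj < N; jj += B)
            for (int i = ii; i < ii + B; i++)       /* transpose one tile */
                for (int j = jj; j < jj + B; j++)
                    t[j][i] = a[i][j];

    printf("t[3][5] = %.0f\n", t[3][5]);            /* expected: 8 */
    return 0;
}
```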

> Lecture 24: 03.12.2018 (Eduardo R. B. Marques)

Lab exercises - memory locality in OpenMP programs.

> Lecture 25: 04.12.2018 (Eduardo R. B. Marques)

Performance analysis metrics.
Speedup and efficiency. Linear/sub-linear/super-linear speedup. Efficiency in relation to growth in processing units and/or problem size. Amdahl's law and Gustafson-Barsis' law. The Karp-Flatt metric. The isoefficiency relation and the scalability function.
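
For reference, the central formulas, with $p$ processing units, inherently serial fraction $f$ (Amdahl), serial fraction of the parallel run $s$ (Gustafson-Barsis), and observed speedup $\psi$ (Karp-Flatt):

```latex
% Speedup and efficiency
S(p) = \frac{T_{\mathrm{serial}}}{T_{\mathrm{parallel}}(p)}, \qquad
E(p) = \frac{S(p)}{p}

% Amdahl's law
S(p) \le \frac{1}{f + \frac{1 - f}{p}}

% Gustafson-Barsis' law (scaled speedup)
S(p) \le p + (1 - p)\, s

% Karp-Flatt metric: experimentally determined serial fraction
e = \frac{\frac{1}{\psi} - \frac{1}{p}}{1 - \frac{1}{p}}
```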