Journal of Lessons, SPD year 2015-2016
Journal
- 23/02/2016 Course introduction, MPI basic concepts Parallel programming frameworks and high-level approach to parallel programming over different platforms: MPI, TBB and OpenCL as main examples; course organization and prerequisites; reference books and studying material. MPI (Message Passing Interface) standard: brief history and aim of the standard, single program / multiple data execution model, compilation and linkage model; issues in supporting multiple programming languages and uses (application, utility library and programming language support) with a static compilation and linkage approach. Portability in parallel programming: functional and non-functional aspects, performance tuning and performance debugging. MPI as a parallel framework that supports a structured approach to parallel programming. Basic concepts of MPI: communicators (definition, purpose, difference between inter- and intra-communicators, process ranks).
- 24/02/2016 MPI basic concepts: communicators (definition, purpose, difference between inter- and intra-communicators, process ranks); point to point communication (concepts of envelope, local/global completion, blocking/non-blocking primitives, send modes); collective communications (definition, communication scope, global serialization, freedom of implementation in the standard); MPI datatypes (basic meaning and use, primitive / derived datatypes, relationship with sequential language types).
- 01/03/2016 MPI: MPI library initialization and basic MPI usage; point to point communication semantics (buffer behaviour, receive, wildcards, status objects, MPI_PROC_NULL); basic and derived MPI datatypes (purpose as explicitly defined meta-data provided to the MPI implementation, multiple language bindings, code-instantiated metadata, examples).
- 02/03/2016 MPI: MPI datatypes (semantics, typemap and type signature, matching rules for communication, role in MPI-performed packing and unpacking); core primitives for datatype creation (MPI_Type_*: contiguous, vector, hvector, indexed, hindexed, struct, commit, free) and examples.
- 09/03/2016 MPI: point to point communication modes (MPI_BSEND, MPI_SSEND; MPI_RSEND usage); non-blocking communication (Wait and Test group of primitives, semantics, MPI_Request object handles to active requests); cancelling and testing cancellation of non-blocking primitives (issues and pitfalls, interaction with the MPI implementation, e.g. MPI_Finalize); communicators and groups (communicator design aim and programming abstraction, local and global information, attributes and virtual topologies, groups as local objects, primitives for locally creating and managing groups); intracommunicators (basic primitives concerning size, rank, comparison); communicator creation as a collective operation, MPI_Comm_create basic and general case.
- 11/03/2016 MPI LAB Writing structured MPI programs, point to point communications (ping pong, token ring and variants) and how to write reusable code by exploiting communicators.
- 16/03/2016 MPI: MPI_Comm_split; collective communications (definition and semantics, execution environment, basic features, agreement of key parameters among the processes, constraints on datatypes and typemaps for collective operations, overall serialization vs synchronization, potential deadlocks); taxonomy of MPI collectives (blocking/non-blocking, synchronization/communication/communication+computation, asymmetry of the communication pattern, variable size versions, all- versions); MPI_IN_PLACE and collective operations; basic blocking collective operations (barrier, broadcast, gather/gatherV, scatter/scatterV, allgather, alltoall/alltoallv).
- 18/03/2016 MPI LAB Examples with derived datatypes (indexed, vector and their combinations).
- 23/03/2016 MPI Composing datatypes, derived datatype memory layout: explicitly setting and getting the extent; computation and communication collectives, MPI_Reduce semantics; MPI operators (arithmetic, logic, bitwise, MINLOC and MAXLOC) and their interaction with datatypes; defining custom MPI operators via MPI_Op_create.
- 06/04/2016 MPI LAB Design and implementation of a simple farm skeleton in MPI. Reusability and separation of concerns in MPI: exploiting communicators for the skeleton and inside the skeleton implementation; simple and multiple buffering; different communication primitives (synchronous/buffered and blocking/non-blocking) wrt farm load distribution strategies: round robin, job request, implicit job request with double buffering.
- 08/04/2016 TBB (Threading Building Blocks) TBB C++ template library overview: purpose, abstraction mechanisms and implementation layers (templates, runtime, supported abstractions, use of C++ concepts); tasks vs threads and hierarchical composability of parallel patterns; parallel_for, ranges and partitioners; task partitioning and scheduling, grain size and affinity; quick note on the use of lambda expressions, containers and mutexes.
- 13/04/2016 TBB TBB basic C++ concepts and algorithms (i.e. parallel skeletons). Binary splittables, the range concept and blocked ranges, proportional split; parallel_for_each, parallel_for; passing arguments to parallel algorithms (lambda functions vs body classes), optional arguments; parallel_for 1D simplified syntax; partitioners.
- 15/04/2016 Lesson postponed
- 20/04/2016 Lesson postponed
- 22/04/2016 TBB reduce (differences between "functional" and "imperative" forms); deterministic reduce; pipeline class and filter class (i.e. stages), strongly typed parallel_pipeline and make_filter template. TBB containers: similarities and differences with STL containers, multithreaded/sequential performance tradeoffs wrt software lockout, space and time overheads, relaxed/restricted semantics and feature drops, thread view consistency; container_range, extending containers to ranges; concurrent map and set templates: concurrent_hash_map, concurrent_unordered_map and concurrent_unordered_multimap; concurrent unordered sets.
- 27/04/2016 TBB Containers: concurrent_queue, concurrent_bounded_queue, concurrent_priority_queue; concurrent_vector; thread local storage; C++11-style atomics.
- 29/04/2016 MPI Lab (3 hours) Mandelbrot set computation in MPI: farm optimization and tuning.
- 04/05/2016 OpenCL introduction GPGPU and OpenCL. Development history of modern GPUs, graphic pipeline, HW/FW implementations, load unbalance related to the distribution of the graphic primitives executed, more "general purpose" and programmable core designs; generic constraints and optimizations of the GPU approach; modern GPU architecture, memory optimization and constraints, memory spaces. GPGPU and the transition to explicitly general purpose programming languages for GPUs. OpenCL intro and examples: framework goals (portability, widespread adoption), design concepts and programming abstractions (device/host interaction, kernels, queues).
- 06/05/2016 TBB Lab (3 hours) TBB Lab exercises Farm-like implementation of the Mandelbrot set computation with TBB. Simple farm implementation, granularity and load balance with a high variance load; farm with partial computation and recycling of task continuation, extension to divide and conquer.
- 09/05/2016 OpenCL design concepts and programming abstractions: device/host interaction, context, kernel, command queues; execution model; memory spaces in OpenCL; C/C++ subset for kernels, kernel compilation, program objects, memory objects and kernel arguments, execution, kernel instances and workgroups, workgroup synchronization; portability and opportunities for load balancing: mapping OpenCL code onto both the GPU and the CPU; examples of vector types and vector operations; basic example of OpenCL program construction.
- 11/05/2016 Lesson moved to 09/05/2016
- 13/05/2016 OpenCL (3 hours) Writing, benchmarking, analysis and optimization of an OpenCL program performing matrix multiplication: different approaches to data and control-flow decomposition and mapping intermediate results to different memory spaces (private/local/global memory and host memory), comparison with an equivalent CPU program; creating a trivial kernel with initialization and host code, choosing the data decomposition strategy (1D vs 2D) and its parameters; reduce pattern in OpenCL: work-item synchronization for parallel reduction, choosing a different computation pattern to reduce synchronizations (barriers).
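The reduce pattern with work-item synchronization can be sketched as kernel source alone (host setup omitted; all names here are illustrative, and the `__local` scratch buffer is assumed to be sized to the work-group via clSetKernelArg):

```c
// Hypothetical kernel sketch: tree-based parallel sum within a work-group,
// staging partial results in local memory between barrier-separated steps.
__kernel void reduce_sum(__global const float *in,
                         __global float *partial,  // one slot per work-group
                         __local float *scratch) {
    size_t gid = get_global_id(0), lid = get_local_id(0);
    scratch[lid] = in[gid];
    barrier(CLK_LOCAL_MEM_FENCE);
    // Halve the number of active work-items at each step.
    for (size_t s = get_local_size(0) / 2; s > 0; s >>= 1) {
        if (lid < s) scratch[lid] += scratch[lid + s];
        barrier(CLK_LOCAL_MEM_FENCE);  // all work-items must reach the barrier
    }
    if (lid == 0) partial[get_group_id(0)] = scratch[0];
}
```

Each barrier synchronizes the whole work-group; reducing their number (e.g. by having each work-item privately accumulate several elements before the tree phase) is one instance of the "different computation pattern" mentioned above.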
- 18/05/2016 OpenCL OpenCL event generation and handling (event barriers) for inter-queue, non local synchronization. OpenCL 2.0 and 2.1 features: shared virtual memory (GPU/CPU memory space overlaying); nested kernels and recursive parallelism without host/device interaction; generic address space as a tool to avoid source code duplication; C11 atomics in OpenCL 2.0; pipes.
- 20/05/2016 MPI / TBB Lab (3 hours) MPI / TBB implementation of the K-means clustering algorithm.
- 20/05/2016 OpenCL (1 hour) OpenCL 2.1: moving towards the use of a proper subset of C++14 for kernels (e.g. templates, overloading, lambda functions), allowing single-source OpenCL and non-OpenCL programming (SYCL) and providing a more homogeneous and organized semantics. The SPIR-V interoperable, symbolic GPU machine code representation and its use in the LLVM based development toolset; simplification and restructuring of the OpenCL support and its evolution with the introduction of the Vulkan layer.
- 23/05/2016 Project discussion
Slides, Notes and References to papers
Date | Slides | Notes | References / Info |
---|---|---|---|
23/02 | Course introduction | ||
24/02-01/03 | MPI Lesson 1 | ||
01-02/03 | MPI Lesson 2 | ||
09/03, 16/03 | MPI Lesson 3 MPI Lesson 4 | ||
11/03, 18/03, 29/04 | MPI lab slides | ||
16/03 | MPI Lesson 5 | ||
23/03 | MPI Lesson 6 | ||
08/04 | TBB Lesson 1 | ||
13/04, 22/04 | TBB Lesson 2 | The file includes preliminary slides of lesson 3 | |
22/04 | |||
27/04 | |||
04/05 | GPGPU introductory slides | ||
04/05, 09/05, 13/05 | Tim Mattson's Intro to OpenCL | Slides 1–15 for the introductory lesson (4/05). Slides 16–48 for the first lesson (9/05); you can skip slides 35–38, and remember that details have changed over time, up to OpenCL 2.1 (and the forthcoming OpenCL 2.2). Slides 49–102 for lesson two (13/05). | |
06/05 | TBB Lab exercises | ||
18/05 | Intro to OpenCL 2.0 OpenCL 2.1 presentation | Download the OpenCL 2.1 slides from Khronos's web site | Also check the OpenCL 2.1 Quick Reference Card and other material concerning OpenCL 2.1 from the Khronos OpenCL registry |
20/05 | OpenCL evolution and Vulkan | Sequential reference code for K-means Introductory notes about Data Mining Dhillon and Modha TR on K-means |
magistraleinformaticanetworking/spd/lezioni15.16.txt · Last modified: 30/07/2016 at 22:42 by Massimo Coppola