Journal of Lessons, SPD year 2016-2017

Journal

20/02/2017 Course introduction. Parallel programming frameworks and a high-level approach to parallel programming over different platforms, with MPI, TBB and OpenCL as main examples; course organization and prerequisites; reference books and study material.
MPI (Message Passing Interface) standard: brief history and aim of the standard, single program / multiple data execution model, compilation and linkage model; issues in supporting multiple programming languages and uses (application, utility library and programming language support) with a static compilation and linkage approach. Portability in parallel programming: functional and non-functional aspects, performance tuning and performance debugging.
22/02/2017 MPI basic concepts: MPI as a parallel framework that supports a structured approach to parallel programming. Basic concepts of MPI: communicators (definition, purpose, difference between inter- and intra-communicators, process ranks); point-to-point communication (concepts of envelope, local/global completion, blocking/non-blocking primitives, send modes); collective communications (definition, communication scope, global serialization, freedom of implementation in the standard); MPI datatypes (basic meaning and use, primitive / derived datatypes, relationship with sequential language types).
27/02/2017 MPI: MPI library initialization and basic MPI usage; point-to-point communication semantics (buffer behaviour, receive, wildcards, status objects, MPI_PROC_NULL), basic and derived MPI datatypes (purpose as explicitly defined meta-data provided to the MPI implementation, multiple language bindings, code-instantiated metadata, examples). MPI datatypes (semantics, typemap and type signature, matching rules for communication, role in MPI-performed packing and unpacking); core primitives for datatype creation (MPI_Type_*: contiguous, vector, hvector, commit, free) and examples.
01/03/2017 MPI: more derived datatypes (indexed, hindexed, struct); point-to-point communication modes (MPI_Bsend, MPI_Ssend; MPI_Rsend usage); non-blocking communication (Wait and Test group of primitives, semantics, MPI_Request object handles to active requests); canceling and testing cancellation of non-blocking primitives (issues and pitfalls, interaction with the MPI implementation, e.g. MPI_Finalize); communicators and groups (communicator design aim and programming abstraction, local and global information, attributes and virtual topologies, groups as local objects, primitives for locally creating and managing groups).
06/03/2017 MPI: intracommunicators (basic primitives concerning size, rank, comparison); communicator creation as a collective operation, MPI_Comm_create basic and general case; MPI_Comm_split; collective communications (definition and semantics, execution environment, basic features, agreement of key parameters among the processes, constraints on datatypes and typemaps for collective operations, overall serialization vs synchronization, potential deadlocks); taxonomy of MPI collectives (blocking/non-blocking, synchronization/communication/communication+computation, asymmetry of the communication pattern, variable size versions, all- versions).
08/03/2017 MPI Lab: Basic program structure. Examples with derived datatypes.
13/03/2017 MPI Lab: Implementing communication with an assigned degree of asynchrony in MPI. Structured parallel programming in MPI, separation of concerns in practice. Structured parallel patterns in MPI and communicator handling.
~~15/03/2017~~ 16/03/2017 MPI: Farm skeleton implementation with MPI. MPI collectives with both computation and communication: Reduce (and variants) and Scan (and variants). Using MPI operators with Reduce and Scan. Defining custom user operators, issues and implementation of operator functions.
20/03/2017 MPI Lab: Asynchronous channel implementation, Farm skeleton implementation. Basic debugging of parallel code.
~~22/03/2017~~ postponed
27/03/2017 TBB: TBB C++ template library overview: purpose, abstraction mechanisms and implementation layers (templates, runtime, supported abstractions, use of C++ concepts); tasks vs threads and hierarchical composability of parallel patterns; parallel_for, ranges and partitioners; task partitioning and scheduling, grain size and affinity; quick summary on the use of lambda expressions and containers.
29/03/2017 TBB: Quick note on mutexes. TBB basic C++ concepts and algorithms (i.e. parallel skeletons). Binary splittables, range concept and blocked ranges, proportional split; parallel_for_each, parallel_for; passing arguments to parallel algorithms (lambda functions vs body classes), optional arguments; parallel_for 1D simplified syntax; partitioners.
~~03/04/2017~~
05/04/2017 TBB: reduce (differences between “functional” and “imperative” forms); deterministic reduce; pipeline class and filter class (i.e. stages), strongly typed parallel_pipeline and make_filter template. TBB containers: similarities and differences with STL containers, multithreaded/sequential performance tradeoffs with respect to software lockout, space and time overheads, relaxed/restricted semantics and feature drops, thread view consistency; container_range, extending containers to ranges, concurrent map and set templates: concurrent_hash_map, concurrent_unordered_map, concurrent_unordered_multimap; concurrent_unordered_set.
26/04/2017 Intro to GPU-based computing: GPGPU and OpenCL. Development history of modern GPUs, graphics pipeline, HW/FW implementations, load imbalance related to the distribution of graphic primitives executed, more “general purpose” and programmable core design; generic constraints and optimizations of the GPU approach; modern GPU architecture, memory optimization and constraints, memory spaces. GPGPU, and transition to explicitly general purpose programming languages for GPU. Management of large sets of thread processors, concept of command queue and concurrent execution of tasks; consequences on the constraint over synchronization of large computations split among several thread processors.
03/05/2017 TBB Lab time – Basics of TBB, Mandelbrot Set algorithm implementation.
08/05/2017 TBB Lab time – TBB thread local storage (TLS); TLS-based algorithms for array reduction and results accumulation (farm patterns, stream reductions).
10/05/2017 Short project intro – data stream analysis, time-based and sample-based stream models, window-based approaches.
15/05/2017 OpenCL: OpenCL intro and examples: framework goal (portability, widespread adoption), design concepts and programming abstractions: device/host interaction, context, kernel, command queues; execution model; memory spaces in OpenCL; C/C++ subset for kernels, kernel compilation, program objects, memory objects and kernel arguments, execution, kernel instances and workgroups, workgroup synchronization; portability and chances for load balancing: mapping OpenCL code onto both the GPU and the CPU; examples of vector types and vector operations; basic example of OpenCL program construction (vector addition).
17/05/2017 Project intro – Introduction to available project topics. Stream mining with TBB / MPI / FastFlow; stream-based computations on GPU: stream computation of aggregate / accumulation functions in window and pane-based models.
19/05/2017 Lab Time – K-means algorithm in MPI, porting to TBB
OpenCL: OpenCL 2.2: OpenCL C, C++ (static subset of the C++14 standard). Features missing with respect to C++14. The SYCL model for single-source OpenCL / parallel C++ code. SPIR-V as a common representation of code that allows the integration of existing technology into a unified toolchain (LLVM-based compilers, GLSL, device drivers, format translators). Integration of OpenCL C++ within the SPIR-V SW ecosystem, convergence with Vulkan.
22/05/2017 TBB Lab time: K-Means algorithm development. – Parallel algorithms: A selection of parallel algorithms based on implicit tree expansion: combinatorial exploration algorithms (example: N-queens), Divide and Conquer, Branch and Bound optimization methods. Interaction among different parallel visit orders, computational grain size and available parallelism. Impact of inter-worker synchronization in the B&B case.
25/05/2017 Parallel B&B, parallel D&C. Different parallelism exploitation at different tree levels. An example of D&C algorithm in Data Mining: parallelisation options for the C4.5 algorithm for mining classification trees. OpenCL – Lab Time: OpenCL Linux installation.
26/05/2017 OpenCL Lab Time: Different implementations of 2D matrix multiplication algorithms (exploiting 2D and 1D work item distributions with 0D and 1D work items, global and local memory, local synchronizations among thread groups, access patterns).
29/05/2017 The Flowshop problem as an example of parallelizable B&B problem; parallel implementation choices with TBB/MPI.
30/05/2017 Stream computation of aggregate measures: the General Incremental Sliding-Window Aggregation algorithm and its parallelization.

Slides, Notes and References to papers

Date	Slides	Notes	References / Info
20/02, 22/02	Course introduction
22/02, 27/02	MPI Lesson 1
27/02, 01/03	MPI Lesson 2
01/03, 06/03	MPI Lesson 3, MPI Lesson 4
06/03	MPI Lesson 5
08/03, 13/03	MPI Lab slides
16/03	MPI Lesson 6
20/03
22/03
26/04	GPU and GPGPU intro
03/05
08/05
15/05
17/05
19/05
22/05
25/05
26/05
29/05
30/05