* 02/03/2016 **MPI** : MPI datatypes (semantics, typemap and type signature, matching rules for communication, role in MPI-performed packing and unpacking); core primitives for datatype creation (''MPI_Type_*'': contiguous, vector, hvector, indexed, hindexed, struct, commit, free) and examples (see the column-datatype sketch after this list).
* 09/03/2016 **MPI** : point to point communication modes (MPI_Bsend, MPI_Ssend; MPI_Rsend usage); non-blocking communication (Wait and Test group of primitives, semantics, ''MPI_Request'' object handles to active requests); canceling and testing cancellation of non-blocking primitives (issues and pitfalls, interaction with the MPI implementation, e.g. MPI_Finalize); communicators and groups (communicator design aim and programming abstraction, local and global information, attributes and virtual topologies, groups as local objects, primitives for locally creating and managing groups); intracommunicators (basic primitives concerning size, rank, comparison); communicator creation as a collective operation, MPI_Comm_create basic and general case. A non-blocking exchange sketch follows the list.
* 11/03/2016 **MPI LAB** Writing structured MPI programs, point to point communications (ping pong, token ring and variants) and how to write reusable code by exploiting communicators.
* 16/03/2016 **MPI** MPI_Comm_split; collective communications (definition and semantics, execution environment, basic features, agreement of key parameters among the processes, constraints on Datatypes and typemaps for collective op.s, overall serialization vs synchronization, potential deadlocks); taxonomy of MPI collectives (blocking/non-blocking, synchronization/communication/communication+computation, asymmetry of the communication pattern, variable size versions, all- versions); MPI_IN_PLACE and collective op.s; basic blocking collective operations (barrier, broadcast, gather/gatherv, scatter/scatterv, allgather, alltoall/alltoallv). A communicator-splitting sketch follows the list.
* 18/03/2016 **MPI LAB** Examples with derived datatypes (indexed, vector and their combinations).
* 23/03/2016 **MPI** Composing Datatypes, derived datatype memory layout: explicitly setting and getting the extent; compute-and-communicate collectives, MPI_Reduce, semantics; MPI Operators (arithmetic, logic, bitwise, MINLOC and MAXLOC) and their interaction with Datatypes; defining MPI custom operators via MPI_Op_create (see the custom-operator sketch after this list).
* 06/04/2016 **MPI LAB** Design and implementation of a simple farm skeleton in MPI. Reusability and separation of concerns in MPI: exploiting communicators for the skeleton and inside the skeleton implementation; simple and multiple buffering; different communication primitives (synchronous/buffered and blocking/non-blocking) wrt farm load distribution strategies: round robin, job request, implicit job request with double buffering.
* 08/04/2016 **TBB (Threading Building Blocks)** TBB C++ template library overview: purpose, abstraction mechanisms and implementation layers (templates, runtime, supported abstractions, use of C++ concepts); tasks vs threads and hierarchical composability of parallel patterns; parallel_for, ranges and partitioners; task partitioning and scheduling, grain size and affinity; quick note on the use of lambda expressions, containers and mutexes.
* 13/04/2016 **TBB** TBB basic C++ concepts and algorithms (i.e. parallel skeletons). Binary splittables, the range concept and blocked ranges, proportional split; parallel_for_each, parallel_for; passing arguments to parallel algorithms (lambda functions vs body classes), optional arguments; parallel_for 1D simplified syntax; partitioners (see the parallel_for sketch after this list).
* 22/04/2016 **TBB** reduce (differences between the “functional” and “imperative” forms); deterministic reduce; pipeline class and filter class (i.e. stages), strongly typed parallel_pipeline and the make_filter template. TBB containers: similarities and differences with STL containers, multithreaded/sequential performance tradeoffs wrt software lockout, space and time overheads, relaxed/restricted semantics and feature drops, thread view consistency; container_range, extending containers to ranges, concurrent map and set templates: concurrent_hash, unordered, unordered_multi map; concurrent and unordered set. A functional-form reduce sketch follows the list.
* 27/04/2016 **TBB** Containers: concurrent queue, bounded queue, priority queue; concurrent vector; thread local storage; C++11 style atomics.
* 29/04/2016 **MPI Lab** (3 hours) Mandelbrot set computation in MPI: farm optimization and tuning.
* 04/05/2016 **OpenCL introduction** GPGPU and OpenCL. Development history of modern GPUs, graphic pipeline, HW/FW implementations, load unbalance related to the distribution of the graphic primitives executed, more “general purpose” and programmable core design; generic constraints and optimizations of the GPU approach; modern GPU architecture, memory optimization and constraints, memory spaces. GPGPU and the transition to explicitly general purpose programming languages for GPUs. OpenCL intro and examples: framework goals (portability, widespread adoption), design concepts and programming abstractions (device/host interaction, kernels, queues).
* 06/05/2016 **TBB Lab** (3 hours) [[magistraleinformaticanetworking:spd:2016:TBBlab]] Farm-like implementation of the Mandelbrot set computation with TBB. Simple farm implementation, granularity and load balance with a high variance load; farm with partial computation and recycling of task continuation, extension to divide and conquer.
* 09/05/2016 **OpenCL** design concepts and programming abstractions: device/host interaction, context, kernel, command queues; execution model; memory spaces in OpenCL; C/C++ subset for kernels, kernel compilation, program objects, memory objects and kernel arguments, execution, kernel instances and workgroups, workgroup synchronization; portability and chances for load balancing: mapping OpenCL code onto both the GPU and the CPU; examples of vector types and vector operations; basic example of OpenCL program construction (a minimal host + kernel sketch follows the list).
* <del>11/05/2016</del> moved to 09/05/2016
* 13/05/2016 **OpenCL** (3 hours) Writing, benchmarking, analyzing and optimizing an OpenCL program performing matrix multiplication: different approaches to data and control-flow decomposition and to mapping intermediate results to the different memory spaces (private/local/global memory and host memory), comparison with an equivalent CPU program; creating a trivial kernel with initialization and host code, choosing the data decomposition strategy (1D vs 2D) and its parameters; the reduce pattern in OpenCL: work-item synchronization for parallel reduction, choosing a different computation pattern to reduce synchronizations (barriers).
* 18/05/2016 **OpenCL** OpenCL event generation and handling (event barriers) for inter-queue, non-local synchronization. OpenCL 2.0 and 2.1 features: shared virtual memory (GPU/CPU memory space overlaying); nested kernels and recursive parallelism without host/device interaction; the generic address space as a tool to avoid source code duplication; C11 atomics in OpenCL 2.0; pipes.
* 20/05/2016 (3 hours) **MPI / TBB Lab time** MPI / TBB implementation of the K-means clustering algorithm.
* 20/05/2016 (1 hour) **OpenCL** OpenCL 2.1: moving towards the use of a proper subset of C++14 for kernels (e.g. templates, overloading, lambda functions), allowing single-source OpenCL and non-OpenCL programming (SYCL) and providing a more homogeneous and organized semantics. The SPIR-V interoperable, symbolic GPU machine code representation and its use in the LLVM based development toolset; simplification and restructuring of the OpenCL support and its evolution with the introduction of the Vulkan layer.
* 23/05/2016 **Project discussion**
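
Minimal illustrative sketches of some constructs listed above follow; all sizes, tags and values are arbitrary assumptions, not code from the lectures.

A derived datatype built with ''MPI_Type_vector'': sending one column of a row-major matrix (the matrix dimensions and column index are assumptions of the example; run with at least 2 processes).

<code cpp>
// Sketch: send column 2 of a row-major N x M matrix with a vector datatype.
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 4, M = 5;               // assumed matrix dimensions
    double a[N][M];                       // row-major storage

    // One column = N blocks of 1 double, consecutive blocks M doubles apart.
    MPI_Datatype column;
    MPI_Type_vector(N, 1, M, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    if (rank == 0) {
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < M; ++j)
                a[i][j] = i * M + j;
        MPI_Send(&a[0][2], 1, column, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        // Matching needs equal type signatures (N doubles), not equal
        // typemaps: a plain contiguous buffer is fine on the receiver.
        double col[N];
        MPI_Recv(col, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Type_free(&column);
    MPI_Finalize();
    return 0;
}
</code>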
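
Non-blocking point-to-point: a symmetric two-process exchange, with both ''MPI_Request'' handles completed by one ''MPI_Waitall'' (assumes exactly two processes).

<code cpp>
// Sketch: post the receive first, then the send; neither call blocks,
// so the exchange cannot deadlock on buffering, and independent
// computation may overlap the pending transfers.
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int peer = 1 - rank;                  // assumes ranks 0 and 1 only

    double out = (double)rank, in = -1.0;
    MPI_Request req[2];
    MPI_Irecv(&in,  1, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req[0]);
    MPI_Isend(&out, 1, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req[1]);

    // ... computation not touching 'in'/'out' could go here ...

    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);   // both requests complete
    MPI_Finalize();
    return 0;
}
</code>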
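
''MPI_Comm_split'' plus a collective restricted to the derived communicator; the grid width ''COLS'' is an assumption of the sketch.

<code cpp>
// Sketch: split MPI_COMM_WORLD into "row" communicators of a logical
// process grid, then broadcast a value within each row independently.
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int COLS = 4;              // assumed grid width
    int color = rank / COLS;         // same color -> same row communicator
    int key   = rank % COLS;         // rank ordering inside the row

    MPI_Comm row;
    MPI_Comm_split(MPI_COMM_WORLD, color, key, &row);

    double v = (key == 0) ? (double)color : 0.0;
    MPI_Bcast(&v, 1, MPI_DOUBLE, 0, row);   // collective on the row only

    MPI_Comm_free(&row);
    MPI_Finalize();
    return 0;
}
</code>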
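
A custom reduction operator registered with ''MPI_Op_create''; the elementwise complex product over a contiguous derived datatype is just an illustrative choice.

<code cpp>
// Sketch: reduce pairs of doubles (re, im) with a user-defined complex
// product; the operator is declared commutative.
#include <mpi.h>

static void cmul(void *in, void *inout, int *len, MPI_Datatype *dt) {
    double *a = (double *)in, *b = (double *)inout;
    for (int i = 0; i < *len; ++i) {       // one complex number per element
        double re = a[2*i] * b[2*i]   - a[2*i+1] * b[2*i+1];
        double im = a[2*i] * b[2*i+1] + a[2*i+1] * b[2*i];
        b[2*i] = re;
        b[2*i+1] = im;
    }
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    MPI_Datatype cplx;                     // one element = (re, im)
    MPI_Type_contiguous(2, MPI_DOUBLE, &cplx);
    MPI_Type_commit(&cplx);

    MPI_Op prod;
    MPI_Op_create(cmul, 1 /* commutative */, &prod);

    double z[2] = {1.0, 1.0}, r[2];
    MPI_Reduce(z, r, 1, cplx, prod, 0, MPI_COMM_WORLD);

    MPI_Op_free(&prod);
    MPI_Type_free(&cplx);
    MPI_Finalize();
    return 0;
}
</code>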
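
''tbb::parallel_for'' over an explicit ''blocked_range'' with a lambda body, followed by the 1D simplified syntax; the vector size is arbitrary.

<code cpp>
// Sketch: the runtime splits the range recursively (the range is
// "binary splittable") and schedules the resulting chunks as tasks.
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <vector>

int main() {
    std::vector<float> v(1 << 20, 1.0f);

    // Range + body form: the body receives a chunk of the index space.
    tbb::parallel_for(
        tbb::blocked_range<size_t>(0, v.size()),
        [&](const tbb::blocked_range<size_t> &r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                v[i] *= 2.0f;
        });

    // 1D simplified syntax: one index per invocation.
    tbb::parallel_for(size_t(0), v.size(),
                      [&](size_t i) { v[i] += 1.0f; });
    return 0;
}
</code>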
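
The “functional” form of ''tbb::parallel_reduce'': an identity value, a chunk body returning a partial result, and a binary join combining partials.

<code cpp>
// Sketch: parallel sum of a vector; unlike the "imperative" (body class)
// form, the functional form keeps the accumulation side-effect free.
#include <tbb/parallel_reduce.h>
#include <tbb/blocked_range.h>
#include <vector>

int main() {
    std::vector<double> v(1 << 20, 0.5);

    double sum = tbb::parallel_reduce(
        tbb::blocked_range<size_t>(0, v.size()),
        0.0,                                        // identity of +
        [&](const tbb::blocked_range<size_t> &r, double acc) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                acc += v[i];
            return acc;                             // partial sum of a chunk
        },
        [](double a, double b) { return a + b; });  // join partial results
    (void)sum;
    return 0;
}
</code>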
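
A minimal OpenCL program construction sketch (C API called from C++, all error checking omitted for brevity; the platform/device choice and sizes are arbitrary, and the OpenCL 1.x queue creation call is used).

<code cpp>
// Sketch: build a vector-add kernel from source, allocate buffers,
// set kernel arguments, launch one work-item per element, read back.
#include <CL/cl.h>
#include <vector>

const char *src =
    "__kernel void vadd(__global const float *a,\n"
    "                   __global const float *b,\n"
    "                   __global float *c) {\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main() {
    const size_t N = 1024;
    std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N);

    cl_platform_id plat;
    clGetPlatformIDs(1, &plat, nullptr);
    cl_device_id dev;
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, nullptr);
    cl_context ctx = clCreateContext(nullptr, 1, &dev, nullptr, nullptr, nullptr);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, nullptr);

    // Program and kernel objects: compile the embedded source at runtime.
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, nullptr, nullptr);
    clBuildProgram(prog, 1, &dev, nullptr, nullptr, nullptr);
    cl_kernel k = clCreateKernel(prog, "vadd", nullptr);

    // Memory objects in the global memory space, initialized from the host.
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               N * sizeof(float), a.data(), nullptr);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               N * sizeof(float), b.data(), nullptr);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY,
                               N * sizeof(float), nullptr, nullptr);

    clSetKernelArg(k, 0, sizeof(cl_mem), &da);
    clSetKernelArg(k, 1, sizeof(cl_mem), &db);
    clSetKernelArg(k, 2, sizeof(cl_mem), &dc);

    size_t global = N;               // 1D NDRange, one work-item per element
    clEnqueueNDRangeKernel(q, k, 1, nullptr, &global, nullptr,
                           0, nullptr, nullptr);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, N * sizeof(float), c.data(),
                        0, nullptr, nullptr);
    return 0;
}
</code>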
====Slides, Notes and References to papers====
^ Date ^ Slides / material ^ Notes ^ Further references ^
| 06/05 | [[magistraleinformaticanetworking:spd:2016:TBBlab]] | | |
| 18/05 | {{:magistraleinformaticanetworking:spd:2016:2013_khronos_2.0_opencl_overview.pdf|Intro to OpenCL 2.0}} [[https://www.khronos.org/assets/uploads/developers/library/overview/opencl_overview.pdf|OpenCL 2.1 presentation]] | Download the OpenCL 2.1 slides from Khronos's web site | Also check the [[https://www.khronos.org/files/opencl21-reference-guide.pdf|OpenCL 2.1 Quick reference card]] and other material concerning OpenCL 2.1 from the [[https://www.khronos.org/registry/cl/|Khronos OpenCL registry]] |
| 20/05 | {{:magistraleinformaticanetworking:spd:2016:iwocl-2016-opencl-state-union.pdf|OpenCL evolution and Vulkan}} | | {{:magistraleinformaticanetworking:spd:2016:k-means3.tgz|Sequential reference code for K-means}} {{:magistraleinformaticanetworking:spd:spd13-14-paralleldatamining_notes_ch2_3.pdf|Introductory notes about Data Mining}} {{:magistraleinformaticanetworking:spd:spd11-12-dhillon-modha-corretto_parkmeans.ps|Dhillon and Modha TR on K-means}} |
| |