Lab time 06/05/2016
+ | |||
+ | === Practical info === | ||
+ | |||
+ | * You can download current TBB on ottavinareale with | ||
+ | '' | ||
+ | * Install TBB in your home directory on ottavinareale by unpacking the archive | ||
+ | * Set up your install dir of TBB inside the tbbvars.sh script or tbbvars.csh (depending on your shell) | ||
+ | * call '' | ||
+ | |||
=== Simple " | === Simple " | ||
The first example will not really be a farm: generate the input stream of points with a parallel_for (either two nested loops or a single one with a 2D range). You will need to create appropriate range(s) for that.
The computation for each point is actually given by the Mandelbrot function, which returns the number of iterations until the point diverges, or //MaxI// if the iteration limit is reached.
Choose a plane region near the border of the Mandelbrot set, so that your computation gets both some points belonging to the set and some outside it.
  * measure the execution time and the speedup, which vary with the iteration limit parameter (see the timing sketch after this list);
  * check the amount of parallelism that is actually exploited;
  * check if the load is balanced;
  * check if the computation length per task is balanced (e.g. measure the number of tasks that reach //MaxI//, or even better compute a cumulative distribution of the iteration counts);
  * check if you need to tune the grain size in order to achieve a good speedup when //MaxI// is low;
  * try manually choosing different grain sizes and compare the results.
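
As a starting point for the measurements, here is a minimal timing sketch (not part of the original assignment text) using ''tbb::tick_count'' and ''tbb::task_scheduler_init'' to vary the number of worker threads; ''run_mandelbrot'' and the parameter values are placeholders for your own code:
<code cpp>
#include <cstdio>
#include "tbb/task_scheduler_init.h"
#include "tbb/tick_count.h"

// Placeholder: replace with your parallel_for Mandelbrot computation
// (a possible sketch is given after the table further below).
void run_mandelbrot(int max_iter, int grain) { (void)max_iter; (void)grain; }

int main() {
    const int max_iter = 1000;   // the MaxI iteration limit
    const int grain    = 16;     // grain size to experiment with

    for (int nthreads = 1; nthreads <= 16; nthreads *= 2) {
        tbb::task_scheduler_init init(nthreads);     // limit the TBB worker pool
        tbb::tick_count t0 = tbb::tick_count::now();
        run_mandelbrot(max_iter, grain);
        tbb::tick_count t1 = tbb::tick_count::now();
        std::printf("threads=%2d  time=%g s\n", nthreads, (t1 - t0).seconds());
        // speedup(n) = time(1 thread) / time(n threads)
    }
    return 0;
}
</code>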
+ | |||
+ | Can you dynamically optimize the grain according to the input parameters? | ||
+ | |||
== Things to do ==
  * decide how you will provide the sequential Mandelbrot function as loop body (via a lambda or via a loop body class with the () operator)
  * decide if you want to set up two nested parallel_for loops or a single loop with a 2D range (it is useful to try both and compare; a sketch of the 2D-range variant is given after the table below)
  * decide how to perform a rough check of the balancing of the load per task (histogram, percentage of lengthy tasks...)

== Useful places around the Mandelbrot set ==
^ X, Y = square center ^ R = square size ^
| X = -0.7463, Y = 0.1102 | R = 0.005 |
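
A possible sketch (under stated assumptions, not the official solution) of the single-loop variant with a 2D range, using the region from the table above. The image resolution ''N'', the escape test and the output vector are illustrative assumptions:
<code cpp>
#include <vector>
#include <complex>
#include "tbb/parallel_for.h"
#include "tbb/blocked_range2d.h"

// Sequential Mandelbrot kernel: iterations until divergence, or max_iter.
int mandel(std::complex<double> c, int max_iter) {
    std::complex<double> z(0.0, 0.0);
    int k = 0;
    while (k < max_iter && std::norm(z) <= 4.0) {   // |z| <= 2
        z = z * z + c;
        ++k;
    }
    return k;
}

void run_mandelbrot(int max_iter, int grain) {
    const int N = 1024;                       // image is N x N points (assumption)
    const double cx = -0.7463, cy = 0.1102;   // square center, from the table above
    const double R  = 0.005;                  // square size
    std::vector<int> iters(N * N);            // iteration count per point

    tbb::parallel_for(
        tbb::blocked_range2d<int>(0, N, grain, 0, N, grain),
        [&](const tbb::blocked_range2d<int>& r) {
            for (int i = r.rows().begin(); i != r.rows().end(); ++i)
                for (int j = r.cols().begin(); j != r.cols().end(); ++j) {
                    std::complex<double> c(cx - R / 2 + R * j / N,
                                           cy - R / 2 + R * i / N);
                    iters[i * N + j] = mandel(c, max_iter);
                }
        });
}
</code>
The two grain-size arguments of the ''blocked_range2d'' are the knob referred to in the tuning items above; the variant with two nested parallel_for loops uses two ''blocked_range<int>'' instead.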
+ | |||
+ | === Actual farm with Mandelbrot === | ||
+ | |||
Restructure the program to work with a stream of points (or a stream of sets of points) that are generated and enter a parallel_do, where:
  * an input parameter bounds the number of iterations computed at each call of the sequential function;
  * the input of the sequential function contains:
    * the original point (either as a couple of indexes or as a complex number)
    * the current coordinates (as a complex number)
    * the current iteration number
  * each time it is called, the function computes up to that bound of iterations on the point, then returns;
  * the output of the sequential function is the same as its input (it is better to define a data structure for this; a sketch is given after this list).
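
A minimal sketch of one possible data structure and sequential step function matching the list above; the names (''PointState'', ''step'', the per-call bound ''chunk'') are illustrative assumptions, not prescribed by the assignment:
<code cpp>
#include <complex>

// State carried by each stream element.
struct PointState {
    int i, j;                    // the original point, as a couple of indexes
    std::complex<double> c;      // the original point, as a complex number
    std::complex<double> z;      // the current coordinates
    int iter;                    // the current iteration number
    bool done;                   // diverged, or reached MaxI
};

// Advance the point by at most 'chunk' iterations, stopping earlier
// on divergence or when the global limit MaxI is reached.
void step(PointState& p, int chunk, int MaxI) {
    int k = 0;
    while (k < chunk && p.iter < MaxI && std::norm(p.z) <= 4.0) {
        p.z = p.z * p.z + p.c;
        ++p.iter;
        ++k;
    }
    p.done = (p.iter >= MaxI) || (std::norm(p.z) > 4.0);
}
</code>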
+ | |||
+ | You can compute on points in the stream and recycle them if the computation takes too long. Use the parallel_do methods to reinsert in the loop those points that are not completed, and let those that are completed flow out of the do_loop. | ||
  * does this require a parallel_do that generates new tasks?
  * Basic use of parallel_do implies that the set of work items is fixed by the initial input sequence
  * examine two solutions:
    - the new tasks are inserted from within the loop itself, requiring a specific kind of parallel_do (see the sketch after this list)
    - the new tasks are generated outside of the loop; how can you manage the synchronization between the loop and the task generators in order to prevent the loop from exiting when new tasks are about to be added?
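
A sketch of the first of the two solutions above: the loop body takes a ''tbb::parallel_do_feeder'' as its second argument and re-adds the points that are not completed yet. It reuses the hypothetical ''PointState'' and ''step'' from the sketch above:
<code cpp>
#include <vector>
#include "tbb/parallel_do.h"

// Assumes the hypothetical PointState and step() from the previous sketch.
void farm_mandelbrot(std::vector<PointState>& points, int chunk, int MaxI) {
    tbb::parallel_do(
        points.begin(), points.end(),
        [&](PointState p, tbb::parallel_do_feeder<PointState>& feeder) {
            step(p, chunk, MaxI);     // compute at most 'chunk' iterations
            if (!p.done)
                feeder.add(p);        // recycle: reinsert the point into the loop
            // else: p.iter is the final result for point (p.i, p.j);
            //       store it into your result array here
        });
}
</code>
Taking the ''parallel_do_feeder'' argument in the body is what distinguishes this kind of parallel_do from the basic one.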

== Extensions ==
If you consider that the computation can have a specific grain size (the size of the sets of points that are provided to each parallel_do inner function), the approach lends itself to becoming a sort of divide and conquer, where the long computations of some points are treated as "big tasks" that are fed back as input after some time. These tasks can be repacked so that completed points in the same task do not uselessly flow back into the loop (see the sketch below). Proper design of the task structure minimizes data copying in this phase too.
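
A rough sketch of the repacking idea: each task carries a set of points, and after one pass the completed points are filtered out so that only the unfinished ones are fed back into the loop (again reusing the hypothetical ''PointState'' and ''step''):
<code cpp>
#include <vector>
#include "tbb/parallel_do.h"

// A "big task": a set of points processed together; its size is the grain of the farm.
typedef std::vector<PointState> PointSet;

void farm_mandelbrot_sets(std::vector<PointSet>& sets, int chunk, int MaxI) {
    tbb::parallel_do(
        sets.begin(), sets.end(),
        [&](PointSet s, tbb::parallel_do_feeder<PointSet>& feeder) {
            PointSet pending;                    // points still to be completed
            for (PointState& p : s) {
                step(p, chunk, MaxI);
                if (!p.done)
                    pending.push_back(p);        // keep only unfinished points
                // else: record p.iter as the result for (p.i, p.j)
            }
            if (!pending.empty())
                feeder.add(pending);             // the repacked task flows back into the loop
        });
}
</code>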