OpenSource For You

A performance graph: dynamic vs static schedules


This benchmark was run with no chunk size specified in the schedule clause. With a static schedule, the chunk size is calculated automatically so that the iterations are divided among the threads as evenly as possible, and every thread is loaded with its chunk immediately. With a dynamic schedule, the chunk size defaults to one when it is not specified, so each thread is given one iteration of the for loop to process and requests a new one after completing it. On NUMA systems, iterations processed on a node close to the allocated memory complete faster than those processed on distant nodes, because of memory access latencies.

Up to 16 threads, there is a significant difference between the performance of the dynamic and static schedules, but beyond that the difference vanishes. In this benchmark, dynamic scheduling improves performance at lower thread counts.

OpenMP behaviour is dynamic; you must tune parameters according to the underlying system to get the best performance. Idle threads are of no use at all; try to load threads equally, so that they complete their work at the same time. Avoid forking and joining threads at every parallel construct; reuse threads that have already been started. Experiment with the code and observe the performance to obtain the best results. Parallel code is very hard to debug, as it can silently produce wrong results.
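The two schedules discussed above can be sketched in C as follows. This is only an illustration, not the benchmark behind the graph: the function names (work, sum_static, sum_dynamic, sum_two_phases) and the uneven workload are assumptions made up for the example. The first two functions differ only in the schedule clause, with no chunk size given in either case. The third keeps two loops inside a single parallel region, which is one way to avoid forking and joining threads for every loop. The reduction clause is used so that the threads do not race on the shared total.

```c
#include <assert.h>

/* Hypothetical workload: the cost of iteration i depends on i % 1000,
   so iterations are deliberately uneven. Returns 0 + 1 + ... + (i % 1000). */
static double work(int i)
{
    double acc = 0.0;
    for (int k = 0; k <= i % 1000; ++k)
        acc += (double)k;
    return acc;
}

/* Static schedule, no chunk size: the iterations are split into roughly
   equal chunks and each thread is handed its chunk up front. */
double sum_static(int n)
{
    assert(n >= 0);
    double total = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += work(i);
    return total;
}

/* Dynamic schedule, no chunk size: the chunk size defaults to 1, so a
   thread takes one iteration, finishes it, then asks for the next. */
double sum_dynamic(int n)
{
    assert(n >= 0);
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += work(i);
    return total;
}

/* One parallel region shared by two worksharing loops, so the same team
   of threads runs both loops instead of a new region being entered twice. */
double sum_two_phases(int n)
{
    assert(n >= 0);
    double first = 0.0, second = 0.0;
    #pragma omp parallel
    {
        #pragma omp for schedule(static) reduction(+:first)
        for (int i = 0; i < n; ++i)
            first += work(i);

        #pragma omp for schedule(dynamic) reduction(+:second)
        for (int i = 0; i < n; ++i)
            second += work(i);
    }
    return first + second;
}
```

To compare the schedules on a given machine, call these functions from a main function, build with an OpenMP-enabled compiler (for example, gcc with the -fopenmp option), vary the thread count through the OMP_NUM_THREADS environment variable, and time each call with omp_get_wtime() from omp.h. Without OpenMP support the pragmas are ignored and the loops simply run serially, which gives a handy reference result for checking that the parallel versions are correct.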
