The C++17 standard added the concept of parallel algorithms to the C++ Standard Library. These are additional overloads of many of the functions that operate on ranges. The parallel versions have the same signature as the �normal� single-threaded versions, except for the addition of a new first parameter, which specifies the execution policy to use.
The execution policies as defined in C++17 are: * std::execution::seq * std::execution::par * std::execution::par_unseq
- cppreference alogrithm
- intel gettint started with pstl
- n3554 github, NVidia
- opencl based SyclParallelSTL
- parallel algorithm performance
- msdn cpp17 paralgo
- cpp17 parallelized algo
Lattice representation of policies, as in C++17 par_unseq / \ / \ / \ seq par
This is a serial execution, not a policy for parallelism. This has same algorithmic complexity as standard/normal implementation. It is executed in calling thread and in the order specified (definite order rather interleaved).
This policy provides for parallel execution across threads, #threads is defined by implementation. This may execute in main thread or in background threads. Any such invocations executing in the same thread are indeterminately sequenced with respect to each other.
This policy provides for max scope for parallelizing algorithm. The invocations are permitted to execute in an unordered fashion in unspecified threads, and unsequenced with respect to one another within each thread The execution order is not defined as execution could be interleaved. Basically it allows for parallel+vectorization
During the execution of a parallel algorithm with any of these execution policies, if the invocation of an element access function exits via an uncaught exception, std::terminate is called, but the implementations may define additional execution policies that handle exceptions differently.
Basically, this is where the behavior diverges with respect to pre-C++17 algorithms.
An an example this simple loop could be parallelized to do some operation:
this is lot more verbose than where we started, but it allows for parallelization without much fuss. Also it is easy to run this serially by changing
std::execution::seq, keeping the exceptoin behavior the same (as noted above). I kind of like
Intel TBB better in this case:
From implementation perspective, I think almost all will provide a thread-pool based execution implementation. At least thats what MS has done.
Line#11 is where the work is submitted to a common thread-pool. At Line#23
_Run_chunked_parallel_work does the work of populating work in ThreadPool thru
for_each perspective Line#40-Line#50 contains all the logic to convert a
for_each iteration to #tasks for parallelization.
I am quite curious when would MS throw
_Parallelism_resources_exhausted. Perhaps needs more digging, for later..