The Guide provides general advice on multithreaded performance. Hardware-specific optimizations have deliberately been kept to a minimum. In future versions of the Guide, topics covering hardware-specific optimizations will be added for developers willing to sacrifice portability for higher performance.
Application Threading
This chapter covers general topics in parallel performance but occasionally refers to API-specific issues.
- Predicting and Measuring Parallel Performance
- Loop Modifications to Enhance Data-Parallel Performance
- Granularity and Parallel Performance
- Load Balance and Parallel Performance
- Expose Parallelism by Avoiding or Removing Artificial Dependencies
- Using Tasks Instead of Threads
- Exploiting Data Parallelism in Ordered Data Streams
Synchronization
The topics in this chapter discuss techniques to mitigate the negative impact of synchronization on performance.
- Managing Lock Contention: Large and Small Critical Sections
- Use Synchronization Routines Provided by the Threading API Rather than Hand-Coded Synchronization
- Choosing Appropriate Synchronization Primitives to Minimize Overhead
- Use Non-blocking Locks When Possible
Memory Management
Threads add another dimension to memory management that should not be ignored. This chapter covers memory issues that are unique to multithreaded applications.
- Avoiding Heap Contention Among Threads
- Use Thread-local Storage to Reduce Synchronization
- Detecting Memory Bandwidth Saturation in Threaded Applications
- Avoiding and Identifying False Sharing Among Threads
Programming Tools
This chapter describes how to use Intel software products to develop, debug, and optimize multithreaded applications.
- Automatic Parallelization with Intel® Compilers
- Parallelism in the Intel® Math Kernel Library
- Threading and Intel® Integrated Performance Primitives