Measuring Communication and Computation Overlap

The semantics of nonblocking collective operations enable you to run inter-process communication in the background while performing computations. However, the actual overlap depends on the particular MPI library implementation. You can measure the potential overlap of communication and computation using the IMB-NBC benchmarks. The general benchmark flow is as follows:

  1. Measure the time needed for a pure communication call.
  2. Start a nonblocking collective operation.
  3. Start computation using the IMB_cpu_exploit function, as described in the IMB-IO Nonblocking Benchmarks chapter. To ensure correct measurement conditions, the benchmark keeps the computation time close to the pure communication time measured at step 1.
  4. Wait for communication to finish using the MPI_Wait function.

Displaying Results

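The overlap potential is interpreted from three timing values: t_pure, the time of the pure communication operation measured at step 1; t_CPU, the time the computation takes when it runs concurrently with the nonblocking operation; and t_ovrl, the time the nonblocking operation takes to complete when it runs concurrently with the computation.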

Because different processes in a collective operation may have different execution times, the timing values are taken from the process with the largest t_ovrl. The IMB-NBC result tables report the timings t_ovrl, t_pure, and t_CPU, along with the estimated overlap in percent, calculated by the following formula:

overlap = 100.*max(0, min(1, (t_pure+t_CPU-t_ovrl) / min(t_pure, t_CPU)))
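
As a worked illustration of the formula, the sketch below computes the overlap percentage from the three timings. The function name overlap_percent is hypothetical, and the guard against a non-positive denominator is an added assumption, not something the benchmark is documented to do.

```c
/* Estimated overlap in percent, following the formula above.
 * All three arguments must use the same time unit. */
static double overlap_percent(double t_pure, double t_cpu, double t_ovrl)
{
    /* min(t_pure, t_CPU): the most time that could possibly be hidden. */
    double denom = (t_pure < t_cpu) ? t_pure : t_cpu;
    double ratio;

    /* Assumption: report 0% rather than divide by zero. */
    if (denom <= 0.0) {
        return 0.0;
    }

    /* Time saved by overlapping, relative to the most that could be saved. */
    ratio = (t_pure + t_cpu - t_ovrl) / denom;

    /* Clamp to [0, 1], as max(0, min(1, ...)) does in the formula. */
    if (ratio < 0.0) {
        ratio = 0.0;
    }
    if (ratio > 1.0) {
        ratio = 1.0;
    }
    return 100.0 * ratio;
}
```

For example, with t_pure = 10, t_CPU = 10, and t_ovrl = 15, the numerator is 10 + 10 - 15 = 5 and the result is 50 percent. With t_ovrl = 10 the communication is fully hidden and the result is 100 percent; with t_ovrl = 20 the two phases ran back to back and the result is 0 percent.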

See Also

IMB-NBC Benchmarks
Measuring Pure Communication Time