Command-line Control

You can control all aspects of the Intel® MPI Benchmarks from the command line. The general command-line syntax is as follows:

IMB-MPI1    [-h{elp}]
            [-npmin     <P_min>]
            [-multi     <outflag>]
            [-off_cache <cache_size[,cache_line_size]>]
            [-iter      <msgspersample[,overall_vol[,msgs_nonaggr[,iter_policy]]]>]
            [-iter_policy     <iter_policy>]
            [-time     <max_runtime per sample>]
            [-mem       <max. mem usage per process>]
            [-msglen    <Lengths_file>]
            [-map       <PxQ>]
            [-input     <filename>]
            [-include]  [benchmark1 [,benchmark2 [,...]]]
            [-exclude]  [benchmark1 [,benchmark2 [,...]]]
            [-msglog [<minlog>:]<maxlog>]
            [-thread_level <level>]
            [-sync <mode>]
            [-root_shift <mode>]
            [benchmark1 [,benchmark2 [,...]]]

The command line is repeated in the output. The options may appear in any order.

Examples:

Get out-of-cache data for PingPong:

mpirun -np 2  IMB-MPI1 PingPong -off_cache -1

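Run a very large configuration, restricted to at most 20 iterations, 1.5 seconds of run time per message size, and 2 GB of memory per process for the message buffers: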

mpirun -np 512 IMB-MPI1 -npmin 512 alltoallv -iter 20 -time 1.5 -mem 2

Run the P_Read_shared benchmark with the minimum number of processes set to seven:

mpirun -np 14 IMB-IO P_Read_shared -npmin 7

Run the IMB-MPI1 benchmarks including PingPongSpecificSource and PingPingSpecificSource, but excluding the Alltoall and Alltoallv benchmarks. Set the transfer message sizes as 0, 4, 8, 16, 32, 64, 128:

mpirun -np 16 IMB-MPI1 -msglog 2:7 -include PingPongSpecificSource PingPingSpecificSource -exclude Alltoall Alltoallv

Run the PingPong, PingPing, PingPongSpecificSource and PingPingSpecificSource benchmarks with the transfer message sizes 0, 2^0, 2^1, 2^2, ..., 2^16:

mpirun -np 4 IMB-MPI1 -msglog 16 PingPong PingPing PingPongSpecificSource PingPingSpecificSource

Benchmark Selection Arguments

Benchmark selection arguments are a sequence of blank-separated strings. Each string is the name of a benchmark in exact spelling, case insensitive.

For example, the string IMB-MPI1 PingPong Allreduce specifies that you want to run PingPong and Allreduce benchmarks only:

mpirun -np 10 IMB-MPI1 PingPong Allreduce

By default, all benchmarks of the selected component are run.

-npmin Option

Specifies the minimum number of processes P_min to run all selected benchmarks on. The P_min value after -npmin must be an integer.

Given P_min, the benchmarks run on the processes with the numbers selected as follows:

P_min, 2P_min, 4P_min, ..., largest 2^x P_min < P, P

Note

You may set P_min to 1. If you set P_min > P, Intel MPI Benchmarks interprets this value as P_min = P.

For example, to run the IMB-EXT benchmarks with minimum number of processes set to five, call:

mpirun -np 11 IMB-EXT -npmin 5
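
The sequence of process counts described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this section, not code from the benchmark suite; the function name process_counts is hypothetical.

```python
def process_counts(p_min, p):
    """Process counts used for -npmin p_min when p processes are started."""
    if p_min < 1 or p < 1:
        raise ValueError("p_min and p must be positive integers")
    p_min = min(p_min, p)  # P_min > P is interpreted as P_min = P
    counts = []
    n = p_min
    while n < p:           # P_min, 2*P_min, 4*P_min, ... while still below P
        counts.append(n)
        n *= 2
    counts.append(p)       # the full process count P always comes last
    return counts


print(process_counts(5, 11))  # the IMB-EXT example above
print(process_counts(7, 14))  # the P_Read_shared example
```

For the IMB-EXT example (-np 11, -npmin 5), the sketch yields 5, 10, and 11 processes.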

By default, all active processes are selected as described in the Running Intel® MPI Benchmarks section.

-multi Option

Defines whether the benchmark runs in multiple mode. In this mode, MPI_COMM_WORLD is split into several groups, which run simultaneously. The argument after -multi is a meta-symbol <outflag> that can take an integer value of 0 or 1.

When the number of processes running the benchmark is more than half of the overall number of processes in MPI_COMM_WORLD, the multiple benchmark coincides with the non-multiple one, as not more than one process group can be created.

For example, if you run this command:

mpirun -np 16 IMB-MPI1 -multi 0 bcast -npmin 12

The benchmark will run in non-multiple mode, as the benchmarking starts from 12 processes, which is more than half of MPI_COMM_WORLD.
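
The arithmetic behind this example can be sketched as follows. The sketch assumes that MPI_COMM_WORLD is split into disjoint groups of equal size and that as many full groups are formed as fit; this section does not spell out the splitting rule, so treat group_count as a hypothetical helper.

```python
def group_count(total_processes, active_processes):
    """Number of full groups of size active_processes that fit into the world.

    Assumption: groups are disjoint and all have the same size.
    """
    if not 1 <= active_processes <= total_processes:
        raise ValueError("active_processes must be between 1 and total_processes")
    return total_processes // active_processes


print(group_count(16, 12))  # 12 is more than half of 16: only one group fits
print(group_count(16, 4))
```

Under this assumption, any active process count above half of the total leaves room for a single group only, which is why the run above behaves like a non-multiple one.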

By default, Intel® MPI Benchmarks run non-multiple benchmark flavors.

-off_cache cache_size[,cache_line_size] Option

Use the -off_cache flag to avoid cache re-use. If you do not use this flag (default), the same communications buffer is used for all repetitions of one message size sample. In this case, Intel® MPI Benchmarks reuses the cache, so throughput results might be unrealistic.

The argument after -off_cache can be a single number (cache_size), two comma-separated numbers (cache_size,cache_line_size), or -1.

The sent/received data is stored in buffers of size ~2x MAX(cache_size, message_size). When repetitively using messages of a particular size, their addresses are advanced within those buffers so that a single message is at least 2 cache lines after the end of the previous message. When these buffers are filled up, they are reused from the beginning.

-off_cache is effective for IMB-MPI1 and IMB-EXT. Avoid using this option for IMB-IO.
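
The buffer walk described above can be sketched as follows. This is a simplified model, not the benchmark's implementation: all sizes are assumed to be in bytes, the buffer is taken as exactly 2 x MAX(cache_size, message_size), and each message starts exactly two cache lines after the end of the previous one. The name buffer_offsets is hypothetical.

```python
def buffer_offsets(message_size, cache_size, line_size, repetitions):
    """Start offsets of successive messages inside the off-cache buffer."""
    buffer_size = 2 * max(cache_size, message_size)
    stride = message_size + 2 * line_size  # leave two cache lines after each message
    offsets = []
    offset = 0
    for _ in range(repetitions):
        if offset + message_size > buffer_size:
            offset = 0                     # buffer filled up: reuse from the beginning
        offsets.append(offset)
        offset += stride
    return offsets


# 1 KiB messages, 4 KiB cache, 64-byte cache lines (illustrative values)
print(buffer_offsets(1024, 4096, 64, 8))
```

With these illustrative values, seven messages fit before the eighth wraps back to offset 0.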

Examples:

Use the default values defined in IMB_mem_info.h:

-off_cache -1

2.5 MB last level cache, default line size:

-off_cache 2.5

16 MB last level cache, line size 128:

-off_cache 16,128

The -off_cache mode might also be influenced by internal caching within the Intel® MPI Library, which can complicate the interpretation of results.

Default: no cache control.

-iter Option

Use this option to control the number of iterations executed by every benchmark.

By default, the number of iterations is controlled through parameters MSGSPERSAMPLE, OVERALL_VOL, MSGS_NONAGGR, and ITER_POLICY defined in IMB_settings.h.

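You can optionally add one or more arguments after the -iter flag to override the default values defined in IMB_settings.h. Give the arguments as a comma-separated list: the integer values override MSGSPERSAMPLE, OVERALL_VOL, and MSGS_NONAGGR, in that order, and an optional policy name may follow the integer values.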

Examples:

To define MSGSPERSAMPLE as 2000, and OVERALL_VOL as 100, use the following string:

-iter 2000,100

To define MSGS_NONAGGR as 150, you need to define values for MSGSPERSAMPLE and OVERALL_VOL as shown in the following string:

-iter 1000,40,150

To define MSGSPERSAMPLE as 2000 and set the multiple_np policy, use the following string (see -iter_policy):

-iter 2000,multiple_np
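
To make the argument layout concrete, here is a minimal Python sketch that splits an -iter argument into the settings it overrides. It is inferred from the syntax line and the examples above rather than taken from the benchmark sources; parse_iter_arg and the two constants are hypothetical names.

```python
POLICIES = {"dynamic", "multiple_np", "auto", "off"}
INTEGER_FIELDS = ["MSGSPERSAMPLE", "OVERALL_VOL", "MSGS_NONAGGR"]


def parse_iter_arg(arg):
    """Map an -iter argument string to the IMB_settings.h values it overrides."""
    overrides = {}
    parts = arg.split(",")
    if parts[-1] in POLICIES:          # an optional policy name comes last
        overrides["ITER_POLICY"] = parts.pop()
    if len(parts) > len(INTEGER_FIELDS):
        raise ValueError("too many integer arguments: " + arg)
    for name, text in zip(INTEGER_FIELDS, parts):
        overrides[name] = int(text)    # integer values are positional
    return overrides


print(parse_iter_arg("2000,100"))
print(parse_iter_arg("1000,40,150"))
print(parse_iter_arg("2000,multiple_np"))
```

Settings that do not appear in the returned dictionary keep their defaults from IMB_settings.h.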

-iter_policy Option

Use this option to set a policy for automatic calculation of the number of iterations. Use one of the following arguments to override the default ITER_POLICY value defined in IMB_settings.h:

Policy         Description

dynamic        Reduces the number of iterations when the maximum run time per sample (see -time) is expected to be reached. Using this policy ensures faster execution, but may lead to inaccuracy of the results.

multiple_np    Reduces the number of iterations when the message size is getting bigger. Using this policy ensures the accuracy of the results, but may lead to longer execution time. You can control the execution time through the -time option.

auto           Automatically chooses which policy to use:
                 • applies multiple_np to collective operations where one of the processes acts as the root of the operation (for example, MPI_Bcast)
                 • applies dynamic to all other types of operations

off            The number of iterations does not change during the execution.

You can also set the policy through the -iter option. See -iter.

By default, the ITER_POLICY defined in IMB_settings.h is used.

-time Option

Specifies the number of seconds for the benchmark to run per message size. The argument after -time is a floating-point number.

The combination of this flag with the -iter flag or its default alternative ensures that the Intel® MPI Benchmarks always chooses the maximum number of repetitions that conform to all restrictions.

A rough number of repetitions per sample to fulfill the -time request is estimated in preparatory runs that use ~1 second overhead.
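
The interplay between the iteration cap and the time budget can be sketched as below. The sketch assumes these two limits are the only restrictions in play and that at least one repetition is always run; both are simplifying assumptions, and repetitions_per_sample is a hypothetical name.

```python
import math


def repetitions_per_sample(max_iterations, max_seconds, seconds_per_repetition):
    """Largest repetition count that respects both the -iter and the -time limit."""
    if seconds_per_repetition <= 0:
        raise ValueError("seconds_per_repetition must be positive")
    by_time = math.floor(max_seconds / seconds_per_repetition)
    return max(1, min(max_iterations, by_time))  # assumption: never fewer than one


# -iter 20 -time 1.5, with an estimated 0.25 s per repetition
print(repetitions_per_sample(20, 1.5, 0.25))
```

In this model, the estimated time per repetition comes from the preparatory runs mentioned above.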

Default: -time is activated. The floating-point value specifying the run-time seconds per sample is set in the SECS_PER_SAMPLE variable defined in IMB_settings.h, or IMB_settings_io.h.

-mem Option

Specifies the number of GB to be allocated per process for the message buffers. If the size is exceeded, a warning is returned, stating how much memory is required for the overall run.

The argument after -mem is a floating-point number.

Default: the memory is restricted by MAX_MEM_USAGE defined in IMB_mem_info.h.

-input <File> Option

Use an ASCII input file to select the benchmarks. For example, the IMB_SELECT_EXT file looks as follows:

#
# IMB benchmark selection file
#
# Every line must be a comment (beginning with #), or it
# must contain exactly one IMB benchmark name
#
#Window
Unidir_Get
#Unidir_Put
#Bidir_Get
#Bidir_Put
Accumulate

With the help of this file, the following command runs only Unidir_Get and Accumulate benchmarks of the IMB-EXT component:

mpirun .... IMB-EXT -input IMB_SELECT_EXT
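
The file format is simple enough to sketch a reader for it. The following Python snippet mirrors the rules stated in the file's own comments; skipping empty lines is an extra assumption, and read_selection is a hypothetical name, not part of the benchmark suite.

```python
def read_selection(text):
    """Return the benchmark names enabled in a selection file's contents."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # comment (or, by assumption, empty line)
            continue
        names.append(line)                    # exactly one benchmark name per line
    return names


sample = """#
# IMB benchmark selection file
#
#Window
Unidir_Get
#Unidir_Put
#Bidir_Get
#Bidir_Put
Accumulate
"""
print(read_selection(sample))
```

Commenting a name out with # is therefore enough to drop that benchmark from the run.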

-msglen <File> Option

Write any set of non-negative message lengths into an ASCII file, one per line, and call the Intel® MPI Benchmarks with the following argument:

-msglen Lengths

The lengths listed in the Lengths file override the default message lengths. For IMB-IO, the file defines the I/O portion lengths.
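
A lengths file can be generated rather than typed by hand. The sketch below assumes the format is exactly one non-negative integer per line, as described above; the helper names are hypothetical.

```python
def lengths_file_text(lengths):
    """Render message lengths as file contents, one length per line."""
    for n in lengths:
        if not isinstance(n, int) or n < 0:
            raise ValueError("message lengths must be non-negative integers")
    return "".join(f"{n}\n" for n in lengths)


def write_lengths_file(path, lengths):
    """Write the lengths to path, for use as: -msglen <path>."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(lengths_file_text(lengths))


print(lengths_file_text([0, 100, 1000, 10000]), end="")
```

Calling write_lengths_file("Lengths", [0, 100, 1000, 10000]) would produce a file suitable for the -msglen Lengths argument shown above.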

-map PxQ Option

Use this option to number the processes along rows of the matrix:

0      P      ...   (Q-2)P      (Q-1)P
1
...
P-1    2P-1   ...   (Q-1)P-1    QP-1

For example, to run Multi-PingPong between two nodes of size P, with each process on one node communicating with its counterpart on the other, call:

mpirun -np <2P> IMB-MPI1 -map <P>x2 PingPong
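
The numbering scheme can be reproduced with a short Python sketch. It assumes, as the example above suggests, that each row of the P x Q matrix forms one communicating group; map_rows is a hypothetical helper.

```python
def map_rows(p, q):
    """Rows of the P x Q matrix; the entry in row r, column c is rank r + c*P."""
    if p < 1 or q < 1:
        raise ValueError("p and q must be positive integers")
    return [[row + col * p for col in range(q)] for row in range(p)]


# -map 4x2 with 8 processes: rank i is paired with rank i + 4
for row in map_rows(4, 2):
    print(row)
```

If ranks 0 to P-1 are placed on the first node and ranks P to 2P-1 on the second, every row pairs a process with its counterpart on the other node.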

-include [[benchmark1] benchmark2 ...]

Specifies the list of additional benchmarks to run. For example, to add PingPongSpecificSource and PingPingSpecificSource benchmarks, call:

mpirun -np 2 IMB-MPI1 -include PingPongSpecificSource PingPingSpecificSource

-exclude [[benchmark1] benchmark2 ...]

Specifies the list of benchmarks to be excluded from the run. For example, to exclude Alltoall and Allgather, call:

mpirun -np 2 IMB-MPI1 -exclude Alltoall Allgather
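
How -include and -exclude combine with the default selection can be sketched as below. The sketch assumes names are compared case-insensitively (as stated for benchmark selection arguments) and that an exclusion wins if a name appears in both lists; the second point is an assumption. The default list is a small illustrative subset, not the full IMB-MPI1 set.

```python
def select_benchmarks(default, include=(), exclude=()):
    """Default benchmarks plus included ones, minus excluded ones."""
    excluded = {name.lower() for name in exclude}
    seen = set()
    selected = []
    for name in list(default) + list(include):
        key = name.lower()               # benchmark names are case insensitive
        if key in excluded or key in seen:
            continue
        seen.add(key)
        selected.append(name)
    return selected


defaults = ["PingPong", "Alltoall", "Allgather", "Bcast"]  # illustrative subset
print(select_benchmarks(defaults,
                        include=["PingPongSpecificSource"],
                        exclude=["alltoall", "Allgather"]))
```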

-msglog [<minlog>:]<maxlog>

This option allows you to control the lengths of the transfer messages. This setting overrides the MINMSGLOG and MAXMSGLOG values. The new message sizes are 0, 2^minlog, ..., 2^maxlog.

For example, if you run the following command line:

mpirun -np 2 IMB-MPI1 -msglog 3:7 PingPong

Intel® MPI Benchmarks selects the lengths 0, 8, 16, 32, 64, 128, as shown below:

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[μsec]   Mbytes/sec
            0         1000         0.70         0.00
            8         1000         0.73        10.46
           16         1000         0.74        20.65
           32         1000         0.94        32.61
           64         1000         0.94        65.14
          128         1000         1.06       115.16

Alternatively, you can specify only the maxlog value:

mpirun -np 2 IMB-MPI1 -msglog 3 PingPong

In this case, Intel® MPI Benchmarks selects the lengths 0, 1, 2, 4, 8:

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[μsec]   Mbytes/sec
            0         1000         0.69         0.00
            1         1000         0.72         1.33
            2         1000         0.71         2.69
            4         1000         0.72         5.28
            8         1000         0.73        10.47
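
The message sizes for a given -msglog argument follow directly from the rule above and can be sketched in Python. The function name msglog_sizes is hypothetical; the default of minlog = 0 when only maxlog is given is inferred from the second example.

```python
def msglog_sizes(arg):
    """Message lengths selected by -msglog [<minlog>:]<maxlog>."""
    if ":" in arg:
        minlog_text, maxlog_text = arg.split(":", 1)
        minlog, maxlog = int(minlog_text), int(maxlog_text)
    else:
        minlog, maxlog = 0, int(arg)   # only maxlog given: start at 2^0
    if not 0 <= minlog <= maxlog:
        raise ValueError("expected 0 <= minlog <= maxlog")
    return [0] + [2 ** k for k in range(minlog, maxlog + 1)]


print(msglog_sizes("3:7"))
print(msglog_sizes("3"))
```

The two calls reproduce the #bytes columns of the two PingPong outputs shown above.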

-thread_level Option

This option specifies the desired thread level for MPI_Init_thread(). See the description of MPI_Init_thread() for details. The option is available only if the Intel® MPI Benchmarks are built with the USE_MPI_INIT_THREAD macro defined. Possible values for <level> are single, funneled, serialized, and multiple.
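
For orientation, the four level names line up with the thread support constants defined by the MPI standard. The mapping below is an assumption based on the matching names, not something this section states explicitly; thread_level_constant is a hypothetical helper.

```python
# Assumed correspondence between <level> values and MPI thread support constants
THREAD_LEVELS = {
    "single": "MPI_THREAD_SINGLE",
    "funneled": "MPI_THREAD_FUNNELED",
    "serialized": "MPI_THREAD_SERIALIZED",
    "multiple": "MPI_THREAD_MULTIPLE",
}


def thread_level_constant(level):
    """Name of the MPI constant presumably requested for a -thread_level value."""
    try:
        return THREAD_LEVELS[level]
    except KeyError:
        raise ValueError("unknown thread level: " + level) from None


print(thread_level_constant("funneled"))
```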

-sync Option

This option is relevant only for benchmarks measuring collective operations. It controls whether all ranks are synchronized after every iteration step by means of the MPI_Barrier operation. The -sync option can take the following arguments:

Argument                  Description

0 | off | disable | no    Disables process synchronization at each iteration step. This is the default value.

1 | on | enable | yes     Enables process synchronization at each iteration step.
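
The accepted spellings can be folded into a single boolean, as in the following sketch. It assumes the spellings must match exactly as listed; parse_switch is a hypothetical helper, not part of the benchmark suite.

```python
OFF_VALUES = {"0", "off", "disable", "no"}
ON_VALUES = {"1", "on", "enable", "yes"}


def parse_switch(value):
    """Translate an on/off style argument into True or False."""
    if value in ON_VALUES:
        return True
    if value in OFF_VALUES:
        return False
    raise ValueError("expected one of 0|off|disable|no or 1|on|enable|yes")


print(parse_switch("enable"))
print(parse_switch("0"))
```

The same helper would apply to any option that takes this set of argument spellings.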

-root_shift Option

This option is relevant only for benchmarks measuring collective operations that use the root concept (for example, MPI_Bcast, MPI_Reduce, and MPI_Gather). It defines whether the root is changed at every iteration step. The -root_shift option can take the following arguments:

Argument                  Description

0 | off | disable | no    Disables root change at each iteration step. Rank 0 acts as a root at each iteration step.

1 | on | enable | yes     Enables root change at each iteration step. Root rank is changed in a round-robin fashion. This is the default value.
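
The round-robin root selection can be sketched as follows. The sketch assumes the rotation starts at rank 0 and advances by one rank per iteration step, which this section does not state explicitly; root_for_iteration is a hypothetical name.

```python
def root_for_iteration(iteration, num_processes, root_shift=True):
    """Root rank used at a given iteration step (iterations counted from 0)."""
    if num_processes < 1:
        raise ValueError("num_processes must be positive")
    if not root_shift:
        return 0                      # root shift disabled: rank 0 is always the root
    return iteration % num_processes  # assumed round-robin starting at rank 0


print([root_for_iteration(i, 4) for i in range(6)])
print([root_for_iteration(i, 4, root_shift=False) for i in range(6)])
```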