While we normally interact with a workload manager that bootstraps MPI for us, it's helpful to know how to use vanilla MPI directly, with a hosts file and ssh as a bare-bones setup. Below are examples of using mpirun with different MPI implementations. Note that these were tested on a single node (in a Docker container), but I try to also provide examples with multiple nodes and a hosts file; for those, you'd need ssh configured properly between the nodes.
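In practice that means passwordless ssh from the launch node to every other node. A minimal sketch, assuming a shared user account and two nodes with the hypothetical names node1 and node2:
# Generate a key (no passphrase) and copy it to each node
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
ssh-copy-id user@node1
ssh-copy-id user@node2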
docker run -it ghcr.io/rse-ops/lammps-matrix:intel-mpi-rocky-9-amd64
# vanilla example
mpirun -n 1 lmp -v x 2 -v y 2 -v z 2 -in ./in.reaxc.hns -nocite
# Is the same as
mpirun -np 1 lmp -v x 2 -v y 2 -v z 2 -in ./in.reaxc.hns -nocite
# For multiple hosts (nodes), use a comma separated host list on the command line
# (shown here with just the current host)
mpirun -np 1 -hosts $(hostname) lmp -v x 2 -v y 2 -v z 2 -in ./in.reaxc.hns -nocite
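# With more than one node, the same pattern might look like the following
# (node1 and node2 are hypothetical hostnames; passwordless ssh between nodes is assumed)
mpirun -np 2 -hosts node1,node2 lmp -v x 2 -v y 2 -v z 2 -in ./in.reaxc.hns -nocite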
# Or provide the hosts in a file with -f
hostname > hosts.txt
mpirun -np 1 -f ./hosts.txt lmp -v x 2 -v y 2 -v z 2 -in ./in.reaxc.hns -nocite
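# For multiple nodes, the file simply lists one hostname per line
# (a sketch with the same hypothetical hosts)
printf "node1\nnode2\n" > hosts.txt
mpirun -np 2 -f ./hosts.txt lmp -v x 2 -v y 2 -v z 2 -in ./in.reaxc.hns -nocite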
# It's also common to ask for a total number of tasks, and the processes per node (ppn)
mpirun -n {{tasks}} -ppn {{tasks_per_node}} <application>
# For example, if we want to evenly distribute 200 tasks across 4 nodes (50 per node), we would do the following.
# The LAMMPS problem size is also larger here, given the additional resources.
mpirun --hostfile ./hostlist.txt -np 200 -ppn 50 lmp -v x 64 -v y 16 -v z 16 -in in.reaxc.hns -nocite
Full help for Intel MPI mpirun:
# mpirun --help
Usage: ./mpiexec [global opts] [local opts for exec1] [exec1] [exec1 args] : [local opts for exec2] [exec2] [exec2 args] : ...
Global options (passed to all executables):
Global environment options:
-genv {name} {value} environment variable name and value
-genvlist {env1,env2,...} environment variable list to pass
-genvnone do not pass any environment variables
-genvall pass all environment variables not managed
by the launcher (default)
Other global options:
-f {name} file containing the host names
-hosts {host list} comma separated host list
Local options (passed to individual executables):
Other local options:
-n/-np {value} number of processes
{exec_name} {args} executable name and arguments
Hydra specific options (treated as global):
Launch options:
-launcher launcher to use (ssh slurm rsh ll sge pbs pbsdsh pdsh srun lsf blaunch qrsh fork)
-launcher-exec executable to use to launch processes
-enable-x/-disable-x enable or disable X forwarding
Resource management kernel options:
-rmk resource management kernel to use (slurm ll lsf sge pbs cobalt)
Processor topology options:
-bind-to process binding
-map-by process mapping
-membind memory binding policy
Other Hydra options:
-verbose verbose mode
-info build information
-print-all-exitcodes print exit codes of all processes
-ppn processes per node
-prepend-rank prepend rank to output
-prepend-pattern prepend pattern to output
-outfile-pattern direct stdout to file
-errfile-pattern direct stderr to file
-nameserver name server information (host:port format)
-disable-auto-cleanup don't cleanup processes on error
-disable-hostname-propagation let MPICH auto-detect the hostname
-localhost local hostname for the launching node
-usize universe size (SYSTEM, INFINITE, <value>)
Intel(R) MPI Library specific options:
<option> -help show help message for the specific option
Global options:
-aps Intel(R) Application Performance Snapshot profile
-mps Intel(R) Application Performance Snapshot profile (MPI, OpenMP only)
-gtool tool and rank set
-gtoolfile file containing tool and rank set
-hosts-group {groups of hosts} allows to set node ranges (like in Slurm* Workload Manager)
Other Hydra options:
-iface network interface to use
-s <spec> redirect stdin to all or 1,2 or 2-4,6 MPI processes (0 by default)
-silent-abort do not print abort warning message
-nolocal avoid running the application processes on the node where mpiexec.hydra started
-tune {binary file} defines the name of binary tuning file
-print-rank-map print rank mapping
-prepend-timestamp prepend time stamp to stdout
-prot print the communication protocol between each host and process pin status
Intel(R) MPI Library, Version 2021.8 Build 20221129 (id: 339ec755a1)
Copyright 2003-2022 Intel Corporation.
docker run -it --entrypoint bash ghcr.io/rse-ops/lammps-matrix:openmpi-ubuntu-22.04-amd64
# Most containers that run as user "root" will need --allow-run-as-root
mpirun --allow-run-as-root -n 1 lmp -v x 2 -v y 2 -v z 2 -in ./in.reaxc.hns -nocite
# Open MPI accepts any of -c, -n, --n, -np, or --np for the process count
mpirun --allow-run-as-root --np 1 lmp -v x 2 -v y 2 -v z 2 -in ./in.reaxc.hns -nocite
# And -N specifies the number of processes to launch per node (not the number of nodes)
mpirun --allow-run-as-root -N 1 lmp -v x 2 -v y 2 -v z 2 -in ./in.reaxc.hns -nocite
# Example controlling topology: map by processes per resource (ppr), 48 per node, for 96 total across 2 nodes
mpirun -np 96 -map-by ppr:48:node --hostfile ./hostfile.txt <application>
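# An Open MPI hostfile can also declare how many slots (processes) each node provides.
# A sketch backing the 48-per-node layout above (node names hypothetical):
cat > hostfile.txt <<EOF
node1 slots=48
node2 slots=48
EOF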
Help for Open MPI
# mpirun --allow-run-as-root --help
mpirun (Open MPI) 4.1.2
Usage: mpirun [OPTION]... [PROGRAM]...
Start the given program using Open RTE
-c|-np|--np <arg0> Number of processes to run
-h|--help <arg0> This help message
-n|--n <arg0> Number of processes to run
-q|--quiet Suppress helpful messages
-v|--verbose Be verbose
-V|--version Print version and exit
For additional mpirun arguments, run 'mpirun --help <category>'
The following categories exist: general (Defaults to this option), debug,
output, input, mapping, ranking, binding, devel (arguments useful to OMPI
Developers), compatibility (arguments supported for backwards compatibility),
launch (arguments to modify launch options), and dvm (Distributed Virtual
Machine arguments).
Report bugs to http://www.open-mpi.org/community/help/
# mpirun --allow-run-as-root --help mapping
mpirun (Open MPI) 4.1.2
Usage: mpirun [OPTION]... [PROGRAM]...
Start the given program using Open RTE
-cf|--cartofile <arg0>
Provide a cartography file
-cpus-per-proc|--cpus-per-proc <arg0>
Number of cpus to use for each process [default=1]
-cpus-per-rank|--cpus-per-rank <arg0>
Synonym for cpus-per-proc
-H|-host|--host <arg0> List of hosts to invoke processes on
--map-by <arg0> Mapping Policy [slot | hwthread | core | socket
(default) | numa | board | node]
-N <arg0> Launch n processes per node on all allocated nodes
(synonym for 'map-by node')
-nolocal|--nolocal Do not run any MPI applications on the local node
-nooversubscribe|--nooversubscribe
Nodes are not to be oversubscribed, even if the
system supports such operation
-oversubscribe|--oversubscribe
Nodes are allowed to be oversubscribed, even on a
managed system, and overloading of processing
elements
--ppr <arg0> Comma-separated list of number of processes on a
given resource type [default: none]
-rf|--rankfile <arg0>
Provide a rankfile file
-use-hwthread-cpus|--use-hwthread-cpus
Use hardware threads as independent cpus
# mpirun --allow-run-as-root --help launch
mpirun (Open MPI) 4.1.2
Usage: mpirun [OPTION]... [PROGRAM]...
Start the given program using Open RTE
-allow-run-as-root|--allow-run-as-root
Allow execution as root (STRONGLY DISCOURAGED)
-am <arg0> Aggregate MCA parameter set file list
--app <arg0> Provide an appfile; ignore all other command line
options
-default-hostfile|--default-hostfile <arg0>
Provide a default hostfile
-enable-instant-on-support|--enable-instant-on-support
Enable PMIx-based instant on launch support
(experimental)
-fwd-mpirun-port|--fwd-mpirun-port
Forward mpirun port to compute node daemons so all
will use it
-hostfile|--hostfile <arg0>
Provide a hostfile
-launch-agent|--launch-agent <arg0>
Command used to start processes on remote nodes
(default: orted)
-machinefile|--machinefile <arg0>
Provide a hostfile
--noprefix Disable automatic --prefix behavior
-path|--path <arg0> PATH to be used to look for executables to start
processes
-personality|--personality <arg0>
Comma-separated list of programming model,
languages, and containers being used
(default="ompi")
--prefix <arg0> Prefix where Open MPI is installed on remote nodes
--preload-files <arg0>
Preload the comma separated list of files to the
remote machines current working directory before
starting the remote process.
-s|--preload-binary Preload the binary on the remote machine before
starting the remote process.
-set-cwd-to-session-dir|--set-cwd-to-session-dir
Set the working directory of the started processes
to their session directory
-show-progress|--show-progress
Output a brief periodic report on launch progress
-use-regexp|--use-regexp
Use regular expressions for launch
-wd|--wd <arg0> Synonym for --wdir
-wdir|--wdir <arg0> Set the working directory of the started processes
-x <arg0> Export an environment variable, optionally
specifying a value (e.g., "-x foo" exports the
environment variable foo and takes its value from
the current environment; "-x foo=bar" exports the
environment variable name foo and sets its value to
"bar" in the started processes)
Report bugs to http://www.open-mpi.org/community/help/
docker run -it --entrypoint bash ghcr.io/rse-ops/lammps-matrix:mpich-ubuntu-22.04-amd64
mpirun -np 1 lmp -v x 2 -v y 2 -v z 2 -in ./in.reaxc.hns -nocite
# The remaining commands are the same as for Intel MPI (both use the Hydra launcher). See the help below for complete details.
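# As a sketch, a multi-node MPICH run with a hosts file and two processes per node
# (the hostnames in hosts.txt are assumed to be reachable over ssh):
mpirun -f ./hosts.txt -np 4 -ppn 2 lmp -v x 2 -v y 2 -v z 2 -in ./in.reaxc.hns -nocite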
MPICH mpirun help
# mpirun --help
Usage: ./mpiexec [global opts] [local opts for exec1] [exec1] [exec1 args] : [local opts for exec2] [exec2] [exec2 args] : ...
Global options (passed to all executables):
Global environment options:
-genv {name} {value} environment variable name and value
-genvlist {env1,env2,...} environment variable list to pass
-genvnone do not pass any environment variables
-genvall pass all environment variables not managed
by the launcher (default)
Other global options:
-f {name} file containing the host names
-hosts {host list} comma separated host list
-wdir {dirname} working directory to use
-configfile {name} config file containing MPMD launch options
Local options (passed to individual executables):
Local environment options:
-env {name} {value} environment variable name and value
-envlist {env1,env2,...} environment variable list to pass
-envnone do not pass any environment variables
-envall pass all environment variables (default)
Other local options:
-n/-np {value} number of processes
{exec_name} {args} executable name and arguments
Hydra specific options (treated as global):
Launch options:
-launcher launcher to use (ssh rsh fork slurm ll lsf sge manual persist)
-launcher-exec executable to use to launch processes
-enable-x/-disable-x enable or disable X forwarding
Resource management kernel options:
-rmk resource management kernel to use (user slurm ll lsf sge pbs cobalt)
Processor topology options:
-topolib processor topology library (hwloc)
-bind-to process binding
-map-by process mapping
-membind memory binding policy
Demux engine options:
-demux demux engine (poll select)
Other Hydra options:
-verbose verbose mode
-info build information
-print-all-exitcodes print exit codes of all processes
-iface network interface to use
-ppn processes per node
-profile turn on internal profiling
-prepend-rank prepend rank to output
-prepend-pattern prepend pattern to output
-outfile-pattern direct stdout to file
-errfile-pattern direct stderr to file
-nameserver name server information (host:port format)
-disable-auto-cleanup don't cleanup processes on error
-disable-hostname-propagation let MPICH auto-detect the hostname
-order-nodes order nodes as ascending/descending cores
-localhost local hostname for the launching node
-usize universe size (SYSTEM, INFINITE, <value>)
-pmi-port use the PMI_PORT model
-skip-launch-node do not run MPI processes on the launch node
-gpus-per-proc number of GPUs per process (default: auto)
Please see the instructions provided at
http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager
for further details
The mpirun and mpiexec commands are identical in their functionality, and both are symbolic links to orterun, the job launching command of IBM Spectrum MPI's underlying Open Runtime Environment. Therefore, although this material refers only to the mpirun command, all references to it are considered synonymous with the mpiexec and orterun commands.
mpirun -np 4 --hostfile ./hosts.txt <application>
mpirun -np 4 -host h1,h2,h2 <application>
Spectrum MPI help
mpirun (Open MPI) 10.03.01.03rtm0
Usage: mpirun [OPTION]... [PROGRAM]...
Start the given program using Open RTE
-allow-run-as-root|--allow-run-as-root
Allow execution as root (STRONGLY DISCOURAGED)
-am <arg0> Aggregate MCA parameter set file list
--app <arg0> Provide an appfile; ignore all other command line
options
--bind-to <arg0> Policy for binding processes. Allowed values: none,
hwthread, core, l1cache, l2cache, l3cache, socket,
numa, board, cpu-list ("none" is the default when
oversubscribed, "core" is the default when np<=2,
and "socket" is the default when np>2). Allowed
qualifiers: overload-allowed, if-supported,
ordered
-bind-to-core|--bind-to-core
Bind processes to cores
-bind-to-socket|--bind-to-socket
Bind processes to sockets
-bycore|--bycore Whether to map and rank processes round-robin by
core
-bynode|--bynode Whether to map and rank processes round-robin by
node
-byslot|--byslot Whether to map and rank processes round-robin by
slot
-c|-np|--np <arg0> Number of processes to run
-cf|--cartofile <arg0>
Provide a cartography file
-continuous|--continuous
Job is to run until explicitly terminated
-cpu-list|--cpu-list <arg0>
List of processor IDs to bind processes to
[default=NULL]
-cpu-set|--cpu-set <arg0>
Comma-separated list of ranges specifying logical
cpus allocated to this job [default: none]
-cpus-per-proc|--cpus-per-proc <arg0>
Number of cpus to use for each process [default=1]
-cpus-per-rank|--cpus-per-rank <arg0>
Synonym for cpus-per-proc
-d|-debug-devel|--debug-devel
Enable debugging of OpenRTE
-debug|--debug Invoke the user-level debugger indicated by the
orte_base_user_debugger MCA parameter
-debug-daemons|--debug-daemons
Enable debugging of any OpenRTE daemons used by
this application
-debug-daemons-file|--debug-daemons-file
Enable debugging of any OpenRTE daemons used by
this application, storing output in files
-debugger|--debugger <arg0>
Sequence of debuggers to search for when "--debug"
is used
-default-hostfile|--default-hostfile <arg0>
Provide a default hostfile
-disable-recovery|--disable-recovery
Disable recovery (resets all recovery options to
off)
-display-allocation|--display-allocation
Display the allocation being used by this job
-display-devel-allocation|--display-devel-allocation
Display a detailed list (mostly intended for
developers) of the allocation being used by this
job
-display-devel-map|--display-devel-map
Display a detailed process map (mostly intended for
developers) just before launch
-display-diffable-map|--display-diffable-map
Display a diffable process map (mostly intended for
developers) just before launch
-display-map|--display-map
Display the process map just before launch
-display-topo|--display-topo
Display the topology as part of the process map
(mostly intended for developers) just before
launch
-do-not-launch|--do-not-launch
Perform all necessary operations to prepare to
launch the application, but do not actually launch
it
-do-not-resolve|--do-not-resolve
Do not attempt to resolve interfaces
-dvm|--dvm Create a persistent distributed virtual machine
(DVM)
-enable-instant-on-support|--enable-instant-on-support
Enable PMIx-based instant on launch support
(experimental)
-enable-recovery|--enable-recovery
Enable recovery from process failure [Default =
disabled]
-fwd-mpirun-port|--fwd-mpirun-port
Forward mpirun port to compute node daemons so all
will use it
-get-stack-traces|--get-stack-traces
Get stack traces of all application procs on
timeout
-gmca|--gmca <arg0> <arg1>
Pass global MCA parameters that are applicable to
all contexts (arg0 is the parameter name; arg1 is
the parameter value)
-h|--help <arg0> This help message
-H|-host|--host <arg0> List of hosts to invoke processes on
-hnp|--hnp <arg0> Specify the URI of the HNP, or the name of the file
(specified as file:filename) that contains that
info
-hostfile|--hostfile <arg0>
Provide a hostfile
-index-argv-by-rank|--index-argv-by-rank
Uniquely index argv[0] for each process using its
rank
-launch-agent|--launch-agent <arg0>
Command used to start processes on remote nodes
(default: orted)
-leave-session-attached|--leave-session-attached
Enable debugging of OpenRTE
-machinefile|--machinefile <arg0>
Provide a hostfile
--map-by <arg0> Mapping Policy [slot | hwthread | core | socket
(default) | numa | board | node]
-max-restarts|--max-restarts <arg0>
Max number of times to restart a failed process
-max-vm-size|--max-vm-size <arg0>
Number of processes to run
-mca|--mca <arg0> <arg1>
Pass context-specific MCA parameters; they are
considered global if --gmca is not used and only
one context is specified (arg0 is the parameter
name; arg1 is the parameter value)
-merge-stderr-to-stdout|--merge-stderr-to-stdout
Merge stderr to stdout for each process
-N <arg0> Launch n processes per node on all allocated nodes
(synonym for 'map-by node')
-n|--n <arg0> Number of processes to run
-nolocal|--nolocal Do not run any MPI applications on the local node
-nooversubscribe|--nooversubscribe
Nodes are not to be oversubscribed, even if the
system supports such operation
--noprefix Disable automatic --prefix behavior
-novm|--novm Execute without creating an allocation-spanning
virtual machine (only start daemons on nodes
hosting application procs)
-npernode|--npernode <arg0>
Launch n processes per node on all allocated nodes
-npersocket|--npersocket <arg0>
Launch n processes per socket on all allocated
nodes
-ompi-server|--ompi-server <arg0>
Specify the URI of the publish/lookup server, or
the name of the file (specified as file:filename)
that contains that info
-output-filename|--output-filename <arg0>
Redirect output from application processes into
filename/job/rank/std[out,err,diag]. A relative
path value will be converted to an absolute path
-output-proctable|--output-proctable
Output the debugger proctable after launch
-oversubscribe|--oversubscribe
Nodes are allowed to be oversubscribed, even on a
managed system, and overloading of processing
elements
-path|--path <arg0> PATH to be used to look for executables to start
processes
-pernode|--pernode Launch one process per available node
-personality|--personality <arg0>
Comma-separated list of programming model,
languages, and containers being used
(default="ompi")
--ppr <arg0> Comma-separated list of number of processes on a
given resource type [default: none]
--prefix <arg0> Prefix where Open MPI is installed on remote nodes
--preload-files <arg0>
Preload the comma separated list of files to the
remote machines current working directory before
starting the remote process.
-q|--quiet Suppress helpful messages
--rank-by <arg0> Ranking Policy [slot (default) | hwthread | core |
socket | numa | board | node]
-report-bindings|--report-bindings
Whether to report process bindings to stderr
-report-child-jobs-separately|--report-child-jobs-separately
Return the exit status of the primary job only
-report-events|--report-events <arg0>
Report events to a tool listening at the specified
URI
-report-pid|--report-pid <arg0>
Printout pid on stdout [-], stderr [+], or a file
[anything else]
-report-state-on-timeout|--report-state-on-timeout
Report all job and process states upon timeout
-report-uri|--report-uri <arg0>
Printout URI on stdout [-], stderr [+], or a file
[anything else]
-rf|--rankfile <arg0>
Provide a rankfile file
-s|--preload-binary Preload the binary on the remote machine before
starting the remote process.
-set-cwd-to-session-dir|--set-cwd-to-session-dir
Set the working directory of the started processes
to their session directory
-show-progress|--show-progress
Output a brief periodic report on launch progress
-stdin|--stdin <arg0>
Specify procs to receive stdin [rank, all, none]
(default: 0, indicating rank 0)
-tag-output|--tag-output
Tag all output with [job,rank]
-timeout|--timeout <arg0>
Timeout the job after the specified number of
seconds
-timestamp-output|--timestamp-output
Timestamp all application process output
-tune <arg0> Application profile options file list
-tv|--tv Deprecated backwards compatibility flag; synonym
for "--debug"
-use-hwthread-cpus|--use-hwthread-cpus
Use hardware threads as independent cpus
-use-regexp|--use-regexp
Use regular expressions for launch
-v|--verbose Be verbose
-V|--version Print version and exit
-wd|--wd <arg0> Synonym for --wdir
-wdir|--wdir <arg0> Set the working directory of the started processes
-x <arg0> Export an environment variable, optionally
specifying a value (e.g., "-x foo" exports the
environment variable foo and takes its value from
the current environment; "-x foo=bar" exports the
environment variable name foo and sets its value to
"bar" in the started processes)
-xml|--xml Provide all output in XML format
-xml-file|--xml-file <arg0>
Provide all output in XML format to the specified
file
-xterm|--xterm <arg0>
Create a new xterm window and display output from
the specified ranks there
For additional mpirun arguments, run 'mpirun --help <category>'
The following categories exist: general (Defaults to this option), debug,
output, input, mapping, ranking, binding, devel (arguments useful to OMPI
Developers), compatibility (arguments supported for backwards compatibility),
launch (arguments to modify launch options), and dvm (Distributed Virtual
Machine arguments).
Report bugs to https://www.ibm.com/mysupport/s/
Extra options from Spectrum-MPI (that translate to similar Open MPI options):
[Container behavior]
-container rank : Use $MPIRUN_CONTAINER_CMD to launch ranks within
individual container instances. Automatically inserts
the container assistant script in front of the program
name for environment modifications.
-container all : Use $MPIRUN_CONTAINER_CMD to relaunch mpirun within
a container and to launch orteds within individual
container instances. The container assistant script is
automatically inserted in front of the relaunched
mpirun command. All ranks assigned to a node will share
the same container instance on that node with the orted.
-container orted : Use $MPIRUN_CONTAINER_CMD to launch orteds within
individual container instances. This is like 'all' mode
except mpirun does -not- relaunch itself within a
container. The user is responsible for establishing the
container instance then launching mpirun from within
that container instance. No container assistant script
is used in this mode. As such the 'assist' and 'root'
options, and SMPI_CONTAINERENV_ prefixed environment
variables have no impact in this mode.
-container root:<dir> : By default the container is assumed to set its
own MPI_ROOT environment variable inside the container.
If this is not the case or if a different value for
MPI_ROOT is needed then this option can be used to
specify that MPI_ROOT value inside the container.
-container assist:<path> : Set the full path to the container assistant
script that is valid inside the container at runtime.
Default: $MPI_ROOT/container/bin/incontainer.pl
-container <option>,<option>,.. : Comma separated list of the above options
env MPIRUN_CONTAINER_OPTIONS=<options> : Same as -container <options>
env MPIRUN_CONTAINER_CMD=<cmd> : Specify the container runtime command to
launch a container instance. This can be any executable
including a script for ease of use.
Example: "singularity exec myapp.sif"
env SMPI_CONTAINERENV_* : Pass an environment variable from outside
of the container to inside the container by prefixing
the variable with this string.
env SMPI_CONTAINERENV_PREPEND_* : Prepend values to the beginning of an
environment variable inside of the container by
prefixing the variable name with this string.
env SMPI_CONTAINERENV_APPEND_* : Append values to the end of an environment
variable inside of the container by prefixing the
variable name with this string.
[Interconnect selection]
-PAMI / -pami : use IBM PAMI via the pami PML (default)
-UCX / -ucx : use UCX (Tech Preview) via the ucx PML
-MXM / -mxm : use Mellanox MXM via the yalla PML
-TCP / -tcp : use TCP/IP via the PML ob1 and the BTL tcp
-IBV / -ibv : use OpenFabrics infiniband via the
PML ob1 and the BTL openib
aliases: -ib / -openib
In all of the above the capital option equates to forcing the specified
PML / MTL / BTL, and the lower case option only equates to specifying a
higher priority for the selected interconnect.
[Additional PAMI options]
-verbsbypass <ver> : use PAMI's verbs bypass. <ver> reflects
Mellanox OFED version installed:
(ver=auto, off and x.y* (* installed MOFED version in the cluster ))
auto find out the installed compatible MOFED version on the mpirun node
(auto assumes complete cluster installed with same MOFED level)
-pami_noib : use PAMI on a single node with no Infiniband card. [ppc64 only]
-async : use PAMI Asynchronous progress thread [ selected pml must be pami ]
-hwtm : use IB hardware tag matching [ selected pml must be pami ]
[On-host communication method]
-intra nic : use the off-host BTL for on-host traffic as well
-intra vader : use BTL=vader (shared memory) for on-host traffic
(only applies if the PML is already ob1)
-intra shm : equivalent to -intra=vader
[Display interconnect]
-prot : display a table of what interconnect type each host uses
(first rank on each host connects to all peer hosts to
establish connections that might otherwise be on-demand)
-protlazy : less aggressive version of -prot that runs at finalize
and without establishing connections, so many peers
might be unconnected.
[Stdio options]
-stdio p : prefix each rank's output with [job,rank]
-stdio t : add timestamp to output
-stdio i[+|all|-|none|<rank>] : send stdin to all ranks (+), no ranks (-)
or a single specific rank
-stdio file:prefix : send output to files named <prefix>.<rank>
-stdio <option>,<option>,.. : comma separated list of the above options
[IP network selection]
-netaddr <spec>,<spec>,.. : specify what network(s) to use for
IP traffic. This option applies
to both control messages, and the
regular MPI rank traffic
-netaddr <type>:<spec>,<spec>,.. : individually specify the networks
for different types of traffic
<type> can be any of
rank : specify network for regular MPI rank-to-rank traffic
control : specify network for control messages, eg launching
mpirun : synonym for "control"
<spec> can be either
interface name : eg eth0 or ib0 etc
CIDR notation : eg 10.10.1.0/24
[libnl / libnl3 collision avoidance]
-restrict_libs nl / libnl / : only load libraries compatible with
^nl3 / ^libnl3 libnl (skip libnl3)
-restrict_libs nl3 / libnl3 / : only load libraries compatible with
^nl / ^libnl libnl3 (skip libnl)
-restrict_libs consistent : reject inconsistency vs current state
-restrict_libs none : no restrictions
-restrict_libs default : "none"
-restrict_libs v / vv / vvv : print a message when rejecting an MCA
-restrict_libs <option>,<option>,.. : comma separated list of the above
options
The levels of verboseness are
v : print a message when rejecting a library due to a libnl conflict
vv : for each library that uses libnl/nl3 print what it was detected as
vvv : for every library print what it was detected as
[Affinity options]
-aff on : turn on affinity with default option (bandwidth)
-aff off : turn off affinity (unbind)
-aff v / -aff vv : verbose
-aff bandwidth : interleave sockets, use natural hardware order
-aff latency : pack ranks across the natural hardware order
-aff cycle:<unit> : interleave binding over the specified element
-aff width:<unit> : bind each rank to an element of this size
<unit> can be hwthread, core, socket, numa, or board.
-aff default : same as "bandwidth" above
-aff auto[matic] : same as "bandwidth" above
-aff none : same as "off" above
-aff <option>,<option>,.. : comma separated list of the above options
[GPU support]
-gpu : Enable GPU awareness in PAMI.
-disable_gdr : Disable GPU Direct RDMA support for Power8 systems
[Dynamic MPI Profiling interface]
-entry <lib>,.. : list of PMPI wrapping libraries. Each <lib> can be
of the form libfoo.so, /path/to/libfoo.so, or just
foo, which will be automatically expanded into
libfoo.so for simple strings consisting only of the
characters [a-zA-Z0-9_-] (expansion not applicable
for "fort", "fortran", "v", and "vv")
-entry fort : included in a list of <lib> above, this indicates what
layer to install the base MPI product's fortran
calls which minimally wrap the C calls (by default
this is put at the top)
-entry fortran : same as fort
-entrybase <lib>,.. : optionally specify what library(s) to get the
bottom level MPI calls from, by default RTLD_NEXT
which would be the libmpi the executable is linked
against.
-baseentry : same as -entrybase
-entry v : verbose (show the layering of the MPI entrypoints)
-entry vv : more verbose - the difference is 'v' shows what levels
are intended to be used, 'vv' happens further inside
the library and confirms what libraries are being opened.
'vv' output is less readable, but more visibly confirms
interception is taking place.
-entry mpe : turn on the pre-built MPE logging library (version
mpe2-2.4.9b) from Argonne National Laboratory. The
output .clog2 file is viewable with jumpshot.
[Manual spin to wait for debugger attachment]
-dbgspin early : uses LD_PRELOAD to put processes to sleep
very early, before main(). The process
being put to sleep isn't necessarily the
MPI rank who calls MPI_Init though. For
example in "mpirun -np 2 env A=B app.x"
the first process started as a "rank" is
"env" and that would be the process put
to sleep
-dbgspin rank : puts ranks to sleep at the bottom of
MPI_Init, this way only true MPI rank
processes are put to sleep
-dbgspin barrier or nobarrier : in 'rank' mode when selected ranks sleep
at the end of MPI_Init, the other ranks
can wait in a barrier (the default) or not
with the option 'nobarrier'
-dbgspin # or #-# : specifies which ranks to sleep. Multiple
ranks and ranges can be specified, eg
-dbgspin 0,20-25,32
Defaults:
'early' mode: all ranks are slept
'rank' mode: only rank 0 is slept
-dbgspin <option>,<option>,... : comma separated list of the above options
The ranks are slept until a debugger is attached and a global spin
variable is set to 0: "set dbgspin=0".
[Spectrum MPI specific environment variables]
Spectrum MPI supports both PREPEND and POSTPEND versions of PATH,
LD_LIBRARY_PATH, and LD_PRELOAD environment variables. The value
passed will be propagated and applied on the compute node.
OMPI_PATH_PREPEND : Prepend value to PATH
OMPI_PATH_POSTPEND : Postpend value to PATH
OMPI_LD_LIBRARY_PATH_PREPEND : Prepend value to LD_LIBRARY_PATH
OMPI_LD_LIBRARY_PATH_POSTPEND : Postpend value to LD_LIBRARY_PATH
OMPI_LD_PRELOAD_PREPEND : Prepend value to LD_PRELOAD
OMPI_LD_PRELOAD_POSTPEND : Postpend value to LD_PRELOAD
[Help options]
-show : display the resulting modified mpirun command line
as well as run the resulting mpirun command
-showonly : like -show but only displays the result,
doesn't run the resulting mpirun command
-onlyshow : same as -showonly
-show_as <syntax> : specifies what syntax to output environment settings
where <syntax> can be
sh : write env settings for the sh shell (the default)
csh : write env settings for the csh shell
keyval : write env settings in a simple VAR VALUE syntax
-write_env <file> : The -write_env* options are similar to -showonly
-write_env_sh <file> : with a corresponding -show_as <syntax> option,
-write_env_csh <file> : but the generated environment is written to
-write_env_keyval <file> : <file> instead of to stdout.
-generate_env <file> : The -generate_env* options are all
-generate_env_sh <file> : equivalent to the corresponding
-generate_env_csh <file> : -write_env* options. Note that the base
-generate_env_keyval <file> : -write_env and base -generate_env use
the default <syntax> of sh
-help : display this message
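Pulling the container options above together: a minimal sketch of launching each rank inside its own container instance, assuming a hypothetical Singularity image myapp.sif that contains the application binary app.
# Tell mpirun how to start a container instance
export MPIRUN_CONTAINER_CMD="singularity exec myapp.sif"
# Launch each rank within an individual container instance
mpirun -container rank -np 4 ./app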