jb_execv

Name

jb_execv - execute program on BGAS IO node

Synopsis

#include <mpi.h>
#include "jbcnl.h"

int jb_execv( const char* filename, char* const argv[]);

Include with -I/bgsys/local/bgas/jbrt/jbcn/include, link with -L/bgsys/local/bgas/jbrt/jbcn/lib -ljbcn.

Description

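jb_execv() is the JBRT analogue of a call to the following hypothetical function:

#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>

/* Spawn "filename" on the caller's own node; return the child's PID, or -1 if fork() fails. */
pid_t fork_execv( const char* filename, char* const argv[]) {
  pid_t pid = fork();

  if (pid == 0) {
    /* Child: execv() returns only on error. */
    execv( filename, argv);

    _exit( EXIT_FAILURE);  /* _exit() avoids flushing stdio buffers inherited from the parent */
  }

  /* Parent: the child's PID, or -1 if fork() failed. */
  return pid;
}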

The main difference is where the new process runs. The hypothetical fork_execv() creates the process, and executes the program, on the same node as its caller. jb_execv(), in contrast, is called by a process on a BG/Q compute node, but creates the new process and executes the program on the IO node to which that compute node is connected.

That is, jb_execv() triggers execution of the program pointed to by "filename". "filename" must be either a binary executable, or a script starting with a line of the form

#! interpreter [optional-arg]

For details of the latter case, see the "Interpreter scripts" section of the execve() manpage.

"argv" is an array of null-terminated strings representing the argument list available to the new program. However, because of the size constraints on CN-ION messages, jb_execv() itself prepends "filename" to the argument list actually passed to the new program. That is, assuming "filename" is a binary whose main routine reads

int main( int ion_argc, char** ion_argv),

"ion_argv[0]" is a copy of "filename" (i.e. equal to "filename" in the strcmp() sense), "ion_argv[1]" is a copy of "argv[0]", "ion_argv[2]" a copy of "argv[1]", and so on. In other words, code written for the IO node side should follow the usual "argv[0] = program name" convention, but callers of jb_execv() on the compute node side should explicitly NOT follow it: "argv[0]" must hold the first real argument, not the program name.

As with execv(), the array of pointers must be terminated by a NULL pointer. In particular, do not pass NULL as "argv", as this will cause a segmentation fault (in accordance with execv() behaviour).
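
The argument mapping described above can be modelled as follows. This is a minimal sketch, not JBRT code: model_ion_argv() and ION_ARGV_MAX are hypothetical names, and the model stores pointers where jb_execv() transmits copies of the strings.

```c
#include <stddef.h>

/* Hypothetical model (not part of the JBRT API) of how the IO node program's
   argument list is derived from the jb_execv() arguments: "filename" is
   prepended and every argv entry is shifted up by one position. */
#define ION_ARGV_MAX 64  /* arbitrary bound chosen for this sketch */

/* Fills the NULL-terminated ion_argv and returns ion_argc, or -1 if argv has
   too many entries for this sketch's fixed-size buffer. */
int model_ion_argv(const char *filename, char *const argv[],
                   const char *ion_argv[ION_ARGV_MAX + 1]) {
    int ion_argc = 0;

    ion_argv[ion_argc++] = filename;        /* ion_argv[0] == filename */
    for (size_t i = 0; argv[i] != NULL; i++) {
        if (ion_argc == ION_ARGV_MAX)
            return -1;
        ion_argv[ion_argc++] = argv[i];     /* ion_argv[i + 1] == argv[i] */
    }
    ion_argv[ion_argc] = NULL;              /* keep the execv()-style terminator */
    return ion_argc;
}
```

For example, a compute node call jb_execv( "/path/to/prog", args) with args = { "-v", "input.dat", NULL } reaches the IO node program with ion_argc == 3 and ion_argv = { "/path/to/prog", "-v", "input.dat", NULL }.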

Note that, unlike for standard execv(), the IO node process created by jb_execv() does NOT inherit the environment of the calling compute node process. Instead, the environment of the new (IO node) process is empty. (Inheritance of the compute node environment would not make sense since the "child" process is not even running under the same OS as the "parent" process.)
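
Because the IO node process starts with an empty environment, IO node code cannot rely on variables such as PATH, HOME or anything exported in the job script. A minimal sketch of a defensive lookup, using a hypothetical helper env_or_default() (not part of the JBRT), could look like this:

```c
#include <stdlib.h>

/* Hypothetical helper: returns the value of environment variable "name",
   or "fallback" if the variable is unset or empty. On the IO node side the
   environment created by jb_execv() is empty, so every lookup falls back. */
const char *env_or_default(const char *name, const char *fallback) {
    const char *value = getenv(name);
    return (value != NULL && value[0] != '\0') ? value : fallback;
}
```

Any fallback values (for instance a search path passed as fallback for PATH) are choices of the IO node program, not something the JBRT provides.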

No process attributes other than the user ID, effective user ID, group ID and effective group ID are preserved in the newly created IO node process. jb_execv() follows the execve() rules for the effective user ID and effective group ID.

Constraints and limitations

Limits on size of arguments

Due to constraints on the size of the CN-ION messages used by the JBRT, the limit on the total size of the command-line argument strings passed to the new IO node program is much stricter than the corresponding limit for execve(). The total size of "filename" and the "argv" element strings, when concatenated into a single blank-delimited, zero-terminated string, must not exceed the JBRT_EXEC_CMD_SIZE constant defined in jbsd_messages.h, which is currently set to 150 bytes.
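A caller can check this limit before calling jb_execv(). The sketch below computes the length of the string "filename argv[0] argv[1] ...", including its terminating zero byte. It assumes that the limit applies to the whole string including that byte, and it redefines JBRT_EXEC_CMD_SIZE with the value quoted above only if jbsd_messages.h has not been included; jb_cmd_length() and jb_cmd_fits() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#ifndef JBRT_EXEC_CMD_SIZE
/* Value quoted on this page; the authoritative definition is in jbsd_messages.h. */
#define JBRT_EXEC_CMD_SIZE 150
#endif

/* Length in bytes of "filename arg0 arg1 ...\0": filename and the argv
   strings joined by single blanks, including the terminating zero byte. */
size_t jb_cmd_length(const char *filename, char *const argv[]) {
    size_t len = strlen(filename) + 1;      /* filename + terminating '\0' */
    for (size_t i = 0; argv[i] != NULL; i++)
        len += 1 + strlen(argv[i]);         /* separating blank + argument */
    return len;
}

/* Nonzero if the command line should fit, under the assumption stated above. */
int jb_cmd_fits(const char *filename, char *const argv[]) {
    return jb_cmd_length(filename, argv) <= JBRT_EXEC_CMD_SIZE;
}
```

For example, "/bin/echo" with args = { "hello", NULL } yields the 16-byte string "/bin/echo hello\0", well within the limit, whereas a single 200-character argument would not fit.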

Limits on number of spawned ION processes

There are two limits on the number of IO node processes spawned via jb_execv(), one on a per-compute-node-process basis and one on a per-IO-node basis. Both limits are unlikely to be reached in practice.

On a per-compute-node-process basis, each spawned IO node process is assigned a so-called "tag", which is used in place of the child process ID for monitoring and reaping. Tags are integers from 1 to 8, so each compute node process can have at most 8 IO node "children" at any given time. If no tag is available, jb_execv() fails with "errno" set to EAGAIN.
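The tag space can be pictured as an 8-bit mask shared by all threads of a compute node process. The following is an illustrative model only, not the JBRT implementation: tag_acquire(), tag_release() and JB_MAX_TAGS are hypothetical names, and a C11 atomic compare-and-swap is used so that concurrent callers never receive the same tag.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical model of the per-process tag space: tags 1..8, one bit each. */
#define JB_MAX_TAGS 8

static _Atomic unsigned tag_mask = 0;   /* bit (t - 1) set <=> tag t in use */

/* Claims the lowest free tag and returns it; returns -1 with errno = EAGAIN
   if all eight tags are in use (mirroring jb_execv()'s EAGAIN error). */
int tag_acquire(void) {
    unsigned old = atomic_load(&tag_mask);
    for (;;) {
        int t = 0;
        while (t < JB_MAX_TAGS && (old & (1u << t)))
            t++;
        if (t == JB_MAX_TAGS) {
            errno = EAGAIN;
            return -1;
        }
        /* On failure, "old" is refreshed with the current mask and we rescan. */
        if (atomic_compare_exchange_weak(&tag_mask, &old, old | (1u << t)))
            return t + 1;
    }
}

/* Frees a tag previously returned by tag_acquire(). */
void tag_release(int tag) {
    atomic_fetch_and(&tag_mask, ~(1u << (tag - 1)));
}
```

In the JBRT, a tag becomes available again only after the corresponding IO node process has been reaped via jb_execv_status(); tag_release() merely stands in for that step in this model.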

On a per-IO-node basis, the limits on the numbers of threads and open files imposed by the Linux installation on the IO nodes apply. The resulting per-IO-node limit on the number of IO node user processes is currently slightly below 256, and thus significantly below the theoretical maximum of 32768 IO node user code instances permitted by the CN-side limits. Violations of the IO node limits cannot be checked on the CN side and are therefore treated like a failure of the execv() part of the hypothetical fork_execv() function shown above (as opposed to a failure of its fork() part): they do not cause jb_execv() itself to fail, but can be detected afterwards by monitoring the issued tag.

Return value

On success, an integer between 1 and 8 (the tag) is returned. Treat this integer like the return value of fork(): store it, since it is the argument used for monitoring and reaping the IO node process; see jb_execv_status().

On failure, -1 is returned and "errno" is set appropriately. Note that jb_execv() can "fail silently": it may report success even though no ION process has actually been created. In this case, the JBRT behaves as if an ION process had been created successfully and then killed by SIGHUP.

Errors

E2BIG

The total number of bytes in "filename" and the argument list ("argv") is too large.

EAGAIN

jb_execv() ran into the 8-process limit for processes spawned by this compute node process; wait for an earlier IO node task to terminate and "reap" it (see jb_execv_status()) before retrying.

ENAMETOOLONG

"filename" is too long.

Other errors stem from ZeroMQ errors on the ION side and indicate a bug in the JBRT. If an error other than E2BIG, EAGAIN or ENAMETOOLONG occurs, please report it to n.vandenbergen@….
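
The error list above maps naturally onto three reactions. A minimal sketch, using a hypothetical helper jb_execv_classify() and enum jb_execv_action (neither is part of the JBRT):

```c
#include <errno.h>

/* Hypothetical classification of a failed jb_execv() call (return value -1),
   based only on the errno values documented above. */
typedef enum {
    JB_RETRY_AFTER_REAP,  /* EAGAIN: all 8 tags in use; reap a finished task first */
    JB_SHORTEN_COMMAND,   /* E2BIG / ENAMETOOLONG: arguments or filename too long */
    JB_REPORT_BUG         /* anything else: ZeroMQ error on the ION side */
} jb_execv_action;

jb_execv_action jb_execv_classify(int err) {
    switch (err) {
    case EAGAIN:
        return JB_RETRY_AFTER_REAP;
    case E2BIG:
    case ENAMETOOLONG:
        return JB_SHORTEN_COMMAND;
    default:
        return JB_REPORT_BUG;
    }
}
```

A caller would save "errno" immediately after jb_execv() returns -1, before any other library call can overwrite it, and pass the saved value to jb_execv_classify().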

Thread safety

jb_execv() may be called safely from inside a (POSIX or OpenMP) threaded region. Note that its return value should not be stored in a variable shared among threads, since it is needed later for freeing resources allocated to the remote job (see jb_execv_status()). Also, the space of available tags is shared among threads of the same process, i.e. the 8-tag limit is per MPI task, not per thread.

Last modified on 09/30/14 20:53:04