
Jumel (JUROPA memory logger)

General information

The JUROPA memory logger is intended for monitoring the memory usage of applications on NUMA architectures (especially JUROPA). It currently consists of two Python scripts:

  • jumel (JUROPA memory logger), the actual logger
  • juman (JUROPA memory analyzer), a postprocessing tool

Concept

The logger is started with mpiexec and subsequently starts the application to be monitored. It creates a directory (default: .memlog in the PBS_O_WORKDIR, => wrkdir, option -w) in which, by default, each task creates its own logfile. At each time step (=> delay, option -d) the logger checks the status file of each process (/proc/<id>/status) for the following keys:

  • VmSize
  • VmData
  • VmStk
  • VmRSS

Each task writes the value of each key to the file .memlog/task<MPI-rank>.log and waits for the next time step.
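To illustrate the concept, a minimal sketch of such a logging loop is shown below. This is not the actual jumel code; the function names, the rank argument and the logfile layout are assumptions made for illustration only.

import os
import time

KEYS = ("VmSize", "VmData", "VmStk", "VmRSS")

def read_status(pid):
    # Return the monitored keys (values in kB) from /proc/<pid>/status.
    values = {}
    with open("/proc/%d/status" % pid) as status:
        for line in status:
            key, _, rest = line.partition(":")
            if key in KEYS:
                values[key] = int(rest.split()[0])
    return values

def log_memory(pid, rank, delay=10, wrkdir=".memlog"):
    # Append one line per time step to <wrkdir>/task<rank>.log
    # as long as the monitored process still exists.
    path = os.path.join(wrkdir, "task%d.log" % rank)
    with open(path, "a") as logfile:
        while os.path.exists("/proc/%d" % pid):
            values = read_status(pid)
            logfile.write(" ".join("%s=%d" % (k, values.get(k, 0)) for k in KEYS) + "\n")
            logfile.flush()
            time.sleep(delay)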

After the run has finished, the analyzer juman is run in the PBS_O_WORKDIR, either from within the same job script or afterwards on the login node, to analyze the consumed resources. It creates graphs of the value of a key (default: VmSize, => key, option -k) for each task at each time step (use juman -k help for a list of available keys), of the process with the maximum value of that key at each time step, and of the total sum of that key across all tasks at each time step (=> statistics, option -s). If -i is specified, the graphs are displayed immediately.
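The analysis step can be sketched in a similar way. Again, this is an illustration only, assuming the logfile layout of the sketch above; juman itself may store and parse the data differently.

import glob
import os

def load_task_logs(wrkdir=".memlog", key="VmSize"):
    # Return {MPI rank: [value at step 0, value at step 1, ...]} for one key.
    series = {}
    for path in glob.glob(os.path.join(wrkdir, "task*.log")):
        rank = int(os.path.basename(path)[len("task"):-len(".log")])
        values = []
        with open(path) as logfile:
            for line in logfile:
                entries = dict(field.split("=") for field in line.split())
                values.append(int(entries[key]))
        series[rank] = values
    return series

def statistics(series):
    # Per time step: maximum over all tasks and total sum over all tasks.
    steps = min(len(values) for values in series.values())
    maxima = [max(values[i] for values in series.values()) for i in range(steps)]
    totals = [sum(values[i] for values in series.values()) for i in range(steps)]
    return maxima, totals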

Usage

In order to get an overview of the valid options please use

`jumel -u`
`juman -u`

Defaults:

  • work directory: .memlog
  • delay: 10 seconds
  • all ranks log the memory consumption
  • resources logged: /proc/<id>/status: VmSize, VmRSS, VmData, VmStk

Suppose the application to be monitored is usually started as follows:

mpiexec -np 32 -e APP_ROOT app.x -i app.inp

To start the application with jumel, use

mpiexec -np 32 -e PBS_JOBID,APP_ROOT jumel -a "app.x -i app.inp"

The variable PBS_JOBID does not need to be specified; however, it will be displayed in the jumel logfiles and makes it easier to track the runs afterwards (e.g. when looking for the job in the system logfiles). Once the run is finished, run juman in the same directory:

juman -s all -i

The -s option switches on the statistics (currently: total consumption and maximum consumption per time step) and the -i option starts the graphical display of the results. Currently only a gnuplot interface is implemented, and postscript or xfig files can be generated. An interface for visualization with Python is planned.
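As an illustration of what such a Python-based visualization could look like, the following sketch plots the per-task values together with the per-step maximum and total. matplotlib and the helper functions from the sketches above are assumptions for illustration, not part of jumel or juman.

import matplotlib.pyplot as plt

def plot_key(series, maxima, totals, key="VmSize"):
    # One curve per MPI task plus the per-step maximum and total.
    for rank, values in sorted(series.items()):
        plt.plot(values, label="task %d" % rank)
    plt.plot(maxima, "k--", label="maximum per step")
    plt.plot(totals, "k-", label="total per step")
    plt.xlabel("time step")
    plt.ylabel("%s [kB]" % key)
    plt.legend()
    plt.show()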

Example: Namd

The following job script was used to monitor a run of the apoa1 benchmark with Namd:

#!/bin/bash 
#MSUB -l nodes=4:ppn=8
#MSUB -l walltime=00:15:00
#MSUB -v tpt=1 

module load namd/2.7

mpiexec -np 32 -e PBS_JOBID jumel -a "$NAMD_ROOT/bin/namd2 apoa1.namd"

The graphs were obtained afterwards on the login node with

juman -s all -i

Results

Below, the results of the Namd runs are shown for VmSize (the default key). The reported values are in kB.

  • source:/examples/Namd/VmSize.png
  • source:/examples/Namd/VmSize_max.png
  • source:/examples/Namd/VmSize_total.png

SVN Access

svn list https://svn.version.fz-juelich.de/jumel