JUMEL (JUROPA MEmory Logger)

General information

The JUROPA memory logger is intended for monitoring the memory usage of applications on NUMA architectures (especially JUROPA). It currently consists of two Python scripts:

  • jumel (JUROPA Memory Logger), the actual logger
  • juman (JUROPA Memory ANalyzer), a postprocessing tool

Concept

The logger is started with mpiexec and in turn starts the application to be monitored. Monitoring is done per task and/or per node. At regular time steps, information is gathered from files provided by the operating system or from commands issued by the logger itself. Currently, the following resources are monitored:

  • Monitoring per task
    • file /proc/<PID>/status with keys
      • VmExe (Memory of task marked as executable in kB)
      • VmSt (Stack memory of task in kB)
      • VmData (Heap memory of task in kB)
      • VmSize (Total memory consumption of task in kB)
      • VmLck (Memory locked in RAM in kB)
      • VmLib (Memory used by shared library code in kB)
      • VmRSS (Resident Set Size of the task in kB)

  • Monitoring per node
    • command vmstat with keys
      • MFree (Free memory of the node in kB)
      • TWait (Number of waiting tasks on the node)
      • Idle (CPU idling in %)
      • TDead (Number of dead tasks on the node)
      • UsedUs (CPU used by user processes in %)
      • UsedKe (CPU used by Kernel in %)
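
The two data sources listed above can be read with a few lines of Python. The following is a minimal sketch, not the jumel implementation: the function names parse_proc_status and parse_vmstat are hypothetical, the kernel's name for the stack field is VmStk, and the vmstat parser assumes the usual procps layout in which the column header line starts with "r". Which vmstat columns jumel maps to MFree, Idle and the other node keys is not stated here, so the sketch returns the raw column names.

```python
# Per-task keys as named by the kernel in /proc/<PID>/status (values in kB).
TASK_KEYS = ("VmExe", "VmStk", "VmData", "VmSize", "VmLck", "VmLib", "VmRSS")


def parse_proc_status(text, keys=TASK_KEYS):
    """Return {key: value in kB} for the requested keys found in the text."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and name in keys and fields:
            values[name] = int(fields[0])
    return values


def parse_vmstat(text):
    """Map vmstat column names to the integer values of the last sample.

    Assumption: the column header is the line whose first token is "r".
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header = next(row for row in rows if row[0] == "r")
    sample = rows[-1]
    return dict(zip(header, (int(value) for value in sample)))
```

On a Linux node, parse_proc_status would be fed the contents of /proc/<PID>/status and parse_vmstat the output of a plain vmstat call.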

When monitoring per task, each task writes the value of each key to the file .memlog/task<MPI-rank>.log and then waits for the next time step. When monitoring per node, the process running on core 0 of each node writes the value of each key to the file node<node-name>_task<MPI-task>.log. Both modes (per task and per node) can be active at the same time.
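
The per-task scheme (start the application, sample at fixed time steps, append to a per-rank log file) could look roughly like the sketch below. It is an illustration under assumptions, not jumel's code: the names read_keys and monitor, the log line layout, and the choice to sample the PID of the started child process are all hypothetical, and the MPI rank is passed in by the caller because the way jumel determines it is not described here.

```python
import os
import subprocess
import time


def read_keys(pid, keys):
    """Read the requested keys (in kB) from /proc/<pid>/status, if present."""
    values = {}
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                name, _, rest = line.partition(":")
                fields = rest.split()
                if name in keys and fields:
                    values[name] = int(fields[0])
    except OSError:
        pass  # process already gone, or no /proc on this system
    return values


def monitor(command, rank, interval=1.0, keys=("VmSize", "VmRSS"),
            log_dir=".memlog"):
    """Start command, sample it every interval seconds.

    Returns (exit code of the application, path of the log file).
    """
    os.makedirs(log_dir, exist_ok=True)
    log_path = os.path.join(log_dir, f"task{rank}.log")
    app = subprocess.Popen(command)
    start = time.time()
    with open(log_path, "w") as log:
        # Hypothetical layout: elapsed seconds followed by one column per key.
        log.write("# seconds " + " ".join(keys) + "\n")
        while app.poll() is None:
            values = read_keys(app.pid, keys)
            if len(values) == len(keys):  # skip incomplete samples
                row = " ".join(str(values[key]) for key in keys)
                log.write(f"{time.time() - start:.1f} {row}\n")
            time.sleep(interval)
    return app.returncode, log_path
```

A call such as monitor(["app.x", "-i", "app.inp"], rank=0) would then produce .memlog/task0.log in the working directory.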

The data produced by jumel can be analyzed with the juman script. It performs a statistical analysis of the data (minimum, maximum and total sum of the values) and generates the corresponding graphs.
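
As an illustration of the statistics mentioned above, the following sketch computes the minimum, maximum and total over all tasks for every time step. The function name step_statistics and the input layout (one list of values per MPI rank) are assumptions made for this example; juman's internal data structures may differ.

```python
def step_statistics(series_by_task):
    """Compute min, max and total across tasks for each time step.

    series_by_task maps an MPI rank to the list of values (in kB) that the
    task logged, one entry per time step. Only the time steps present for
    every task are evaluated.
    """
    series = list(series_by_task.values())
    steps = min(len(values) for values in series)
    stats = []
    for step in range(steps):
        column = [values[step] for values in series]
        stats.append({"min": min(column),
                      "max": max(column),
                      "total": sum(column)})
    return stats
```

For two tasks with the VmSize series [100, 200] and [150, 250, 300], this yields a minimum of 100, a maximum of 150 and a total of 250 for the first time step; the third value of the second task is ignored because the first task has no matching sample.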

Usage

To get an overview of the valid options, run

jumel -u
juman -u

Suppose the application to be monitored is usually started as follows:

mpiexec -np 32 -e APP_ROOT app.x -i app.inp > my.out

To start the application with jumel use

mpiexec -np 32 -e PBS_ID,APP_ROOT jumel -n -a "app.x -i app.inp" > my.out

The variable PBS_ID does not need to be specified; however, it is recorded in the jumel logfiles and makes it easier to track the runs afterwards (e.g. when looking for the job in the system logfiles). Once the run has finished, run juman in the same directory:

juman -s all -n -i

The -s option switches on the statistics and the -i option starts the graphical display of the results. Currently only a gnuplot interface is implemented; PostScript or xfig files can be generated. An interface for visualization with Python is planned.

Example: Namd

The following job script was used to monitor a run of the apoa1 benchmark with Namd:

#!/bin/bash 
#MSUB -l nodes=4:ppn=8
#MSUB -l walltime=00:15:00
#MSUB -v tpt=1 

module load namd/2.7

mpiexec -np 32 -e PBS_JOBID jumel -n -p -t -a "$NAMD_ROOT/bin/namd2 apoa1.namd"

The graphs were obtained afterwards on the login node with

juman -s all -n -t -i

Results

The results of the Namd runs are shown below for VmSize (the default key). The reported values are in kB.

source:/examples/Namd/task-statistics-VmSize.png source:/examples/Namd/task-statistics-minmax-VmSize.png source:/examples/Namd/task-statistics-total-VmSize.png source:/examples/Namd/node-statistics-MFree.png source:/examples/Namd/node-statistics-minmax-MFree.png source:/examples/Namd/node-statistics-total-MFree.png

Example: Mapt

#!/bin/bash 
#MSUB -l nodes=2:ppn=8
#MSUB -l walltime=00:05:00
#MSUB -v tpt=1 


mpiexec -np 16 jumel -d 2 -n -p -t -a "mapt.x" > mapt.out

The graphs were obtained afterwards on the login node with

juman -s all -i -n -t

Results

The results of the Mapt runs are shown below for VmSize (the default key). The reported values are in kB.

source:/examples/Mapt/task-statistics-VmSize.png source:/examples/Mapt/task-statistics-minmax-VmSize.png source:/examples/Mapt/task-statistics-total-VmSize.png source:/examples/Mapt/node-statistics-MFree.png source:/examples/Mapt/node-statistics-minmax-MFree.png source:/examples/Mapt/node-statistics-total-MFree.png

SVN Access

svn list https://svn.version.fz-juelich.de/jumel
Last modified on 01/25/12 13:35:44