Benchmarks on Helios showed strange behaviour. Some information and parameters:
Machine:
- Each node consists of 16 cores with 58 GB of available memory.
- The total system comprises 4410 nodes, with a peak performance of 1.52 PF and 256 TB of total memory.
- The interconnection network is InfiniBand QDR, non-blocking.
- The topology is fat-tree; the host connection is PCI Express gen3, bi-directional.
Jobscript
Relevant parts of the jobscript (example for the bold job in the tables):
#SBATCH -N 16 # number of nodes
#SBATCH -n 32 # number of tasks
#SBATCH -c 8 # number of cores per task
...
NP=${SLURM_NTASKS} # the total number of tasks
...
export OMP_NUM_THREADS=8 # number of threads
export KMP_AFFINITY=compact # Intel OpenMP runtime: pack threads onto adjacent cores
export KMP_STACKSIZE=1G # OpenMP per-thread stack size
...
mpirun -np ${NP} ./${BIN}.local params
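As a quick sanity check on the layout above (an illustrative Python sketch, not part of the jobscript), the SLURM parameters imply 2 MPI ranks per node, whose 8 OpenMP threads each exactly fill the 16 cores of a Helios node:

```python
# Sanity-check the SLURM/OpenMP layout from the jobscript above (illustrative sketch).
nodes = 16          # SBATCH -N
ntasks = 32         # SBATCH -n
cpus_per_task = 8   # SBATCH -c, matches OMP_NUM_THREADS
cores_per_node = 16 # Helios node size (see machine description above)

tasks_per_node = ntasks // nodes
print(tasks_per_node)                  # 2 MPI ranks per node
print(tasks_per_node * cpus_per_task)  # 16 threads per node: node fully occupied
assert tasks_per_node * cpus_per_task <= cores_per_node
```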
Tests with pepc-mini: runtimes in s, examples/mini-tube, filtering off, all output and diagnostics turned off.
compiler: intel/13.0.079
mpi: bullxmpi/1.1.16.2 (also tested with intelmpi/4.1 and intelmpi/4.0.3)
320,000 particles
nodes:             |   2 |   4 |   8 |  16 |  32
num_walk_threads=2 | 518 | 266 | 156 | 103 |   -
num_walk_threads=4 | 266 | 188 | 158 | 128 |   -
num_walk_threads=6 | 186 | 180 | 184 | 158 |   -
num_walk_threads=7 | 157 | 172 | 180 | 159 |   -
640,000 particles
nodes:             |     2 |    4 |   8 |  16 |  32
num_walk_threads=2 | ~1660 | ~700 | 350 | 201 | 126
num_walk_threads=4 |  ~650 |  396 | 227 | 191 | 151
num_walk_threads=6 |   456 |  358 | 251 | 241 | 158
num_walk_threads=7 |   389 |  374 | 265 | 247 | 166
1,280,000 particles
nodes:             |     2 |     4 |     8 |  16 |  32
num_walk_threads=2 |     - | ~2000 | ~1070 | 490 | 246
num_walk_threads=4 |     - |  ~970 |   530 | 318 | 215
num_walk_threads=6 | ~1850 |  ~640 |   460 | 300 | 237
num_walk_threads=7 | ~1580 |   579 |   414 | 313 | 250
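The node-count dependence above can be condensed into parallel efficiencies. A small Python sketch, using the num_walk_threads=2 row of the 320,000-particle table relative to the 2-node run:

```python
# Strong-scaling efficiency relative to the 2-node run,
# for the num_walk_threads=2 row of the 320,000-particle table.
times = {2: 518, 4: 266, 8: 156, 16: 103}  # nodes -> runtime in s

base_nodes = 2
for n in sorted(times):
    speedup = times[base_nodes] / times[n]
    efficiency = speedup / (n / base_nodes)
    print(f"{n:3d} nodes: speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
# Efficiency drops from ~0.97 at 4 nodes to ~0.63 at 16 nodes.
```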
Tests with pepc-f and pepc-mini, num_walk_threads=7, 2 ranks per node
1,280,000 particles
nodes:                                               |     2 |      4 |    8 |  16 |  32 |  64 | 128
pepc-mini (compare to the 1,280,000-part. runs above)|     - |    588 |  407 | 315 | 246 | 155 | 125
pepc-f, no wall, no periodic bc                      |     - |  ~3300 |  288 |  54 |  39 |  61 |  69
pepc-f, no wall, periodic bc, 2 mirror layers        |     - | ~27500 | ~681 | 266 | 126 |  98 |  97
pepc-f, wall, no periodic bc                         |   585 |    467 |  149 |  40 |  37 |  62 |  67
pepc-f, wall, periodic bc, 2 mirror layers           | ~2500 |  ~1500 |  569 | 144 | 117 |  95 |  97
pepc-f, wall, periodic bc, 2 mirror layers, hpcff    |     - |  ~1600 |  625 | 106 |  65 |  56 |   -
This showed reasonable results, at least for node counts below 64.
50 steps of a typical pepc-f production run (2,250,000 particles) showed the following:
nodes:  |   2 |   4 |   8 |    16 |  32 |  64 | 128
helios  |   - | 620 | 392 | ~2427 | 148 | 122 |  95
hpcff   |   - | 722 | 419 | ~2504 | 100 |  75 |   -
Further tests showed that the problem appears for the chosen particle configuration with 32 MPI ranks on both machines. See the attached plot.
Tests with pepc-f: runtimes in s, 20 timesteps.
compiler: intel/13.0.079
mpi: intelmpi/4.1
1 MPI task per node, OMP_NUM_THREADS=16, num_walk_threads=15
nodes:        |    1 |   2 |     4 |     8
10000 part.   |    6 |   - |     - |     -
20000 part.   |    7 |   - |     - |     -
40000 part.   |   12 |   - |     - |     -
80000 part.   |   21 |   - |     - |     -
160000 part.  |   42 |   - |     - |     -
320000 part.  |   84 |  49 |    27 |    17
640000 part.  |  171 | 107 |   336 |   359
1280000 part. |  375 | 353 | ~6000 | ~5000
2 MPI tasks per node, OMP_NUM_THREADS=8, num_walk_threads=7
nodes:        |    1 |      2 |      4 |   8
10000 part.   |    4 |      - |      - |   -
20000 part.   |    6 |      - |      - |   -
40000 part.   |   11 |      - |      - |   -
80000 part.   |   19 |      - |      - |   -
160000 part.  |   36 |      - |      - |   -
320000 part.  |   77 |     41 |     25 |  17
640000 part.  |  178 |   ~740 |   ~920 | 134
1280000 part. | ~600 | >12000 | ~11000 | 414
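The anomaly is easy to flag automatically: for a fixed problem size, the runtime should decrease monotonically as nodes are added. A small Python sketch over the "1 MPI task per node" table above (approximate "~" values entered as plain numbers for illustration):

```python
# Flag non-monotonic strong scaling: runtime should not grow with node count.
# Data from the "1 MPI task per node" table; '~' values entered as plain numbers.
runs = {
    320000:  {1: 84,  2: 49,  4: 27,   8: 17},
    640000:  {1: 171, 2: 107, 4: 336,  8: 359},
    1280000: {1: 375, 2: 353, 4: 6000, 8: 5000},
}

for particles, times in runs.items():
    ns = sorted(times)
    # anomalous if any larger run is slower than the next-smaller one
    anomalous = any(times[a] < times[b] for a, b in zip(ns, ns[1:]))
    print(f"{particles:8d} particles: {'ANOMALOUS' if anomalous else 'ok'}")
# 320000 scales cleanly; 640000 and 1280000 blow up beyond 2 nodes.
```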