wiki:SmallProjects

Hierarchischer Baumaufbau im Treecode PEPC - Optimierung des Austausches global benötigter Daten

Der in Jülich entwickelte Code PEPC ist eine hochskalierende Implementation des klassischen Barnes-Hut Treecodes zur Lösung des N-Teilchen-Problemes. Der zu Grunde liegende Algorithmus reduziert die Anzahl der zwischen den Teilchen zu berechnenden Wechselwirkungen von O(N2) auf die Größenordnung O(NlogN). Hierzu wird eine Octtree-Gebietszerlegung benutzt, die eine Gruppierung nah beieinander liegender Teilchen in Cluster ermöglicht. Mit diesen wird kollektiv über ihre Multipolentwicklung gewechselwirkt.

In den vergangenen Monaten wurde PEPC soweit weiterentwickelt, dass es als weltweit erster Barnes-Hut Treecode die komplette Zahl der 300k Prozessoren des JUGENE effizient einsetzen kann. Der Wechselwirkungsteil des Codes skaliert dabei sogar perfekt bis zu dieser Prozessorzahl. Unerlässlich ist vor der eigentlichen Wechselwirkungsberechnung der Austausch von global benötigten Informationen aus dem Octtree. Während dies bei kleinen Prozessorzahlen vollkommen problemlos ist, zeigt sich dass die Kommunikationszeit hierfür ab ca. 128k Prozessoren einen deutlichen Beitrag zur Gesamtlaufzeit liefert.

Ziel des Projektes ist, die auszutauschende Datenmenge effizient zu reduzieren. Dazu soll statt der derzeit benutzten Jeder-Mit-Jedem Kommunikation in diesem Schritt ein hierarchisches Kommunikationsschema konzeptioniert, implementiert und dessen Tauglichkeit geprüft werden. Dies bereitet den Code auf Architekturen mit noch deutlich größeren Prozessorzahlen vor und ebnet so den Weg zum vermutlich weltweit ersten Petascale Barnes-Hut Treecode.

Project properties: gives very detailed insight into tree code and parallel communication using MPI

Proof-of-concept implemented by H. Peschke, GSP2011

Optimization of a Coulomb interaction routine for the parallel Barnes-Hut treecode PEPC

Since solving the N-body problem for long-range interactions directly is an O(N2) problem, several efficient algorithms for reducing this quadratic complexity have been developed. Among them, the Barnes-Hut tree code sticks out due to its algorithmic simplicity.

By replacing interactions with individual particles with interactions with particle groups through their multipole expansion, it reduces the algorithmic complexity to O(NlogN). While the actual number of computed interactions is reduced by this approach, the mathematical expression that has to be evaluated becomes more complex.

With the Pretty Efficient Parallel Coulomb Solver (PEPC), a highly scalable implementation of this algorithm has been developed. While it has shown to efficiently scale across all 300k processors of the IBM Blue Gene/P system Jugene, it does not make use of the PowerPC450 Dual Floating Point Unit ("DoubleHummer").

The task of this project is, to evaluate the possibility of efficiently using the dual floating point unit for calculating Coulomb interactions based on a multipole series and to implement an improved version of the interaction routine. Therefore, it will be necessary to learn the ropes of code optimization with respect to the PowerPC processor-specific features.

Project properties: no knowledge of tree code algorithm necessary, very technical, near to actual hardware

Performance improvement on maximum refinement level for the parallel Barnes-Hut treecode PEPC

Since solving the N-body problem for long-range interactions directly is an O(N2) problem, several efficient algorithms for reducing this quadratic complexity have been developed. Among them, the Barnes-Hut tree code sticks out due to its algorithmic simplicity.

By replacing interactions with individual particles with interactions with particle groups through their multipole expansion, it reduces the algorithmic complexity to O(NlogN). While the actual number of computed interactions is reduced by this approach, the mathematical expression that has to be evaluated becomes more complex. For selecting appropriately sized particle groups, within this algorithm, space is recursively subdivided and the generated cells are organized in an oct-tree data structure. After attaching the particle clusters to the tree nodes, the tree is traversed to identify particle-cluster and particle-particle interactions.

Currently, each tree leaf contains a single particle. This limits the spatial resolution of the code to approx. 10-7 of the simulations total spatial extent and leads to a possibly large amount of duplicate work when traversing the tree for nearby particles that share the same interaction list anyway.

During the project, the existing code shall be enabled to store an arbitrary number of particles in each tree leaf. The particle-cluster and particle-particle interactions will then be replaced by leaf-leaf and leaf-cluster interactions by either performing a direct sum on this refinement level or by developing a reasonable approximation that transfers the force acting onto a leaf to its constituent particles.

Project properties: gives very detailed insight into tree code, possibility for a more physics-related interpretation

Last modified 11 years ago Last modified on 04/18/13 17:54:49
Note: See TracWiki for help on using the wiki.