P80: Adaptive Loop Scheduling with Charm++ to Improve
Performance of Scientific Applications
Session: Poster Reception
Authors
Event Type: ACM Student Research Competition, Poster, Reception
Time: Tuesday, November 14th, 5:15pm - 7pm
Location: Four Seasons Ballroom
Description: Supercomputers today employ a large number of cores on
each node. The Charm++ parallel programming system
provides an intelligent runtime which has been highly
effective at providing dynamic load balancing across
nodes of a supercomputer. Modern multi-core nodes
present new challenges and opportunities for Charm++.
The large degree of over-decomposition required may lead
to high overhead. We modify the Charm++ Runtime System
(RTS) to assign Charm++ objects to nodes, thus reducing
over-decomposition, and spread work across cores via
parallel loops. We also extend a library of the Charm++
software suite that supports loop parallelism with a
loop scheduling strategy that maximizes load balance
across cores while minimizing data movement. We
tune parameters of the RTS and the loop scheduling
strategy to improve performance of benchmark codes run
on a variety of architectures. Our technique improves
performance of a Particle-in-Cell code run on the Blue
Waters supercomputer by 17.2%.
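
The trade-off described above, load balance across cores versus data movement, can be illustrated with a small sketch. The abstract does not specify the scheduling strategy, so what follows is an assumption: a generic mixed static/dynamic scheme written with plain std::thread, not the Charm++ loop library and not necessarily the authors' method. The function name hybrid_parallel_for and the parameters static_fraction and chunk_size are hypothetical. A fixed fraction of the iterations is split into contiguous per-thread blocks, which favors locality, and the remaining iterations are claimed in small chunks from a shared counter, which favors load balance.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not a Charm++ API): runs body(i) once for every
// i in [0, n) using num_threads threads.
//   static_fraction: share of iterations assigned statically (0.0 to 1.0)
//   chunk_size:      iterations claimed per request in the dynamic phase
inline void hybrid_parallel_for(std::size_t n, unsigned num_threads,
                                double static_fraction,
                                std::size_t chunk_size,
                                const std::function<void(std::size_t)>& body) {
    assert(body && "body must be callable");
    if (n == 0) return;
    num_threads = std::max(1u, num_threads);
    chunk_size = std::max<std::size_t>(1, chunk_size);
    static_fraction = std::min(1.0, std::max(0.0, static_fraction));

    // Iterations [0, n_static) are static; [n_static, n) are dynamic.
    const std::size_t n_static =
        static_cast<std::size_t>(static_fraction * static_cast<double>(n));
    std::atomic<std::size_t> next{n_static};

    auto worker = [&](unsigned t) {
        // Static phase: thread t owns one contiguous block, so repeated
        // calls with the same arguments touch the same data per thread.
        const std::size_t begin = n_static * t / num_threads;
        const std::size_t end = n_static * (t + 1) / num_threads;
        for (std::size_t i = begin; i < end; ++i) body(i);

        // Dynamic phase: threads that finish early claim the remaining
        // iterations chunk by chunk from the shared counter.
        for (;;) {
            const std::size_t start = next.fetch_add(chunk_size);
            if (start >= n) break;
            const std::size_t stop = std::min(n, start + chunk_size);
            for (std::size_t i = start; i < stop; ++i) body(i);
        }
    };

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker, t);
    for (auto& th : threads) th.join();
}
```

In this sketch, static_fraction = 1.0 gives purely static scheduling and static_fraction = 0.0 gives purely dynamic scheduling; values in between, together with chunk_size, are the kind of parameters one would tune per benchmark and per architecture. A call such as hybrid_parallel_for(n, 4, 0.5, 16, body) runs half of the iterations statically on four threads and hands out the rest in chunks of 16.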




