Redesigning CAM-SE for Peta-Flops Performance on Sunway
TaihuLight
Author/Presenter
Event Type
Workshop
Applications
Government Strategies, Programs, and Funding
HPC Center Planning and Operations
Time
Monday, November 13th, 9:30am - 9:45am
Location
708
Description
With radical changes in both the computing architecture
and the memory hierarchy of recent leadership
supercomputers, it is becoming increasingly difficult for
well-established numerical codes, such as the climate
codes with millions of lines of code, to gain performance
benefits. In this talk, we report our efforts to use the
Sunway TaihuLight efficiently for climate applications
such as CAM-SE. In the first stage, we refactored and
optimized the complete code using OpenACC directives. We
then applied a more aggressive, finer-grained redesign to
CAM to achieve finer control over memory usage, more
efficient vectorization, and overlap of computation and
communication. With these changes, the CAM performance of
one 260-core Sunway processor becomes equivalent to that
of 28 to 184 Intel CPU cores, and we achieve a sustained
double-precision performance of 3.3 PFlops for a 750-m
global simulation when using 10,075,000 cores. CAM on
Sunway achieves simulation speeds of 3.4 and 21.5
simulated years per day (SYPD) at global resolutions of
25 km and 100 km, respectively, and enables us to
perform, to our knowledge, the first simulation of the
complete lifecycle of Hurricane Katrina, with
close-to-observation results for both track and
intensity.
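
To put the reported figures in more familiar units, the short sketch below converts the SYPD values into wall-clock hours per simulated year and divides the 3.3 PFlops figure evenly across the 10,075,000 cores. The function names are hypothetical and the even per-core split is only an illustrative assumption; the numbers are taken from the description above and nothing here comes from the CAM-SE code itself.

```python
HOURS_PER_DAY = 24.0


def wall_clock_hours_per_simulated_year(sypd: float) -> float:
    """Convert simulated years per day (SYPD) into wall-clock hours
    needed to simulate one year."""
    if sypd <= 0:
        raise ValueError("SYPD must be positive")
    return HOURS_PER_DAY / sypd


def average_flops_per_core(total_flops: float, cores: int) -> float:
    """Average floating-point rate per core, assuming (for illustration
    only) that the work is spread evenly over all cores."""
    if cores <= 0:
        raise ValueError("core count must be positive")
    return total_flops / cores


# Figures quoted in the description.
hours_25km = wall_clock_hours_per_simulated_year(3.4)
hours_100km = wall_clock_hours_per_simulated_year(21.5)
mflops_per_core = average_flops_per_core(3.3e15, 10_075_000) / 1e6

print(f"25-km run:  {hours_25km:.2f} wall-clock hours per simulated year")
print(f"100-km run: {hours_100km:.2f} wall-clock hours per simulated year")
print(f"750-m run:  {mflops_per_core:.1f} MFlops per core on average")
```

Under these assumptions, the 25-km configuration needs roughly 7 wall-clock hours per simulated year and the 100-km configuration a little over 1 hour, while the 750-m run averages a few hundred MFlops per core.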