Presentation

Paper: Low Communication FMM-Accelerated FFT on GPUs

Session: Fast Multipole Methods and Linear Algebra

Author

Cris Cecka

Event Type

Paper

Time: Thursday, November 16th, 2pm - 2:30pm

Location: 405-406-407

Description: Communication-avoiding algorithms have attracted growing interest over the last decade, driven by the spread of distributed-memory systems and by computational throughput growing disproportionately faster than communication bandwidth. For distributed 1D FFTs, communication costs quickly dominate execution time because all industry-standard implementations perform three all-to-all transpositions of the data.
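To make the "three all-to-all transpositions" concrete, here is a minimal single-machine sketch of the classic six-step FFT decomposition. It is a standard textbook formulation; the description does not say which decomposition industry libraries use internally, and the function name `six_step_fft` is hypothetical. A length N = n1 * n2 transform is computed using only row-wise FFTs, and every data reordering between them is an explicit transpose. If the rows were spread across GPUs, each of those transposes would become an all-to-all exchange.

```python
import numpy as np


def six_step_fft(x, n1, n2):
    """1D FFT of length n1*n2 via the six-step formulation (illustrative sketch).

    Returns the transform and the number of transposes performed. Only
    row-wise FFTs are used; in a row-distributed setting each transpose
    would be an all-to-all exchange between devices.
    """
    n = n1 * n2
    assert x.shape == (n,)
    a = x.reshape(n1, n2)          # a[j1, j2] = x[n2 * j1 + j2]
    transposes = 0

    # Step 1: transpose so the length-n1 transforms lie along rows.
    a = a.T.copy()                 # a[j2, j1]
    transposes += 1
    # Step 2: n2 independent length-n1 FFTs along rows.
    a = np.fft.fft(a, axis=1)      # a[j2, k1]
    # Step 3: multiply by twiddle factors exp(-2*pi*i * j2 * k1 / n).
    j2 = np.arange(n2)[:, None]
    k1 = np.arange(n1)[None, :]
    a = a * np.exp(-2j * np.pi * j2 * k1 / n)
    # Step 4: transpose so the length-n2 transforms lie along rows.
    a = a.T.copy()                 # a[k1, j2]
    transposes += 1
    # Step 5: n1 independent length-n2 FFTs along rows.
    a = np.fft.fft(a, axis=1)      # a[k1, k2]
    # Step 6: transpose into natural output order X[k1 + n1 * k2].
    a = a.T.copy()                 # a[k2, k1]
    transposes += 1
    return a.reshape(n), transposes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
    X, t = six_step_fft(x, 8, 8)
    print(np.allclose(X, np.fft.fft(x)), t)  # prints: True 3
```

The result matches `np.fft.fft`, and the transpose counter makes the communication structure visible: on one machine the transposes are cheap memory copies, but across GPUs each one moves almost the entire data set over the interconnect.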

In this work, we reformulate an existing algorithm that employs the Fast Multipole Method to reduce the communication requirements to approximately a single all-to-all transpose. We present a detailed and clear implementation strategy that relies on existing library primitives, demonstrate that this implementation achieves consistent speed-ups between 1.3x and 2.2x over cuFFTXt on 2xP100 and 8xP100 GPUs, and develop an accurate compute model to analyze the performance dependencies.
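Why cutting the transpose count matters can be illustrated with a toy latency-bandwidth cost model. This is a hypothetical sketch, not the compute model developed in the paper: the class name, its fields, and the assumption that the low-communication variant performs exactly one all-to-all plus some extra local compute are all illustrative choices, and every parameter value is made up.

```python
from dataclasses import dataclass


@dataclass
class ToyFFTCostModel:
    """Hypothetical latency-bandwidth model for a distributed 1D FFT.

    All fields are illustrative assumptions, not measured values.
    """
    n_points: int              # global transform length
    bytes_per_point: int       # e.g. 8 for complex64, 16 for complex128
    num_gpus: int
    link_bandwidth: float      # bytes/s each GPU can send during an all-to-all
    latency: float             # fixed seconds per all-to-all
    local_fft_time: float      # seconds of on-GPU FFT work (same in both variants)
    extra_compute_time: float  # seconds of extra work in the low-communication variant

    def all_to_all_time(self) -> float:
        # Each GPU holds n/p points and sends (p-1)/p of them to the other GPUs.
        local_bytes = self.n_points * self.bytes_per_point / self.num_gpus
        sent_bytes = local_bytes * (self.num_gpus - 1) / self.num_gpus
        return self.latency + sent_bytes / self.link_bandwidth

    def total_time(self, num_all_to_all: float, extra: float) -> float:
        return self.local_fft_time + extra + num_all_to_all * self.all_to_all_time()

    def speedup(self) -> float:
        # Baseline: three all-to-alls, no extra compute.
        baseline = self.total_time(3, 0.0)
        # Assumed low-communication variant: one all-to-all plus extra compute.
        low_comm = self.total_time(1, self.extra_compute_time)
        return baseline / low_comm


if __name__ == "__main__":
    # Purely hypothetical parameters: 2^27 complex64 points on 8 GPUs.
    model = ToyFFTCostModel(2**27, 8, 8, 1e10, 1e-5, 5e-3, 5e-3)
    print(f"all-to-all: {model.all_to_all_time():.4f} s, speedup: {model.speedup():.2f}x")
```

With zero local and extra compute the ratio is exactly the communication-bound ceiling of 3x; any extra compute in the low-communication variant pulls it below that. In this toy model, how close the speedup gets to 3x depends on how large the added compute is relative to one all-to-all.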

Affiliation

Cris Cecka, Nvidia Corporation
