P41: OpenCL-Based High-Performance 3D Stencil Computation
on FPGAs
Session: Poster Reception
Event Type: ACM Student Research Competition, Poster, Reception
Time: Tuesday, November 14th, 5:15pm - 7pm
Location: Four Seasons Ballroom
Description: With the recent advancements in OpenCL-based High-Level
Synthesis, FPGAs are now more attractive choices for
accelerating High Performance Computing workloads.
Despite their power-efficiency advantage, FPGAs usually
fall short of GPUs in raw performance because their
memory bandwidth and compute performance are several
times lower. In this work, we show that, owing to the
architectural advantages of FPGAs for stencil
computation, these devices can offer not only better
power efficiency but also performance comparable to
high-end GPUs.
We achieve this goal using a parameterized OpenCL-based
implementation that combines spatial and temporal
blocking with multiple advanced FPGA-specific
optimizations to maximize performance. We show that it
is possible to achieve up to 60 GB/s and 230 GB/s of
effective throughput for 3D stencil computation on Intel
Stratix V and Arria 10 FPGAs, respectively, which is
comparable to highly optimized implementations on
high-end GPUs.
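
As an illustration of how spatial and temporal blocking combine, the following minimal Python sketch applies a 7-point averaging stencil to a small 3D grid. It is not the poster's OpenCL implementation. The stencil weights, the fixed boundary condition, blocking along x only, and all names (make_grid, step, naive, blocked, block_x, t_block) are assumptions chosen for illustration. Each spatial block is loaded together with a halo of t cells on either side, advanced t time steps locally, and only its halo-free interior is written back. That is the sense in which temporal blocking trades redundant halo computation for fewer passes over external memory.

```python
def make_grid(nx, ny, nz):
    """Deterministic test grid (hypothetical initial condition)."""
    return [[[float((31 * x + 17 * y + 7 * z) % 11) for z in range(nz)]
             for y in range(ny)] for x in range(nx)]


def step(grid):
    """One Jacobi sweep of a 7-point average; outermost cells stay fixed."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    out = [[row[:] for row in plane] for plane in grid]  # copy, input untouched
    for x in range(1, nx - 1):
        for y in range(1, ny - 1):
            for z in range(1, nz - 1):
                out[x][y][z] = (grid[x][y][z]
                                + grid[x - 1][y][z] + grid[x + 1][y][z]
                                + grid[x][y - 1][z] + grid[x][y + 1][z]
                                + grid[x][y][z - 1] + grid[x][y][z + 1]) / 7.0
    return out


def naive(grid, steps):
    """Reference: one full pass over the grid per time step."""
    for _ in range(steps):
        grid = step(grid)
    return grid


def blocked(grid, steps, block_x, t_block):
    """Spatial blocking along x, with up to t_block time steps fused per pass."""
    nx = len(grid)
    remaining = steps
    while remaining > 0:
        t = min(t_block, remaining)
        out = [None] * nx
        for x0 in range(0, nx, block_x):
            x1 = min(x0 + block_x, nx)
            # Load the block plus a halo of t planes per side, clipped to the grid.
            lo, hi = max(0, x0 - t), min(nx, x1 + t)
            slab = grid[lo:hi]
            # Advance the slab t steps locally. Stale values creep inward one
            # plane per step from each artificial slab edge, so after t steps
            # only the halo is invalid and planes x0..x1-1 are exact.
            for _ in range(t):
                slab = step(slab)
            out[x0:x1] = slab[x0 - lo:x1 - lo]
        grid = out
        remaining -= t
    return grid
```

For example, blocked(make_grid(10, 5, 5), 5, 3, 2) yields the same grid as naive(make_grid(10, 5, 5), 5) while making three passes over the full grid instead of five. A larger t_block reduces the number of passes further at the cost of wider halos and more redundant computation.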




