TY - JOUR
T1 - Programming the Adapteva Epiphany 64-core network-on-chip coprocessor
AU - Varghese, Anish
AU - Edwards, Bob
AU - Mitra, Gaurav
AU - Rendell, Alistair
PY - 2017/7/1
Y1 - 2017/7/1
AB - Energy efficiency is the primary impediment on the path to exascale computing. Consequently, the high-performance computing community is increasingly interested in low-power, high-performance embedded systems as building blocks for large-scale high-performance systems. The Adapteva Epiphany architecture integrates low-power RISC cores on a 2D mesh network and promises up to 70 GFLOPS/Watt of theoretical performance. However, with just 32 KB of memory per eCore for storing both data and code, programming the Epiphany system presents significant challenges. In this paper we evaluate the performance of a 64-core Epiphany system with a variety of basic compute and communication micro-benchmarks. Further, we implemented two well-known application kernels: a 5-point star-shaped heat stencil, which reaches a peak performance of 65.2 GFLOPS, and matrix multiplication, which reaches 65.3 GFLOPS, both in single precision across 64 Epiphany cores. We discuss strategies for implementing high-performance computing application kernels on such memory-constrained low-power devices and compare the Epiphany with competing low-power systems. With future Epiphany revisions expected to house thousands of cores on a single chip, understanding the merits of such an architecture is of prime importance to the exascale initiative.
KW - Epiphany
KW - matrix-matrix multiplication
KW - Network-on-chip
KW - Parallella
KW - stencil
UR - http://www.scopus.com/inward/record.url?scp=85021261264&partnerID=8YFLogxK
UR - http://purl.org/au-research/grants/ARC/DP0987773
UR - http://www.scopus.com/inward/record.url?scp=84918774260&partnerID=8YFLogxK
U2 - 10.1177/1094342015599238
DO - 10.1177/1094342015599238
M3 - Article
AN - SCOPUS:84918774260
SN - 1094-3420
VL - 31
SP - 285
EP - 302
JO - International Journal of High Performance Computing Applications
JF - International Journal of High Performance Computing Applications
IS - 4
ER -