Use of multiple GPUs on shared memory multiprocessors for ultrasound propagation simulations

Jiri Jaros, Bradley E Treeby, Alistair P. Rendell

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

6 Citations (Scopus)

Abstract

This paper outlines our effort to migrate a compute-intensive ultrasound propagation application, currently being developed in Matlab, to a cluster computer where each node has seven GPUs. Our goal is to perform realistic simulations in hours and minutes instead of weeks and days. To reach this goal, we investigate the architectural characteristics of the target system, focusing on the PCI-Express subsystem and the new features in CUDA version 4.0, especially simultaneous host-to-device, device-to-host and peer-to-peer transfers, from which the application is expected to benefit greatly. We also present results from a CPU-based implementation and discuss future directions for exploiting multiple GPUs.

Original language: English
Title of host publication: Parallel and Distributed Computing 2012
Subtitle of host publication: Proceedings of the Tenth Australasian Symposium on Parallel and Distributed Computing, AusPDC 2012
Pages: 43-52
Number of pages: 10
Publication status: Published - 20 Nov 2012
Externally published: Yes
Event: 10th Australasian Symposium on Parallel and Distributed Computing, AusPDC 2012 - Melbourne, VIC, Australia
Duration: 31 Jan 2012 to 3 Feb 2012

Publication series

Name: Conferences in Research and Practice in Information Technology Series
Volume: 127
ISSN (Print): 1445-1336

Conference

Conference: 10th Australasian Symposium on Parallel and Distributed Computing, AusPDC 2012
Country/Territory: Australia
City: Melbourne, VIC
Period: 31/01/12 to 3/02/12

Keywords

  • 7-GPU system
  • Bandwidth
  • CUDA
  • FFT
  • Matlab
  • Multi-core
  • PCI-Express
  • Ultrasound simulation
