Searching the Sequence Read Archive using Jetstream and Wrangler

Kyle Levi, Mats Rynge, Eroma Abeysinghe, Robert A. Edwards

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The Sequence Read Archive (SRA), the world's largest database of sequences, hosts approximately 10 petabases (10^16 bp) of sequence data and is growing at the alarming rate of 10 TB per day. Yet this rich trove of data is inaccessible to most researchers: searching through the SRA requires large storage and computing facilities that are beyond the capacity of most laboratories. Enabling scientists to analyze existing sequence data will provide insight into ecology, medicine, and industrial applications. In this project we specifically focus on metagenomic sequences (whole community data sets from different environments). We are developing a set of tools to enable biologists to mine the metagenomes in the SRA using the NSF-funded cloud computing resources, Jetstream and Wrangler. We have developed a proof-of-principle pipeline to demonstrate the feasibility of the approach. We are leveraging our existing infrastructure to enable all scientists to access the SRA metagenomes regardless of their computational ability and are working to create a stable pipeline with a science gateway portal that is accessible to all researchers.

Original language: English
Title of host publication: PEARC '18
Subtitle of host publication: Proceedings of the Practice and Experience on Advanced Research Computing
Place of Publication: New York, NY
Publisher: Association for Computing Machinery
Pages: 1-7
Number of pages: 7
ISBN (Print): 9781450364461
Publication status: Published - Jul 2018
Externally published: Yes
Event: 2018 Practice and Experience in Advanced Research Computing Conference: Seamless Creativity, PEARC 2018 - Pittsburgh, United States
Duration: 22 Jul 2018 to 26 Jul 2018

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2018 Practice and Experience in Advanced Research Computing Conference: Seamless Creativity, PEARC 2018
Country/Territory: United States
City: Pittsburgh
Period: 22/07/18 to 26/07/18

Keywords

  • Apache Airavata
  • Bacteriophage
  • Credential Store
  • Jetstream
  • Metagenomics
  • Metagenomics Discovery Challenge
  • SciGaP
  • Search SRA
  • Sequence Read Archive
  • SRA
  • SRA Gateway
  • Wrangler
