Koverage: Read-coverage analysis for massive (meta)genomics datasets

Michael J Roach, Bradley J Hart, Sarah J Beecroft, Bhavya Papudeshi, Laura K Inglis, Susanna R Grigson, Vijini Mallawaarachchi, George Bouras, Robert A Edwards

Research output: Contribution to journal › Article › peer-review

Abstract

Genomes of organisms are constructed by assembling sequence reads from whole genome sequencing. It is useful to determine the sequence read-coverage of genome assemblies, for instance to identify duplication or deletion events, to identify related contigs for binning metagenomes, or to analyse the taxonomic composition of metagenomes. Although calculating read-coverage is a routine task, it typically involves several complete read and write operations (I/O operations). This is not a problem for small datasets, but can be a significant bottleneck for very large datasets. Koverage reduces I/O burden as much as possible to enable maximum scalability. Koverage includes a k-mer-based method that significantly reduces the computational complexity for very large reference genomes. Koverage uses Snakemake, providing out-of-the-box support for HPC and cloud environments. It utilises the Snaketool command line interface, and is installable with pip or Conda for maximum ease of use. Source code and documentation are available at https://github.com/beardymcjohnface/Koverage.

Original language: English
Article number: 6235
Number of pages: 6
Journal: Journal of Open Source Software
Volume: 9
Issue number: 94
Publication status: Published - 27 Feb 2024

Keywords

  • Datasets
  • Read-coverage analysis
  • Genome assemblies
  • Snakemake
