Abstract
Genomes of organisms are constructed by assembling sequence reads from whole-genome sequencing. Determining the sequence read-coverage of genome assemblies is useful, for instance, for identifying duplication or deletion events, identifying related contigs when binning metagenomes, or analysing the taxonomic composition of metagenomes. Although calculating read-coverage is a routine task, it typically involves several complete read and write operations (I/O operations). This is not a problem for small datasets, but it can be a significant bottleneck for very large datasets. Koverage reduces the I/O burden as much as possible to enable maximum scalability. Koverage also includes a k-mer-based method that significantly reduces the computational complexity for very large reference genomes. Koverage uses Snakemake, providing out-of-the-box support for HPC and cloud environments. It utilises the Snaketool command-line interface and is installable with pip or Conda for maximum ease of use. Source code and documentation are available at https://github.com/beardymcjohnface/Koverage.
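To make the notion of read-coverage concrete, the following is a minimal sketch of how mean read depth per contig could be accumulated in a single pass over a stream of alignments, without writing intermediate files. It is not Koverage's implementation: the function name `mean_coverage`, the `(contig, start, end)` tuple layout, and the example data are all assumptions made for illustration.

```python
from collections import defaultdict


def mean_coverage(contig_lengths, alignments):
    """Return mean read depth per contig from a stream of alignments.

    contig_lengths: dict mapping contig name -> length in bases.
    alignments: iterable of (contig, start, end) tuples, using 0-based,
        end-exclusive coordinates (a hypothetical layout for this sketch).
    """
    aligned_bases = defaultdict(int)
    for contig, start, end in alignments:
        # Each alignment contributes (end - start) bases of depth to its contig.
        aligned_bases[contig] += end - start
    # Mean depth is the total number of aligned bases divided by contig length.
    return {
        name: aligned_bases[name] / length
        for name, length in contig_lengths.items()
    }


# Hypothetical example data: two contigs and four read alignments.
lengths = {"contig_1": 100, "contig_2": 50}
hits = [
    ("contig_1", 0, 50),
    ("contig_1", 25, 75),
    ("contig_2", 0, 50),
    ("contig_2", 10, 35),
]
coverage = mean_coverage(lengths, hits)
```

Because the alignments are consumed as an iterable, the same function could in principle be fed directly from an aligner's output stream, which is the kind of I/O saving the abstract describes.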
Original language | English |
---|---|
Article number | 6235 |
Number of pages | 6 |
Journal | Journal of Open Source Software |
Volume | 9 |
Issue number | 94 |
DOIs | |
Publication status | Published - 27 Feb 2024 |
Keywords
- Datasets
- Read-coverage analysis
- Genome assemblies
- Snakemake