
Fastq-pair: efficient synchronization of paired-end fastq files

Research output: Working paper/Preprint › Preprint


Abstract


Paired-end DNA sequencing provides additional information about the sequence data that is used in sequence assembly, mapping, and other downstream bioinformatics analyses. Paired-end reads are usually provided as two fastq-format files, with each file representing one end of the read. Many commonly used downstream tools require that the sequence reads appear in each file in the same order, with reads that lack a pair in the corresponding file placed in a separate file of singletons. Although most sequencing instruments capable of generating paired-end reads produce files where each read has a corresponding mate, many downstream bioinformatics manipulations break the one-to-one correspondence between reads. The paired-end sequence files then lose synchronicity and contain either unordered sequences or sequences in one file or the other without a mate. Trivial solutions to this problem require reading one or both of the DNA sequence files into memory, but they quickly become limited by computational resources for the moderate to large sequence files that are now common. Here, we introduce a fast and memory-efficient solution, written in C for portability, that synchronizes paired-end fastq files for subsequent analysis and places unmatched reads into singleton files.
Fastq-pair is freely available from https://github.com/linsalrob/fastq-pair and is released under the MIT license.
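To make the synchronization task concrete, the following is a minimal Python sketch of one way to pair two fastq files without loading either file's sequences into memory. It is an illustration only: it is not the C implementation described above, and the abstract does not state which algorithm fastq-pair uses. The sketch assumes well-formed four-line records with no blank lines, read IDs that match after removing a trailing /1 or /2, and seekable binary file handles. The function names (read_id, records, pair_fastq) are hypothetical.

```python
def read_id(header: bytes) -> bytes:
    """Return the read ID from a fastq header line, without a /1 or /2 suffix."""
    rid = header[1:].split()[0]  # drop the leading '@' and any description
    if rid.endswith((b"/1", b"/2")):
        rid = rid[:-2]
    return rid


def records(handle):
    """Yield (byte offset, read ID, raw record) for each four-line record."""
    while True:
        offset = handle.tell()
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:  # end of file
            return
        yield offset, read_id(lines[0]), b"".join(lines)


def read_record_at(handle, offset: int) -> bytes:
    """Re-read one four-line record starting at a known byte offset."""
    handle.seek(offset)
    return b"".join(handle.readline() for _ in range(4))


def pair_fastq(left, right, left_out, right_out, left_single, right_single):
    """Write matched pairs in the same order; route unmatched reads to singletons.

    Only read IDs and byte offsets of the left file are held in memory.
    Returns (pairs, left singletons, right singletons).
    """
    # Pass 1: index the left file by read ID (assumption: IDs are unique).
    index = {rid: offset for offset, rid, _ in records(left)}

    # Pass 2: stream the right file and look up each mate by ID.
    pairs = right_singles = 0
    for _, rid, record in records(right):
        offset = index.pop(rid, None)
        if offset is None:
            right_single.write(record)
            right_singles += 1
        else:
            left_out.write(read_record_at(left, offset))
            right_out.write(record)
            pairs += 1

    # Whatever is left in the index never found a mate.
    for offset in sorted(index.values()):
        left_single.write(read_record_at(left, offset))
    return pairs, len(index), right_singles
```

In this sketch the paired output follows the read order of the second file, and memory use grows with the number of reads in the first file rather than with the total sequence length.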
Original language: English
Publisher: bioRxiv, Cold Spring Harbor Laboratory
Number of pages: 6
DOIs
Publication status: Published - 19 Feb 2019
Externally published: Yes

Keywords

  • fastq
  • next generation sequencing
