PlasLR Enables Adaptation of Plasmid Prediction for Error-Prone Long Reads

Anuradha Wickramarachchi, Vijini Mallawaarachchi, Lianrong Pu, Yu Lin

Research output: Working paper/Preprint

Abstract

Plasmids are extra-chromosomal genetic elements commonly found in bacterial cells that support many functional aspects, including environmental adaptation. Identifying these genetic elements is vital for further study of the function and behaviour of the organisms. However, it is challenging to separate these small sequences from longer chromosomes within a given species. Machine learning approaches have been successfully developed to classify assembled contigs into two classes (plasmids and chromosomes). However, such tools are not designed to directly classify long and error-prone reads, which have been gaining popularity in genomics studies. Assembling complete plasmids is still challenging for many long-read assemblers when the input is a mix of long and error-prone reads from plasmids and chromosomes. In this paper, we present PlasLR, a tool that adapts existing plasmid detection approaches to directly classify long and error-prone reads. PlasLR makes use of both the composition and coverage information of long and error-prone reads. We evaluate PlasLR on multiple simulated and real long-read datasets with varying compositions of plasmids and chromosomes. Our experiments demonstrate that PlasLR substantially improves the accuracy of plasmid detection on top of state-of-the-art plasmid detection tools. Moreover, we show that using PlasLR before long-read assembly helps to enhance assembly quality in terms of plasmid recovery and near-complete chromosome assembly from metagenomic datasets.

Original language: English
Publisher: bioRxiv, Cold Spring Harbor Laboratory
Publication status: Submitted - 14 Jun 2021
Externally published: Yes

Keywords

  • plasmid
  • PlasLR
  • chromosomes
  • error-prone reads
  • metagenomics
