A bioinformatic pipeline for simulating viral integration data

Suzanne Scott, Susanna Grigson, Felix Hartkopf, Claus Hallwirth, Ian Alexander, Denis Bauer, Laurence Wilson

Research output: Contribution to journal › Article › peer-review


Abstract

Viral integration is a complex biological process, and it is useful to have a reference integration dataset with known properties against which to compare experimental data or the results of computational tools that detect integration. To generate these data, we developed a pipeline for simulating integrations of a viral or vector genome into a host genome. Our method reproduces the more complex characteristics of vector and viral integration, including integration of sub-genomic fragments, structural variation of the integrated genomes, and deletions from the host genome at the integration site. Our method [1] takes the form of a Snakemake [2] pipeline, consisting of a Python [3] script using the Biopython [4] module that simulates integrations of a viral reference into a host reference. This produces a reference containing integrations, from which sequencing reads are simulated using ART [5]. The IDs of the reads crossing integration junctions are then annotated using another Python script to produce the final output, consisting of the simulated reads and a table of the locations of those integrations and the reads crossing each integration junction. To illustrate our method, we provide simulated reads and integration locations, as well as the code required to simulate integrations using any virus and host reference. This simulation method was used to investigate the performance of viral integration tools in our research [6].
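The integration step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it uses plain Python strings rather than Biopython, and the names `simulate_integration`, `reverse_complement` and `max_host_deletion`, as well as the 50% inversion probability, are assumptions made for illustration only. It inserts a random sub-genomic viral fragment into a host sequence, optionally inverts it (a simple stand-in for structural variation), deletes a few host bases at the integration site, and records the junction coordinates.

```python
import random


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def simulate_integration(host, virus, rng, max_host_deletion=5):
    """Insert one random viral fragment into the host (illustrative only).

    Returns the modified host sequence and a dict describing the event.
    """
    # Choose a sub-genomic fragment of the virus, at least 1 bp long.
    v_start = rng.randrange(0, len(virus))
    v_stop = rng.randrange(v_start + 1, len(virus) + 1)
    fragment = virus[v_start:v_stop]

    # Assumed 50% chance of inverting the fragment (simple structural variation).
    inverted = rng.random() < 0.5
    if inverted:
        fragment = reverse_complement(fragment)

    # Choose the host position and how many host bases are deleted there.
    h_pos = rng.randrange(0, len(host) + 1)
    deleted = rng.randint(0, min(max_host_deletion, len(host) - h_pos))

    new_host = host[:h_pos] + fragment + host[h_pos + deleted:]
    record = {
        "host_pos": h_pos,                         # left junction in new_host
        "right_junction": h_pos + len(fragment),   # right junction in new_host
        "virus_start": v_start,
        "virus_stop": v_stop,
        "inverted": inverted,
        "host_deleted": deleted,
    }
    return new_host, record


rng = random.Random(0)  # fixed seed so the simulation is reproducible
host = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
virus = "GGGATCCCTTTAAAGGGCCC"
new_host, record = simulate_integration(host, virus, rng)
print(record)
```

In a full pipeline, the modified sequence would be written out as a FASTA reference, reads would be simulated from it, and any read overlapping `host_pos` or `right_junction` would be annotated as crossing an integration junction.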
Original language: English
Article number: 108161
Number of pages: 7
Journal: Data in Brief
Volume: 42
DOIs
Publication status: Published - Jun 2022

Keywords

  • Gene therapy
  • In silico
  • Integration
  • Vector
  • Virus
