Simulating realistic short tandem repeat capillary electrophoretic signal using a generative adversarial network

Duncan Alexander Taylor, Melissa Humphries

Research output: Contribution to journal › Article › peer-review


Abstract

DNA profiles are made up of multiple series of electrophoretic signal (one per fluorophore, referred to as a ‘dye’), each measuring fluorescence over time. Typically, human DNA analysts ‘read’ DNA profiles, using their experience to distinguish instrument noise, artefactual signal, and signal corresponding to DNA fragments of interest. Recent work has developed an artificial neural network (ANN) to classify the fluorescence in DNA profile electrophoretic signal into categories. However, creating the necessarily large amount of labelled training data for the ANN is time-consuming and expensive, and this limits how robustly the ANN can be trained. If realistic, pre-labelled, and biologically informed training data could be simulated, this barrier to training an ANN with high efficacy would be removed. Here we develop a generative adversarial network (GAN), modified from the pix2pix GAN, to achieve this task. We train the GAN on 1078 DNA profiles so that it can simulate DNA profile information, and then use its generator as a ‘realism filter’ that applies the noise and artefact elements exhibited in typical electrophoretic signal. The GAN uses a custom generator architecture based on a U-Net configuration but with two ‘U’ paths, one that models across-dye features and one that models within-dye features. Convergence was achieved after 150 epochs. Fréchet Inception Distance showed that the generator increased the realism of an idealised (noiseless) mock-electropherogram: real-to-real profile comparisons yielded a distance of 4.0, real-to-idealised comparisons 5.3, and real-to-generated comparisons 4.7. The realism of the generated profiles was confirmed by a DNA profile expert. The ability to generate realistic DNA profiles makes it possible to simulate an unlimited amount of training data that possesses specific features of interest. This overcomes the limiting issue of expense associated with laboratory-created profiles.
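
As an illustration of the distance reported in the abstract, the following is a minimal sketch of the Fréchet distance between Gaussians fitted to two sets of feature vectors, which is the quantity underlying the Fréchet Inception Distance. It is not the authors' implementation: the abstract does not say how features were extracted from the profiles, so the function name `frechet_distance` and the feature arrays are hypothetical, and in practice the features would come from a pretrained embedding network rather than from random numbers.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is a 2-D array with one row per sample and one column
    per feature (hypothetical embeddings of DNA profiles).
    """
    mu_a = feats_a.mean(axis=0)
    mu_b = feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # covariance matrices (up to numerical error, hence the clip).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 4))              # stand-in "real" features
    simulated = rng.normal(0.3, 1.2, (500, 4))    # stand-in "generated" features
    print(frechet_distance(real, real))
    print(frechet_distance(real, simulated))
```

A smaller value means the two feature distributions are closer. Read this way, the abstract's figures place generated profiles (4.7) between idealised profiles (5.3) and the real-to-real baseline (4.0).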

Original language: English
Article number: 127536
Number of pages: 14
Journal: Expert Systems with Applications
Volume: 280
Publication status: Published - 25 Jun 2025

Keywords

  • Biologically informed AI
  • DNA profile simulation
  • Electropherogram
  • Generative adversarial network
  • Pix2pix
