Encoding Unitig-level Assembly Graphs with Heterophilous Constraints for Metagenomic Contigs Binning

Hansheng Xue, Vijini Mallawaarachchi, Lexing Xie, Vaibhav Rajan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Metagenomics studies genomic material derived from mixed microbial communities in diverse environments, holding considerable significance for both human health and environmental sustainability. Metagenomic binning refers to the clustering of genomic subsequences obtained from high-throughput DNA sequencing into distinct bins, each representing a constituent organism within the community. Mainstream binning methods rely primarily on sequence features such as composition and abundance, which leaves them unable to handle sequences shorter than 1,000 bp or the inherent noise within sequences effectively. Several binning tools have emerged that aim to improve binning outcomes by using the assembly graph generated by assemblers, which encodes valuable overlap information among genomic sequences. However, existing assembly graph-based binners mainly focus on simplified contig-level assembly graphs, which are derived from the assembler's original unitig-level assembly graphs. This simplification reduces the resolution of the connectivity information in the original graphs. In this paper, we design a novel binning tool named UNITIGBIN, which leverages representation learning on unitig-level assembly graphs while adhering to heterophilous constraints imposed by single-copy marker genes, ensuring that constrained contigs cannot be grouped together. Extensive experiments conducted on synthetic and real datasets demonstrate that UNITIGBIN significantly surpasses state-of-the-art binning tools.
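
The heterophilous constraints mentioned in the abstract can be illustrated with a minimal sketch. It is not UNITIGBIN's implementation: it only assumes that two contigs carrying the same single-copy marker gene should not share a bin, derives those "cannot-link" pairs, and checks a candidate bin assignment against them. The function names, contig IDs, and marker gene labels are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cannot_link_pairs(contig_markers):
    """Return pairs of contigs that share at least one single-copy marker gene.

    Assumption: a single-copy marker gene occurs once per genome, so two
    contigs that both carry it are taken to come from different organisms.
    """
    contigs_by_marker = defaultdict(set)
    for contig, markers in contig_markers.items():
        for marker in markers:
            contigs_by_marker[marker].add(contig)

    pairs = set()
    for contigs in contigs_by_marker.values():
        # Sort so each pair has a canonical (a, b) order with a < b.
        for a, b in combinations(sorted(contigs), 2):
            pairs.add((a, b))
    return pairs


def violations(bins, pairs):
    """List constrained pairs that a bin assignment places in the same bin."""
    return sorted(
        (a, b)
        for a, b in pairs
        if a in bins and b in bins and bins[a] == bins[b]
    )


# Hypothetical input: contig ID -> marker genes detected on that contig.
contig_markers = {
    "c1": {"rpsB"},
    "c2": {"rpsB", "rplA"},
    "c3": {"rplA"},
    "c4": set(),
}
pairs = cannot_link_pairs(contig_markers)

# Hypothetical bin assignment: contig ID -> bin label.
bins = {"c1": 0, "c2": 1, "c3": 1, "c4": 0}
bad = violations(bins, pairs)
print(bad)
```

Here `c2` and `c3` both carry the illustrative marker `rplA` yet sit in the same bin, so that pair is reported. A binner that respects the constraints would have to separate them.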

Original language: English
Title of host publication: The Twelfth International Conference on Learning Representations
Number of pages: 22
Publication status: Published - 16 Jan 2024
Event: ICLR 2024 - The Twelfth International Conference on Learning Representations - Messe Wien Exhibition and Congress Center, Vienna, Austria
Duration: 7 May 2024 to 11 May 2024
Conference number: 12
https://iclr.cc/

Conference

Conference: ICLR 2024 - The Twelfth International Conference on Learning Representations
Country/Territory: Austria
City: Vienna
Period: 7/05/24 to 11/05/24

Keywords

  • Metagenomics
  • Metagenomic binning
  • DNA sequencing
