Protein structure-informed bacteriophage genome annotation with Phold

George Bouras, Susanna R. Grigson, Milot Mirdita, Michael Heinzinger, Bhavya Papudeshi, Vijini Mallawaarachchi, Renee Green, Rachel Seongeun Kim, Victor Mihalia, Alkis James Psaltis, Peter John Wormald, Sarah Vreugde, Martin Steinegger, Robert A. Edwards

Research output: Contribution to journal, Article, peer-review

Abstract

Annotating bacteriophage (phage) genomes is essential for understanding the functional potential of phages and their suitability for use as therapeutic agents. Here, we introduce Phold, an annotation framework utilizing protein structural information that combines the ProstT5 protein language model and the structural alignment tool Foldseek. Phold assigns annotations using a database of over 1.36 million predicted phage protein structures with high-quality functional labels. Benchmarking reveals that Phold outperforms existing sequence-based homology approaches in functional annotation sensitivity whilst maintaining speed, consistency, and scalability. Applying Phold to diverse cultured and metagenomic phage genomes shows it consistently annotates over 50% of genes on an average phage and 40% on an average archaeal virus. Comparisons of phage protein structures to other protein structures across the tree of life reveal that phage proteins commonly have structural homology to proteins shared across the tree of life, particularly those that have nucleic acid metabolism and enzymatic functions. Phold is available as free and open-source software at https://github.com/gbouras13/phold.

Original language: English
Article number: gkaf1448
Number of pages: 19
Journal: Nucleic Acids Research
Volume: 54
Issue number: 1
Publication status: Published - 13 Jan 2026

Keywords

  • Bioinformatics
  • Computational Biology
  • Bacteriophages
