TY - JOUR
T1 - Protein structure-informed bacteriophage genome annotation with Phold
AU - Bouras, George
AU - Grigson, Susanna R.
AU - Mirdita, Milot
AU - Heinzinger, Michael
AU - Papudeshi, Bhavya
AU - Mallawaarachchi, Vijini
AU - Green, Renee
AU - Kim, Rachel Seongeun
AU - Mihalia, Victor
AU - Psaltis, Alkis James
AU - Wormald, Peter John
AU - Vreugde, Sarah
AU - Steinegger, Martin
AU - Edwards, Robert A.
PY - 2026/1/13
Y1 - 2026/1/13
AB - Bacteriophage (phage) genome annotation is essential for understanding their functional potential and suitability for use as therapeutic agents. Here, we introduce Phold, an annotation framework utilizing protein structural information that combines the ProstT5 protein language model and structural alignment tool Foldseek. Phold assigns annotations using a database of over 1.36 million predicted phage protein structures with high-quality functional labels. Benchmarking reveals that Phold outperforms existing sequence-based homology approaches in functional annotation sensitivity whilst maintaining speed, consistency, and scalability. Applying Phold to diverse cultured and metagenomic phage genomes shows it consistently annotates over 50% of genes on an average phage and 40% on an average archaeal virus. Comparisons of phage protein structures to other protein structures across the tree of life reveal that phage proteins commonly have structural homology to proteins shared across the tree of life, particularly those that have nucleic acid metabolism and enzymatic functions. Phold is available as free and open-source software at https://github.com/gbouras13/phold.
KW - Bioinformatics
KW - Computational Biology
KW - Bacteriophages
UR - http://www.scopus.com/inward/record.url?scp=105026840654&partnerID=8YFLogxK
UR - http://purl.org/au-research/grants/ARC/DP250103825
UR - http://purl.org/au-research/grants/ARC/FL250100019
U2 - 10.1093/nar/gkaf1448
DO - 10.1093/nar/gkaf1448
M3 - Article
C2 - 41495893
AN - SCOPUS:105026840654
SN - 0305-1048
VL - 54
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 1
M1 - gkaf1448
ER -