What the protein!? Computational methods for predicting microbial protein functions

Research output: Working paper/Preprint › Preprint



The identification of protein functions is crucial for understanding microbial life at a molecular scale. Although computational methods for annotating protein sequences have advanced greatly in recent years, 30% of all bacterial and 65% of all viral protein sequences still cannot be assigned a known biological function. As a result, protein function inference remains a fundamental challenge in computational biology. This paper reviews bioinformatics methods for annotating microbial and viral proteins, categorised into homology-based and homology-free approaches. Widely used homology-based methods include sequence similarity searches such as BLAST and profile hidden Markov models, both of which compare novel protein sequences to databases of protein sequences with known functions. These homology-based methods have limitations, particularly for viral sequences, which are severely underrepresented in protein sequence databases. Homology-free methods, including numerical feature extraction, language-based models, guilt-by-association, and protein structure prediction software, therefore offer potential alternatives. Regardless of the annotation method used, it is also important to critically consider the functional labels used to describe protein functions and the hierarchical organisation of those labels. This review highlights that combining multiple functional prediction strategies, including machine learning, may provide the best improvements for microbial protein annotation and help narrow the ever-expanding sequence-function gap affecting microbial proteins. Overall, we provide experimental biologists with a comprehensive overview of annotation methods and inform computational scientists of open challenges and future research avenues.
Original language: English
Publisher: OSF Preprints
Number of pages: 32
Publication status: Submitted - 27 Apr 2023


  • microbiology
  • protein function prediction
  • sequence annotation
  • machine learning
  • proteomics

