Protein sequence comparison based on K-string dictionary

Chenglong Yu, Rong He, Stephen Yau

    Research output: Contribution to journal › Article › peer-review

    34 Citations (Scopus)

    Abstract

    Current K-string-based methods for protein sequence comparison require large amounts of computer memory, because the dimension of the protein vector representation grows exponentially with K (there are 20^K possible K-strings over the 20 standard amino acids). In this paper, we propose a novel concept, the "K-string dictionary", to solve this high-dimensional problem. It allows us to represent a protein by a much lower-dimensional K-string-based frequency or probability vector, thereby significantly reducing the computer memory required to implement these comparisons. Furthermore, based on this new concept, we use Singular Value Decomposition to analyze real protein datasets, and the improved protein vector representation allows us to obtain accurate gene trees.
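    The abstract does not spell out how the dictionary is built. The following is a minimal Python sketch, assuming the K-string dictionary is the set of distinct K-strings that actually occur somewhere in the dataset, so each protein becomes a vector indexed only by those strings rather than by all 20^K possible ones. The helper names (build_dictionary, frequency_vector, probability_vector, cosine_distance) and the toy sequences are hypothetical, and cosine distance is a stand-in chosen for illustration. The paper goes on to analyze such vectors with Singular Value Decomposition, which this sketch does not attempt.

```python
from collections import Counter
from math import sqrt

# The 20 standard amino acids; a full K-string vector would need 20**K slots.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kstrings(seq, k):
    """Return all overlapping substrings of length k (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_dictionary(seqs, k):
    """Assumed dictionary: the sorted set of distinct K-strings seen in any sequence."""
    return sorted({w for seq in seqs for w in kstrings(seq, k)})


def frequency_vector(seq, dictionary, k):
    """Count how often each dictionary K-string occurs in seq (missing ones count as 0)."""
    counts = Counter(kstrings(seq, k))
    return [counts[w] for w in dictionary]


def probability_vector(seq, dictionary, k):
    """Normalize the frequency vector to sum to 1 (all zeros if seq has no K-strings)."""
    freqs = frequency_vector(seq, dictionary, k)
    total = sum(freqs)
    return [f / total for f in freqs] if total else [0.0] * len(freqs)


def cosine_distance(u, v):
    """1 - cosine similarity; an illustrative distance, not taken from the paper."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


if __name__ == "__main__":
    seqs = ["MKVLAAG", "MKVLGAG", "MSTLAAG"]  # toy sequences, not real data
    k = 3
    dictionary = build_dictionary(seqs, k)
    print(len(AMINO_ACIDS) ** k, len(dictionary))  # 8000 11
    vectors = [probability_vector(s, dictionary, k) for s in seqs]
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            print(i, j, round(cosine_distance(vectors[i], vectors[j]), 3))
```

    For these three toy sequences and K = 3, the full representation would need 8000 coordinates, while the assumed dictionary has only 11 entries; the saving grows quickly with K, since real proteins contain far fewer distinct K-strings than 20^K. Sequences 0 and 1 share two of their five 3-strings, whereas sequences 1 and 2 share none, so the latter pair comes out at the maximum distance of 1.0.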

    Original language: English
    Pages (from-to): 250-256
    Number of pages: 7
    Journal: Gene
    Volume: 529
    Issue number: 2
    Publication status: Published - 25 Oct 2013

    Keywords

    • Cardinality
    • Frequency vector
    • K-string
    • Sequence comparison
    • Singular Value Decomposition
