TY - JOUR
T1 - Phylogenetic Tree Construction Using K-Mer Forest-Based Distance Calculation
AU - Gamage, Gihan
AU - Gimhana, Nadeeshan
AU - Perera, Indika
AU - Bandara, Shanaka
AU - Pathirana, Thilina
AU - Wickramarachchi, Anuradha
AU - Mallawaarachchi, Vijini
PY - 2020/6/19
Y1 - 2020/6/19
N2 - Phylogenetics is one of the dominant data engineering research disciplines based on biological information. More particularly here, we consider raw DNA sequences and do comparative analysis in order to come up with meaningful conclusions. When representing evolutionary relationships among different organisms in a concise manner, the phylogenetic tree helps significantly. When constructing phylogenetic trees, the elementary step is to calculate the genetic distance among species. Alignment-based sequencing and alignment-free sequencing are the two leading distance computation methods that are used to find genetic relatedness of different species. In this paper, we propose a novel alignment-free, pairwise, distance calculation method based on k-mers and a state of art machine learning-based phylogenetic tree construction mechanism. With the proposed approach, we can convert longer DNA sequences into compendious k-mer forests which gear up the efficiency of comparison. Later we construct the phylogenetic tree based on calculated distances with the help of an algorithm build upon k-medoid clustering, which guaranteed significant efficiency and accuracy compared to traditional phylogenetic tree construction methods.
KW - Genetic distance
KW - Genetic relatedness
KW - K-medoid clustering
KW - K-mer forest
KW - Phylogenetics
UR - http://www.scopus.com/inward/record.url?scp=85089034375&partnerID=8YFLogxK
U2 - 10.3991/ijoe.v16i07.13807
DO - 10.3991/ijoe.v16i07.13807
M3 - Article
AN - SCOPUS:85089034375
VL - 16
SP - 4
EP - 20
JO - International Journal of Online and Biomedical Engineering
JF - International Journal of Online and Biomedical Engineering
IS - 7
ER -