TY - JOUR
T1 - Managing Complex Workflows in Bioinformatics
T2 - An Interactive Toolkit with GPU Acceleration
AU - Welivita, Anuradha
AU - Perera, Indika
AU - Meedeniya, Dulani
AU - Wickramarachchi, Anuradha
AU - Mallawaarachchi, Vijini
PY - 2018/7
Y1 - 2018/7
AB - Bioinformatics research continues to advance at an increasing scale, aided by techniques such as next-generation sequencing and by tool support that automates bioinformatics processes. With this growth, biological data accumulates at an unprecedented rate, demanding high-performance and high-throughput computing technologies to process it. Hardware accelerators such as graphics processing units (GPUs), together with distributed computing, accelerate the processing of big data in high-performance computing environments. They enable higher degrees of parallelism, thereby increasing throughput. In this paper, we introduce BioWorkflow, an interactive workflow management system that automates bioinformatics analyses and can schedule parallel tasks using GPU-accelerated and distributed computing. We describe a case study carried out to evaluate the performance of a complex, branching workflow executed by BioWorkflow. The results indicate a speed-up of ×2.89 from utilizing GPUs and an average speed-up of ×2.832 (over n = 5 scenarios) from parallel execution of graph nodes during multiple sequence alignment calculations. A combined speed-up of ×1.71 is achieved for complex workflows. This confirms that parallelism through GPU acceleration and concurrent execution of workflow nodes yields the expected higher speed-ups over mainstream sequential workflow execution. The tool also provides a comprehensive user interface with better interactivity for managing complex workflows; a system usability scale score of 82.9 confirms the high usability of the system.
KW - Bioinformatics software
KW - biological data analysis
KW - complex workflows
KW - GPU acceleration
KW - tool support
UR - http://www.scopus.com/inward/record.url?scp=85047010292&partnerID=8YFLogxK
U2 - 10.1109/TNB.2018.2837122
DO - 10.1109/TNB.2018.2837122
M3 - Article
C2 - 29994533
AN - SCOPUS:85047010292
SN - 1536-1241
VL - 17
SP - 199
EP - 208
JO - IEEE Transactions on Nanobioscience
JF - IEEE Transactions on Nanobioscience
IS - 3
ER -