International Journal of Distributed and Parallel Systems (IJDPS) Vol.1, No.2, November 2010
DOI : 10.5121/ijdps.2010.1211
FINE GRAIN PARALLEL CONSTRUCTION
OF NEIGHBOUR-JOINING PHYLOGENETIC
TREES WITH REDUCED REDUNDANCY
USING MULTITHREADING
Biswajit Sahoo¹, Ashutosh Behura², and Sudarsan Padhy³

¹ KIIT University, Department of CSE, Bhubaneswar, India
sh_biswajit@yahoo.co.in
² KIIT University, Department of CSE, Bhubaneswar, India
ashutoshbehura_2006@yahoo.com
³ Utkal University, Department of Mathematics, Bhubaneswar, India
spadhy04@yahoo.co.in
ABSTRACT
In biological research, scientists often need to use information about species to infer the evolutionary relationships among them. These relationships are generally represented by a labeled binary tree, called the evolutionary tree (or phylogenetic tree). The phylogeny problem is computationally intensive and is therefore well suited to a parallel computing environment. In this paper, a fast algorithm for constructing Neighbor-Joining phylogenetic trees is developed whose CPU time is drastically reduced compared with sequential algorithms. The new algorithm combines three techniques. Firstly, a linear array A[N] is introduced to store the sum of every row of the distance matrix (the same as SK), which eliminates many repeated (redundant) computations; the values of A[i] are computed only once at the beginning of the algorithm and are updated by three elements in each iteration. Secondly, a very compact formula for the sum of all the branch lengths of OTUs (Operational Taxonomic Units) i and j is designed. Thirdly, multiple parallel threads are used to compute the nearest neighboring pair.
INDEX TERMS
Pairwise alignment, operational taxonomic unit, phylogenetic tree, multiple sequence alignment.
1. INTRODUCTION
An understanding of evolutionary relationships is at the heart of modern pharmaceutical research
for drug discovery, and is also the basis for the design of genetically enhanced organisms.
Evolutionary history is typically represented by an evolutionary tree [13-15]. An evolutionary
tree is a leaf-labeled binary tree which tracks the genetic similarities of a set of closely related
species.
The task of constructing the evolutionary tree for a set of species is known as the phylogeny
problem. The difficulty of this problem is that the number of possible evolutionary trees is very
large. Constructing an evolutionary history for a set of contemporary taxa based on their
pairwise distances is computationally intractable (i.e., NP-complete) for various optimality
criteria [2, 3]. Existing methods for solving the phylogeny problem include parsimony,
maximum likelihood, and distance matrix methods [15-17]. Various heuristics have been
proposed to search for solutions of desired quality, and the majority of these methods are greedy.
Among the greedy approaches, the Neighbor-Joining (NJ) method [4, 5] is widely used by
molecular biologists due to its efficiency and simplicity.
In recent years, the NJ method has become very popular, especially in multiple sequence alignment (MSA), because it is applicable to any type of evolutionary distance data. The output of the NJ method is used to direct the grouping of sequences during the multiple alignment process [8]. The most popular MSA tool, CLUSTALW [7], uses the NJ algorithm for constructing phylogenetic trees and for the progressive alignment. Recent well-known MSA software such as T-COFFEE [8], MAFFT [9], and MUSCLE [10] also offer the NJ method as a main alternative option. However, the time complexity of the original NJ algorithm is O(N^5), where N is the number of OTUs (Operational Taxonomic Units), which limits the applicability of the NJ algorithm when N is very large. Partly for this reason, MUSCLE and MAFFT use UPGMA [11] to construct the phylogenetic tree, since UPGMA reduces the time complexity to O(N^3). However, UPGMA does not build the true evolutionary tree to guide a progressive alignment in line with biological expectations, even though it may sometimes achieve a higher SP score than the NJ method. If the time complexity of the NJ method is reduced, we believe it will be more attractive to molecular biologists.
Studier and Kepler [5] succeeded in reducing the time complexity of NJ from O(N^5) to O(N^3), but there is still scope to reduce the execution time of the NJ algorithm. In this paper, we propose a parallel algorithm which is verified, with sufficient experimental data, to be much faster than the original one, and about 2, 2.2, and 2.3 times faster than Studier and Kepler's algorithm using a single thread, dual threads, and quad threads respectively. We also reduce the time complexity from O(N^5) to O(N^3) at the cost of an extra O(N) space compared with the original NJ algorithm.
In the rest of this paper, sections 2 and 3 describe the sequential NJ algorithm and its time
complexity respectively. In section 4, we propose a fine grain parallel fast algorithm.
Experimental results are given in section 5. Section 6 contains the conclusion of the paper.
2. SEQUENTIAL NJ ALGORITHM
The NJ method was initially proposed by Saitou and Nei [4], and later modified by Studier and
Kepler [5]. Neighbor-Joining seeks to build a tree which minimizes the sum of all branch
lengths, i.e., it adopts the minimum-evolution (ME) criterion. Many studies have corroborated
NJ's performance in reconstructing correct evolutionary trees. For small numbers of taxa, NJ
solutions are likely to be identical to the optimal ME tree [6]. Neighbor-Joining begins with a
star tree, then iteratively finds the nearest neighboring pair (i.e. the pair that induces a tree of
minimum sum of branch lengths) among all possible pairs of nodes (both internal and external).
The nearest pair is then clustered into a new internal node, and the distances of this node to the
rest of the nodes in the tree are computed and used in later iterations. The algorithm terminates
when N-2 internal nodes have been inserted into the tree (i.e., when the star tree is fully resolved
into a binary tree). The framework of NJ is defined by three components: the criterion used to
select pairs of nodes; the formula used to reduce the distance matrix; and the branch length
estimation formula.
Fig. 1 shows an example of an NJ tree, an un-rooted tree indicating the evolutionary relationships of 8 OTUs. The triangles denote the terminal nodes (OTUs) and the rectangles denote the internal nodes. The characters beside the triangles denote the OTUs, the numbers in the rectangles denote the order in which the OTUs are inserted into the tree, and the values above the lines are the branch lengths of the two neighbors. OTUs 1 and 2 are clustered first, generating a new combined OTU; the branch lengths between 1 and the new OTU and between 2 and the new OTU are 5 and 2 respectively. After OTUs 7 and 8 are clustered, only two OTUs remain, and they are clustered; the tree is then complete.
The NJ algorithm is described briefly as follows. Sij is the sum of the least-squares estimates of
branch lengths [4] and is defined by
S_{ij} = \frac{1}{2(N-2)} \sum_{\substack{k=0 \\ k \neq i,j}}^{N-1} (D_{ik} + D_{jk}) + \frac{1}{2} D_{ij} + \frac{1}{N-2} \sum_{\substack{m<n \\ m,n \neq i,j}} D_{mn}    (1)

where i = 0 ∼ N − 2, j = i + 1 ∼ N − 1, and k, m, n ≠ i, j.
At every iterative step, the minimum Sij indicates that nodes i and j are the nearest evolutionary
neighbors. Suppose that OTUs i and j are the neighbors to be joined in the tree. They are
clustered as a new one, denoted by X. Define the symbol Dij as the edit distance between OTUs i
and j.
The distance between this combined OTU (X) and another OTU k is defined by
DXk = (Dik + Djk)/2, k = 0 ∼ N − 1, k ≠ i, j    (2)

Lab is defined as the branch length between nodes a and b. LiX and LjX are estimated by

LiX = (Dij + DiZ − DjZ)/2    (3)
LjX = (Dij + DjZ − DiZ)/2
Figure 1. Neighbor-Joining tree
where

D_{iZ} = \frac{1}{N-2} \sum_{\substack{k=0 \\ k \neq i,j}}^{N-1} D_{ik}, \qquad D_{jZ} = \frac{1}{N-2} \sum_{\substack{k=0 \\ k \neq i,j}}^{N-1} D_{jk}.
Here, Z represents a group of OTUs including all but i and j, and Diz and Djz are the distances
between i and Z and between j and Z, respectively. Finally, OTUs i and j are deleted from the current
OTU list and X is added to the list; the number of current OTUs then decreases by one and a new
iteration starts. The pseudo code of the NJ algorithm is as follows:
/*Input: N (the number of OTUs) and Dij (the edit distance between pair OTUs i and j),
Distance Matrix, D */
/* Output: NJ tree */
1. While N > 2 do
1.1 For i=0 to N-2 do
1.2 For j=i+1 to N-1 do
1.2.1 Compute Sij according to formula (1).
EndFor
EndFor
1.3 Search the minimum Sij to get neighbors i and j.
1.4 OTUs i and j are clustered as a new OTU X.
1.5 Compute DXk according to formula (2).
1.6 Compute branch lengths LiX and LjX according to formula (3).
1.7 Delete OTUs i and j, and add X to current OTU lists.
1.8 N=N-1.
1.9 End while
2 Cluster the last two OTUs.
3. TIME COMPLEXITY ANALYSIS OF NJ ALGORITHM
According to the pseudo code of the NJ algorithm, the time complexity of the outermost loop is O(N); lines 1.1 and 1.2 contribute O(N^2), line 1.3 is O(N^2), line 1.4 is O(1), line 1.5 is O(N), line 1.6 is O(N), line 1.7 is O(1), and line 2 is O(N). As for line 1.2.1, formula (1) consists of three parts, whose time complexities we analyze separately: the first part takes O(N) time, the second O(1), and the third O(N^2), because all m and n other than i and j must be considered. The most expensive step is therefore formula (1), which makes the total time of NJ O(N^5) if it is computed directly. So the key to optimizing the NJ algorithm is to optimize the computation of formula (1).
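For concreteness, the following C sketch evaluates formula (1) directly for one pair; it is our own illustration (a dense row-major distance matrix D and 0-based indices are assumed, not taken from the paper's implementation) and shows where the O(N^2) cost per pair comes from.

#include <stddef.h>

/* Direct evaluation of formula (1) for one pair (i, j): O(N^2) work.
   D is a dense, symmetric N x N distance matrix stored row-major. */
double S_direct(const double *D, size_t N, size_t i, size_t j)
{
    double part1 = 0.0, part3 = 0.0;

    /* part 1: sum over k != i, j of (D_ik + D_jk), O(N) */
    for (size_t k = 0; k < N; k++) {
        if (k == i || k == j) continue;
        part1 += D[i * N + k] + D[j * N + k];
    }

    /* part 3: sum over m < n with m, n != i, j of D_mn, O(N^2) */
    for (size_t m = 0; m < N; m++) {
        if (m == i || m == j) continue;
        for (size_t n = m + 1; n < N; n++) {
            if (n == i || n == j) continue;
            part3 += D[m * N + n];
        }
    }

    return part1 / (2.0 * (N - 2)) + 0.5 * D[i * N + j] + part3 / (N - 2);
}

Calling this routine for all N(N − 1)/2 pairs costs O(N^4) per iteration, and O(N^5) over the roughly N iterations of the outer while loop.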
Among the three parts of formula (1), there are many repeated computations of the same data in parts 1 and 3. For example, in part 1, D13 is added both when computing S12 and when computing S14; in part 3, D34 is added both when computing S12 and when computing S15. In other words, the computations of the Sij are closely related: the NJ algorithm must compute the triangular matrix (Sij) iteratively to search for the nearest neighbors, but the computation of Sij needs only the distance matrix (Dij). This motivates us to design a fast parallel algorithm that avoids the repeated computations over Dij.
4. PARALLEL FAST ALGORITHM FOR CONSTRUCTING NJ TREE
Our motivation is simple: we try to reduce the number of computations in the innermost loop (1.2.1), which is the most expensive part of the computation and dominates the time complexity of the whole algorithm. Pre-processing is performed outside the double loop of lines 1.1 and 1.2 of the pseudo code, and its results are used in the computation of Sij (line 1.2.1).
A_i = \sum_{\substack{j=0 \\ j \neq i}}^{N-1} D_{ij}    (4)

total = \sum_{i=0}^{N-1} A_i    (5)
Define a linear array A[N] and a scalar total: the former stores the sum of every row of (Dij) and the latter stores the sum of all the Dij. Note that (Dij) is a symmetric square matrix, not a triangular matrix like (Sij). The first and third parts of Sij, denoted sum1 and sum3, can then be written respectively as

sum1 \overset{\mathrm{def}}{=} \frac{A_i + A_j - 2D_{ij}}{2(N-2)}    (6)

sum3 \overset{\mathrm{def}}{=} \frac{total - 2(A_i + A_j) + 2D_{ij}}{2(N-2)}    (7)
and

S_{ij} = sum1 + \frac{1}{2} D_{ij} + sum3 = \frac{total - (A_i + A_j)}{2(N-2)} + \frac{1}{2} D_{ij}    (8)

Compared with formula (1), this modified formula (8) is simpler and more efficient. Computing the Ai and total takes O(N^2) and O(N) time respectively, and evaluating formula (8) takes only O(1) time.
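As a minimal C sketch of this pre-processing (our own illustration, using an assumed dense row-major matrix D and 0-based indices), the row sums of formula (4), the grand total of formula (5), and the O(1) score of formula (8) can be computed as follows.

#include <stddef.h>

/* Precompute A[i] = sum of row i of D (formula (4)) and
   total = sum of all A[i] (formula (5)): O(N^2) and O(N). */
void precompute_row_sums(const double *D, size_t N, double *A, double *total)
{
    *total = 0.0;
    for (size_t i = 0; i < N; i++) {
        double s = 0.0;
        for (size_t j = 0; j < N; j++)
            if (j != i)
                s += D[i * N + j];
        A[i] = s;
        *total += s;
    }
}

/* O(1) evaluation of S_ij using formula (8); requires N > 2. */
double S_fast(const double *D, const double *A, double total,
              size_t N, size_t i, size_t j)
{
    return (total - (A[i] + A[j])) / (2.0 * (N - 2)) + 0.5 * D[i * N + j];
}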
Proof of equations (6), (7), and (8).

(1) For sum1,

A_i - D_{ij} = \sum_{\substack{k=0 \\ k \neq i}}^{N-1} D_{ik} - D_{ij} = \sum_{\substack{k=0 \\ k \neq i,j}}^{N-1} D_{ik}, \qquad A_j - D_{ij} = \sum_{\substack{k=0 \\ k \neq j}}^{N-1} D_{jk} - D_{ij} = \sum_{\substack{k=0 \\ k \neq i,j}}^{N-1} D_{jk},

so

(A_i - D_{ij}) + (A_j - D_{ij}) = A_i + A_j - 2D_{ij} = \sum_{\substack{k=0 \\ k \neq i,j}}^{N-1} (D_{ik} + D_{jk}),

which, divided by 2(N − 2), is the first part of formula (1).

(2) For sum3, A_i + A_j is the sum of the rows (and, by symmetry, of the columns) of i and j, so total − 2(A_i + A_j) + 2D_{ij} is the sum of the remaining elements of (Dij), i.e., those not in a row or column of i or j; the term 2D_{ij} is added back because D_{ij} and D_{ji} are removed twice when the rows and columns of i and j are subtracted. Since each remaining unordered pair {m, n} is counted twice in this symmetric matrix sum,

total - 2(A_i + A_j) + 2D_{ij} = 2 \sum_{\substack{m<n \\ m,n \neq i,j}} D_{mn},

and dividing by 2(N − 2) gives the third part of formula (1). Adding sum1, \frac{1}{2}D_{ij} and sum3 then yields formula (8).
Consider formula (8) again: total and N are constant for every Sij at a given iteration, so they need not enter the search for the minimum Sij, and the constant factors 1/2 and 1/(N − 2) can be ignored as well. This yields a simpler form of Sij (formula (9)) than formula (8); we can search for the minimum Sij using S′ij, where

S'_{ij} = D_{ij}(N - 2) - (A_i + A_j)    (9)

Formula (9) does not involve total, which means that we no longer need to recompute the sum of the Ai (formula (5)) at every iteration. This saves O(N) time per iteration.
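Written out, the reduction from formula (8) to formula (9) is a single monotone transformation: multiplying formula (8) by the positive constant 2(N − 2) and subtracting the constant total gives

2(N-2)\, S_{ij} - total = D_{ij}(N-2) - (A_i + A_j) = S'_{ij},

so the pair that minimizes S'_{ij} is exactly the pair that minimizes S_{ij}.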
Consider the distance matrix (Dij) and suppose that OTUs i* and j* are the neighbors to be joined in the tree. The two rows and columns of i* and j* are then deleted from (Dij), and a new row and column (denoted by X) are added, with DXk computed according to formula (2). In fact, we need not recompute the Ai from formula (4), because most of the remaining Ai are unchanged between two successive iterations. AX can be computed in O(N) time, and in each remaining Ai only Dii* and Dij* are removed and DiX is added, so

A_i = A_i - D_{ii^*} - D_{ij^*} + D_{iX},    (10)

which saves O(N) time for each iteration.
The quantities DiZ and DjZ in formula (3) are computed as

D_{iZ} = (A_i - D_{ij})/(N - 2),    (11)
D_{jZ} = (A_j - D_{ij})/(N - 2),

which saves O(N) time compared with computing them directly as in formula (3).
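The join step can then be sketched in C as below. This is our own illustration, not the authors' code: it assumes a dense row-major matrix, 0-based indices, and the convention that the new OTU X reuses the row/column slot of i*, while the caller retires the slot of j*.

#include <stddef.h>

/* One join step after the nearest pair (is, js) has been found, with
   N the number of OTUs before the join (N >= 3).  Implements the
   distance reduction (2), the branch lengths (3) using (11), and the
   O(1) update of each remaining row sum (10). */
void join_pair(double *D, double *A, size_t N, size_t is, size_t js,
               double *LiX, double *LjX)
{
    double Dij = D[is * N + js];

    /* branch lengths, formula (3) with D_iZ, D_jZ from formula (11) */
    double DiZ = (A[is] - Dij) / (double)(N - 2);
    double DjZ = (A[js] - Dij) / (double)(N - 2);
    *LiX = (Dij + DiZ - DjZ) / 2.0;
    *LjX = (Dij + DjZ - DiZ) / 2.0;

    /* new distances D_Xk (formula (2)), O(1) update of each A[k]
       (formula (10)), and accumulation of A[X] in O(N) */
    double AX = 0.0;
    for (size_t k = 0; k < N; k++) {
        if (k == is || k == js) continue;
        double DXk = (D[is * N + k] + D[js * N + k]) / 2.0;
        A[k] += DXk - D[k * N + is] - D[k * N + js];  /* formula (10) */
        D[is * N + k] = D[k * N + is] = DXk;          /* X stored in slot is */
        AX += DXk;
    }
    A[is] = AX;
}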
The computation of Sij for all pairs of protein/DNA sequences has been parallelized. As formula (9) for Sij is simple, fine grain parallelism is suitable. The elements of the upper triangular matrix (Sij) are mapped onto a one-dimensional array for equal load distribution among the threads. The indexes i and j of the matrix (Sij) are mapped onto the index r of the one-dimensional array by

r = (i − 1)(2n − i)/2 + j − i,

where n denotes the current number of OTUs N and i, j are 1-based.
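A small C sketch of this mapping and its inverse is given below; it is our own illustration (1-based i, j and r, with n the current number of OTUs), and the inverse simply scans the rows of the upper triangle.

#include <stddef.h>

/* Map 1-based pair indices (i, j), 1 <= i < j <= n, of the upper
   triangular matrix onto a linear index r = 1 .. n(n-1)/2, and back. */
size_t pair_to_index(size_t i, size_t j, size_t n)
{
    return (i - 1) * (2 * n - i) / 2 + j - i;
}

void index_to_pair(size_t r, size_t n, size_t *i, size_t *j)
{
    size_t row = 1;
    /* advance while r lies beyond the pairs of the current row;
       row `row` covers r = (row-1)(2n-row)/2 + 1 .. row(2n-row-1)/2 */
    while (r > row * (2 * n - row - 1) / 2)
        row++;
    *i = row;
    *j = r - (row - 1) * (2 * n - row) / 2 + row;
}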
Step 1.2 of our algorithm uses a master-slave parallelization method to compute the minimum Sij. In each iteration, the master thread creates multiple slave threads, and each thread calculates its local minimum Sij independently. The master thread then waits until all slaves have found their respective local minima. The Sij values are evenly distributed among the threads (master and slaves). Finally, the master determines the global minimum Sij and continues with the computation of DkX, LiX, LjX, Ai and AX. The master thread repeats the above steps until N − 2 internal nodes have been inserted.
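The following Pthreads sketch shows one way this master-slave search could look. It is a hedged illustration, not the authors' implementation: the struct, function names, the fixed limit of 64 threads, the dense row-major matrix and 0-based indices are all our own assumptions, and it evaluates S'ij from formula (9) over fixed-size chunks of the flattened pair index.

#include <pthread.h>
#include <stddef.h>
#include <float.h>

/* Per-thread task: read-only view of D, A and N, a half-open chunk
   [lo, hi) of the flattened upper-triangle pair index, and the local
   minimum found in that chunk. */
typedef struct {
    const double *D;      /* N x N distance matrix, row-major */
    const double *A;      /* row sums, formula (4)             */
    size_t N;             /* current number of OTUs (N >= 3)   */
    size_t lo, hi;        /* this thread's pair range [lo, hi) */
    double best;          /* local minimum of S'_ij            */
    size_t bi, bj;        /* pair attaining the local minimum  */
} task_t;

static void *scan_chunk(void *arg)
{
    task_t *t = (task_t *)arg;
    size_t N = t->N;
    t->best = DBL_MAX;
    t->bi = 0;
    t->bj = 1;
    if (t->lo >= t->hi)
        return NULL;                          /* empty chunk */

    /* decode the 0-based flattened index t->lo into its pair (i, j) */
    size_t i = 0, row_end = N - 1;            /* row 0 holds pairs 0 .. N-2 */
    while (t->lo >= row_end) {
        i++;
        row_end += N - 1 - i;
    }
    size_t j = i + 1 + (t->lo - (row_end - (N - 1 - i)));

    for (size_t r = t->lo; r < t->hi; r++) {
        /* formula (9): S'_ij = D_ij (N - 2) - (A_i + A_j) */
        double s = t->D[i * N + j] * (double)(N - 2) - (t->A[i] + t->A[j]);
        if (s < t->best) { t->best = s; t->bi = i; t->bj = j; }
        if (++j == N) { i++; j = i + 1; }     /* step to the next pair */
    }
    return NULL;
}

/* Master: split the N(N-1)/2 pairs into p fixed-size chunks (1 <= p <= 64),
   run p-1 slaves plus itself, then reduce the local minima. */
void find_nearest_pair(const double *D, const double *A, size_t N,
                       size_t p, size_t *imin, size_t *jmin)
{
    size_t pairs = N * (N - 1) / 2;
    size_t chunk = (pairs + p - 1) / p;
    pthread_t tid[64];
    task_t task[64];

    for (size_t t = 0; t < p; t++) {
        task[t] = (task_t){ D, A, N, t * chunk, (t + 1) * chunk,
                            DBL_MAX, 0, 0 };
        if (task[t].lo > pairs) task[t].lo = pairs;
        if (task[t].hi > pairs) task[t].hi = pairs;
        if (t + 1 < p)                        /* slaves */
            pthread_create(&tid[t], NULL, scan_chunk, &task[t]);
    }
    scan_chunk(&task[p - 1]);                 /* master takes the last chunk */
    for (size_t t = 0; t + 1 < p; t++)
        pthread_join(tid[t], NULL);

    double best = DBL_MAX;                    /* reduce the local minima */
    for (size_t t = 0; t < p; t++)
        if (task[t].best < best) {
            best = task[t].best;
            *imin = task[t].bi;
            *jmin = task[t].bj;
        }
}

The routine would be called once per iteration of the outer while loop; because the chunk boundaries are fixed in advance, this corresponds to the fixed-size chunking strategy mentioned later in this section.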
Now the new algorithm is described as follows.

0 Compute Ai according to formula (4).
1 While N > 2 do
1.0 For i=1 to N – 1 do
1.1 For j=i+1 to N do
r = (i – 1) (2n – i) / 2 + j – i
/*map the indexes i and j of upper triangular matrix (Sij) to a linear array index r*/
EndFor;
EndFor;
/* Step 1.2: Find minimum Sij by parallel threads (both master and slave) */
1.2 For r=1 to (N × (N – 1)) / (2p) do in parallel
/* p is the number of threads; each thread scans its own block of pair indexes r */
Compute S′ij according to formula (9).
Find the local minimum S′ij.
EndFor;
1.3 Search the global minimum S′ij over all local minima to get neighbors i* and j*
by the master thread
/* Step 2: Update Dkx, Lix, Ljx, Ai and Ax by master thread*/
1.4 OTUs i* and j* are clustered as a new OTU X
1.5 Compute DXk according to formula (2).
1.6 Compute branch lengths LiX and LjX according to formulas (3) and (11).
1.7 Delete OTUs i* and j*, and add X to current OTU lists.
1.8 Update the remaining Ai according to formula (10) and compute Ax.
1.9 N=N-1.
EndWhile
2 Cluster the last two OTUs.
The time complexity of the new algorithm is O(N^3).

The multithreaded fine grain parallel algorithm can be implemented with OpenMP, Pthreads or other thread libraries; we implemented the fine grain parallel portion using Pthreads. Our parallel algorithm uses a scheduling strategy called fixed-size chunking [18], in which tasks of a fixed size are allocated to the available cores.
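As an alternative sketch (again our own, not the paper's code), the same fine grain search can be written with OpenMP, where schedule(static, chunk) corresponds to fixed-size chunking; here the chunks are taken over rows of the upper triangle rather than over the flattened pair index, which is a simplification.

#include <omp.h>
#include <float.h>
#include <stddef.h>

/* Find the pair minimizing S'_ij (formula (9)) with OpenMP; D is N x N
   row-major, A holds the row sums of formula (4), N >= 3, chunk >= 1. */
void find_nearest_pair_omp(const double *D, const double *A, size_t N,
                           int chunk, size_t *imin, size_t *jmin)
{
    double best = DBL_MAX;
    #pragma omp parallel
    {
        double lbest = DBL_MAX;
        size_t li = 0, lj = 1;
        /* fixed-size chunking: rows handed out chunk-by-chunk */
        #pragma omp for schedule(static, chunk) nowait
        for (long i = 0; i < (long)N - 1; i++) {
            for (long j = i + 1; j < (long)N; j++) {
                double s = D[i * N + j] * (double)(N - 2) - (A[i] + A[j]);
                if (s < lbest) { lbest = s; li = (size_t)i; lj = (size_t)j; }
            }
        }
        #pragma omp critical
        if (lbest < best) { best = lbest; *imin = li; *jmin = lj; }
    }
}

Compiled with, e.g., gcc -fopenmp, this performs the same reduction to a global minimum as the Pthreads version above.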
5. EXPERIMENTAL RESULTS
We implemented our algorithm in the C programming language using the Pthread library under the Linux operating system. The experiments were conducted on a computer with a 2.0 GHz quad core Xeon CPU, 16 GB of primary memory and 12 MB of cache memory, running Linux with no other applications installed. All measured times include the time taken to read the input data from a file and write the solution to a file, and all measurements were taken while no one else was using the system.

Figure 2. Elapsed time of the sequential SK algorithm and of our single (1T), dual (2T), and quad (4T) thread implementations of the Neighbor-Joining algorithm as a function of the number of sequences.
The distance matrix (Dij) is generated by three methods: (1) global pairwise alignment by dynamic programming, which is the slowest but most accurate; (2) the k-mer distance [10, 12], which estimates pairwise distances in linear time; and (3) random generation using computer pseudo-random numbers, which simply provides a fast way to generate test data.
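For the third method, a generator along the following lines could be used; this is only an illustrative sketch with assumed names, producing a symmetric matrix with zero diagonal and pseudo-random off-diagonal entries.

#include <stdlib.h>
#include <stddef.h>

/* Method (3): fill a symmetric N x N distance matrix with pseudo-random
   values in (0, 1], zero diagonal.  Only a quick way to generate test
   input; the values carry no evolutionary meaning. */
void random_distance_matrix(double *D, size_t N, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < N; i++) {
        D[i * N + i] = 0.0;
        for (size_t j = i + 1; j < N; j++) {
            double d = (double)(rand() % 1000 + 1) / 1000.0;
            D[i * N + j] = D[j * N + i] = d;
        }
    }
}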
Experiments measured the elapsed times for SK and for our proposed single (1T), dual (2T), and quad (4T) thread implementations of the NJ algorithm as a function of the number of sequences, as shown in Table 1. The number of sequences varies from 300 to 900 in our experiments. All measured times are in milliseconds.

Table 1. Elapsed times (in milliseconds) of our NJ phylogenetic tree construction algorithm running with single (1T), dual (2T), and quad (4T) threads and of the SK algorithm, with respect to the number of sequences.

Number of sequences   300   400   500   600   700   800   900
SK                    202   368   710  1190  1805  2590  3610
1T                    130   230   390   620   930  1330  1840
2T                    120   205   350   550   820  1150  1610
4T                    110   170   330   510   770  1105  1535
Figure 3. Speedup of our Neighbor-Joining algorithm running with single (1T), dual (2T), and quad (4T) threads, in comparison to SK, as a function of the number of sequences.
Figure 2 shows the elapsed times for the sequential SK algorithm and for our threaded (single, dual and quad) Neighbor-Joining phylogenetic tree algorithm as a function of the number of sequences. All of the threaded implementations successfully reduce the execution time.

Figure 3 shows the speedup of the single, dual, and quad thread executions of the proposed algorithm over the SK algorithm as a function of the number of sequences. It can be observed that higher speedup is achieved with a larger number of sequences; this is because the overhead of thread creation is fixed regardless of the number of sequences.

Next, we focus on the speedup as a function of the number of threads. In Figure 4, the speedup increases from a single thread to dual threads and from dual threads to quad threads. However, the speedup may fall if the number of sequences and the number of threads do not increase proportionately, again because of the overhead incurred in creating threads; the speedup remains below its steady-state value when a large number of threads is used.

The proposed method fully exploits multithreading while also reducing redundant computation compared with the sequential SK algorithm. This achieves better speedup especially when the number of sequences is large, matching the massive amount of work required for tree construction and yielding higher throughput.
Figure 4. Speedup of our Neighbor-Joining algorithm for different numbers of sequences, in comparison to SK, as a function of the number of threads.
6. CONCLUSIONS
In this paper, a fast parallel algorithm for constructing Neighbor-Joining phylogenetic trees has been developed. The new algorithm has O(N^3) time complexity, the same as SK, compared with O(N^5) for SN. Furthermore, fine grain parallelism is incorporated using multiple threads in the innermost loop. The experimental results show that the running time of our algorithm using a single thread is tens to hundreds of times faster than SN and about two times faster than SK for large N. It is also verified that our algorithm is 2.2 times faster than the SK algorithm using dual threads and 2.3 times faster using quad threads on a desktop computer. The key points of our algorithm are the reduction of repeated computations and the parallelization of the computation in the innermost loop.
REFERENCES
[1] C. Yang and S. Khuri, "PTC: An interactive tool for phylogenetic tree construction," in Proc. IEEE Computational Systems Bioinformatics Conference, Stanford University, 2003, pp. 476-477.
[2] L. R. Foulds and R. L. Graham, "The steiner problem in phylogeny is NP-complete," Advances
Appl. Math. vol. 3, 1982, pp. 43-49.
[3] W. Day, "Computational complexity of inferring phylogenies from dissimilarity matrices," Bull. Math. Biol., vol. 49, no. 4, 1987, pp. 461-467.
[4] N. Saitou and M. Nei, "The neighbor-joining method: a new method for reconstructing phylogenetic trees," Mol. Biol. Evol., vol. 4, no. 4, 1987, pp. 406-425.
[5] J. Studier and K. Keppler, "A note on the neighbor-joining algorithm of Saitou and Nei," Mol. Biol. Evol., vol. 5, no. 6, 1988, pp. 729-731.
[6] N. Saitou and T. Imanishi, "Relative efficiencies of the Fitch-Margoliash, maximum-parsimony, maximum-likelihood, minimum-evolution, and neighbor-joining methods of phylogenetic tree construction in obtaining the correct tree," Mol. Biol. Evol., vol. 6, no. 5, 1989, pp. 514-525.
[7] J.D. Thompson, D.G. Higgins, and T. J. Gibson, "CLUSTAL W: improving the sensitivity of
progressive multiple sequence alignment through sequence weighting, position-specific gap
penalties and weight matrix choice," Nucleic Acids Res., vol. 22, no. 22, 1994, pp. 4673-4680.
[8] C. Notredame, D.G. Higgins, and J. Heringa, "T-Coffee: a novel method for fast and accurate multiple sequence alignment," J. Mol. Biol., vol. 302, 2000, pp. 205-217.
[9] K. Katoh, K. Misawa, K. Kuma and T. Miyata, "MAFFT: a novel method for rapid multiple
sequence alignment based on fast Fourier transform," Nucleic Acids Res., vol. 30, no. 14, 2002,
pp.3059-3066.
[10] R.C. Edgar, "MUSCLE: multiple sequence alignment with high accuracy and high throughput,"
Nucleic Acids Res., vol. 32, no. 5, Mar. 2004, pp. 1792-1797.
[11] P.H.A. Sneath and R.R. Sokal, Numerical Taxonomy. Freeman, San Francisco, 1973.
[12] R.C. Edgar, "Local homology recognition and distance measures in linear time using compressed amino acid alphabets," Nucleic Acids Res., vol. 32, no. 1, Jan. 2004, pp. 380-385.
[13] A. Lingas, H. Olsson, and A. Ostlin, "Efficient merging and construction of evolutionary trees," Journal of Algorithms, vol. 41, 2001, pp. 41-51.
[14] M. Csuros, "Fast recovery of evolutionary trees with thousands of nodes," in Proceedings of the International Conference on Research in Computational Molecular Biology (RECOMB), 2001.
[15] R. C. T. Lee. Computational biology. Department of Computer Science and Information
Engineering, National Chi-Nan University, 2001.
[16] L. A. Salter, "Algorithms for phylogenetic tree reconstruction," in Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, vol. 2, 2000, pp. 459-465.
[17] K. Strimmer and A. von Haeseler, "Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies," Molecular Biology and Evolution, vol. 13, no. 7, 1996, pp. 964-969.
[18] T. Hagerup, "Allocating independent tasks to parallel processors: an experimental study," Journal of Parallel and Distributed Computing, vol. 47, 1997, pp. 185-197.
Authors:
Biswajit Sahoo received a Bachelor degree in Electronics & Communication Engineering from NIT Durgapur, India, and an M.Tech degree in Computer Engineering from B.E. College, Shibpur, and is pursuing a Ph.D. in Computer Science at Utkal University, India. He is presently working as Associate Professor in the Department of Computer Science & Engineering, KIIT University, Bhubaneswar, India. He has published papers in areas relating to parallel algorithms and bioinformatics.
Ashutosh Behura received a Bachelor degree in Computer Science and Engineering from KIIT University, Bhubaneswar, India. He is presently working as a Research Associate in the Department of Computer Science & Engineering, KIIT University, Bhubaneswar, India. His areas of interest are parallel algorithms and bioinformatics.
Sudarsan Padhy is a Professor of Mathematics at Utkal University, India. He obtained his Ph.D. degree in Mathematics from Utkal University in 1979 and carried out postdoctoral research at the University of Freiburg, Germany, during 1980-81. He has over fifty published research papers and five books to his credit, spanning fluid dynamics, finite difference and finite element methods for solving partial differential equations, operations research, parallel algorithms, computational finance and computational biology.