DNA Compression (Encoded using Huffman Coding Method)
Marwa K. Al-Rikaby
University of Babylon, College of Information Technology
DNA
• One of the basic building blocks of living organisms.
• Consists of four chemical bases:
   • Adenine (A).
   • Thymine (T).
   • Cytosine (C).
   • Guanine (G).
• DNA bases pair up with each other, A with T and C with G,
to form units called base pairs.
• Human DNA contains around 3 billion bases, and about 99% of
them are the same in any two people.
• Goal: easier analysis, and savings in space and time.
• DNA sequences are built from the alphabet {A, T, C, G} and
contain many repeats, most of them approximate rather than exact.
• Only lossless algorithms are valid.
• A DNA compression model should preferably:
   • Be based on biological knowledge.
   • Achieve actual compression.
   • Be simple, with few parameters.
   • Be able to give per-symbol information content.
   • Use an efficient algorithm.
• Since DNA sequences contain only the four bases {a, c, g, t}, they
can be stored using two bits per input symbol.
• Standard compression tools, such as gzip and bzip2, usually fail to
improve on this baseline, since their output needs more than two bits
per symbol.
• Example: the sequence HEHCMVCG, 229354 bases (57338 bytes at two bits
per base):
   • Two-bit packing: 57338 bytes (baseline, no further compression).
   • gzip: 66741 bytes (larger than the baseline).
   • bzip2: 62169 bytes (larger than the baseline).
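The two-bit baseline can be sketched as follows. This is a minimal illustration that assumes an uppercase sequence containing only A, C, G and T; the base-to-bits mapping and the function names `pack` and `unpack` are arbitrary choices made for the example, not something the slides specify.

```python
# Illustrative two-bit packing; the mapping A=0, C=1, G=2, T=3 is an assumption.
BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}

def pack(seq):
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)

def unpack(data, n):
    """Recover the first n bases from packed bytes."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(n))

packed = pack("ACGTACGTAC")
restored = unpack(packed, 10)
print(len(packed), restored)
```

The number of bases has to be stored alongside the packed bytes, because the last byte may be only partly filled.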
• When multiple genomes from the same species are stored, as with
‘resequencing’ technologies, the flat text file approach is clearly
wasteful, since for the most part the sequences are identical.
• A simple approach is to store a reference sequence and then, for each
other sequence, encode only the differences (or ‘deltas’) with respect to
the reference sequence.
• Consider the sequences AACGACTAGTAATTTG and
CACGTCTAGTAATGTG, which are identical except for substitutions at
positions 1 (A→C), 5 (A→T) and 14 (T→G). Each single-nucleotide
polymorphism (SNP) can be encoded by a pair (i, X), where i is an integer
encoding the position and X is the substituted base relative to the
reference.
Although the basic idea is easy to understand and not new, a precise
implementation must address a number of important technical issues:
• One can use local relative addresses, i.e. intervals, rather than absolute
addresses. Using intervals, the above example ‘1C5T14G’ becomes ‘0C4T9G’.
With intervals, the dynamic range of the integers to be encoded may be
considerably smaller than with absolute addresses. The relatively modest price
to pay is that the intervals must be summed to recover absolute coordinates.
• If the positions at which variations occur in the population are fixed and form a
relatively small subset of all possible positions, then additional savings may
result from focusing only on those positions.
• The choice of the reference sequence.
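The delta and interval ideas above can be sketched in a few lines. This is a hedged illustration for substitution-only differences between equal-length sequences; the function names are hypothetical, and the first interval is measured from position 1, an assumption chosen because it reproduces the ‘0C4T9G’ example.

```python
def snp_deltas(reference, sample):
    """Return 1-based (position, base) pairs where sample differs from reference."""
    if len(reference) != len(sample):
        raise ValueError("substitution-only deltas need equal-length sequences")
    return [(i + 1, b) for i, (a, b) in enumerate(zip(reference, sample)) if a != b]

def to_intervals(deltas, origin=1):
    """Replace absolute positions by the distance from the previous position."""
    intervals, prev = [], origin
    for pos, base in deltas:
        intervals.append((pos - prev, base))
        prev = pos
    return intervals

def from_intervals(intervals, origin=1):
    """Sum the intervals to recover absolute positions."""
    deltas, pos = [], origin
    for gap, base in intervals:
        pos += gap
        deltas.append((pos, base))
    return deltas

def apply_deltas(reference, deltas):
    """Rebuild the sample sequence from the reference and its deltas."""
    seq = list(reference)
    for pos, base in deltas:
        seq[pos - 1] = base
    return "".join(seq)

reference = "AACGACTAGTAATTTG"
sample = "CACGTCTAGTAATGTG"
deltas = snp_deltas(reference, sample)
intervals = to_intervals(deltas)
print(deltas)
print(intervals)
```

Decoding runs `from_intervals` and then `apply_deltas`, which is the "modest price" of summing intervals back into absolute coordinates.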
• All applications of these basic ideas hinge on a fundamental technical
problem: how to encode integers, representing for instance
absolute or relative genomic addresses or read lengths, as
binary strings.
• We are interested in binary encoding schemes for sequences of
integers that can be parsed automatically, with no separators between
codewords, and that, consistently with information theory, are entropy
efficient, in the sense that fewer bits are used to encode more
frequent events.
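One well-known code with these properties is the Elias gamma code, shown here only as an illustration; the slides do not say which integer code is intended. It is self-delimiting and gives shorter codewords to smaller integers, which helps under the assumption that small values (such as short intervals) are the most frequent. Gamma codes start at 1, so this sketch shifts every value by one to allow an interval of 0.

```python
def gamma_encode(n):
    """Elias gamma code of an integer n >= 1, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias gamma is defined for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def gamma_decode_all(bits):
    """Decode a concatenation of gamma codewords back into integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while i < len(bits) and bits[i] == "0":
            zeros += 1
            i += 1
        if i + zeros + 1 > len(bits):
            raise ValueError("truncated gamma codeword")
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values

def encode_intervals(intervals):
    """Encode non-negative intervals; the +1 shift is an assumption of this sketch."""
    return "".join(gamma_encode(gap + 1) for gap in intervals)

def decode_intervals(bits):
    return [value - 1 for value in gamma_decode_all(bits)]

bits = encode_intervals([0, 4, 9])
print(bits, decode_intervals(bits))
```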
Common components of most DNA compression algorithms:
1. Finding the candidate repeat segments.
2. Considering approximate repeats.
3. Selecting the best subset of compatible repeats.
4. Encoding of the repeat segments.
5. Encoding of the non-repeat segments.
Suppose we have the following DNA sequence:
TGATAGGTGATAGATATGATTGATAGATGATAGAAGATTG
ATAGATGATAGGGATATCACGTAGTCCCTAGCTCTTGGCG
CTGGATGGGGCGGACGGTAAGGGAAATCGACCGTTGATA
GTCCAAATTCGGTCGTATGATAGAAATTTCGAATGGAAAT
TCTGATACATAGGTGATAGTAGATGTAAGATGATAGATGAT
AGATAGATAGATGATAGACAGATTGATAGATGATAGAGAG
A
1. Finding the candidate repeat segments:
Let “TGATAG” be a candidate segment, so we’ll find its repetitions
in the example.
TGATAGGTGATAGATATGATTGATAGATGATAGAAGATTGATAGATGAT
AGGGATATCACGTAGTCCCTAGCTCTTGGCGCTGGATGGGGCGGACG
GTAAGGGAAATCGACCGTTGATAGTCCAAATTCGGTCGTATGATAGAA
ATTTCGAATGGAAATTCTGATACATAGGTGATAGTAGATGTAAGATGAT
AGATGATAGATAGATAGATGATAGACAGATTGATAGATGATAGAGAGA
• “TGATAG” occurs 14 times in total.
• The repetitions of every candidate segment should be located in this way.
• The counts are kept for use in the encoding.
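Counting the repetitions of a candidate segment can be sketched as a plain substring scan. This is only an illustration: the slides do not say how candidates are searched for, and the helper name `find_occurrences` is hypothetical. The sequence is the example sequence from the slides.

```python
SEQUENCE = (
    "TGATAGGTGATAGATATGATTGATAGATGATAGAAGATTG"
    "ATAGATGATAGGGATATCACGTAGTCCCTAGCTCTTGGCG"
    "CTGGATGGGGCGGACGGTAAGGGAAATCGACCGTTGATA"
    "GTCCAAATTCGGTCGTATGATAGAAATTTCGAATGGAAAT"
    "TCTGATACATAGGTGATAGTAGATGTAAGATGATAGATGAT"
    "AGATAGATAGATGATAGACAGATTGATAGATGATAGAGAG"
    "A"
)

def find_occurrences(sequence, segment):
    """Return the 0-based start of every occurrence, overlapping ones included."""
    starts = []
    pos = sequence.find(segment)
    while pos != -1:
        starts.append(pos)
        pos = sequence.find(segment, pos + 1)
    return starts

starts = find_occurrences(SEQUENCE, "TGATAG")
print(len(starts), starts)
```

Keeping the start positions, not just the count, lets a later step replace each occurrence by the segment's index.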
2. Considering approximate repeats:
Scan the sequence for segments that are similar to a reference
segment, i.e. segments that become identical to it after
applying one of four basic operations (positions are counted from 0):
• Insertion: “AAATTCG” becomes “AAATTCTG” after Ins(T,6).
• Deletion: “AAATTCG” becomes “AAATTG” after Del(5,1).
• Replacement: “AAATTCG” becomes “AATTTCG” after Rep(2,T).
• Reverse: “AAATTCG” becomes “GCTTAAA” after Rev().
Examples of approximate repeats in the example sequence:
• Let “ATATGA” be a reference segment; then “ATATCA” is identical to it if
we replace “G” by “C”.
• “ATATGA” is identical to “ATAGA” when deleting “T” at position 3.
• “GGCGC” is identical to “GGCGG” when replacing “C” by “G” at position 4.
• “AATGG” is identical to “GGTAA” when reversing it.
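The four basic operations can be sketched as small string functions. Positions are taken as 0-based here, which is the reading that reproduces the slide examples; the function names are illustrative, not taken from the slides.

```python
def ins(segment, base, pos):
    """Ins(base, pos): insert base so that it ends up at index pos."""
    return segment[:pos] + base + segment[pos:]

def delete(segment, pos, count=1):
    """Del(pos, count): remove count bases starting at index pos."""
    return segment[:pos] + segment[pos + count:]

def rep(segment, pos, base):
    """Rep(pos, base): replace the base at index pos."""
    return segment[:pos] + base + segment[pos + 1:]

def rev(segment):
    """Rev(): reverse the segment."""
    return segment[::-1]

print(ins("AAATTCG", "T", 6))
print(delete("AAATTCG", 5, 1))
print(rep("AAATTCG", 2, "T"))
print(rev("AAATTCG"))
```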
3. Selecting the best subset of compatible repeats:
• Choosing the reference segments is a major and very sensitive
step: the design of the reference affects not only the variants to be
recorded but also the intervals, so it must also take into account any
constraints a particular implementation places on the intervals and
their encodings.
• In our example, each segment that we have detected is given an
integer index into the reference table.
The reference table contains:
• The four basic symbols {A, T, C, G}.
• The candidate segments.
• The basic operations, each one with the parameters with which it is
applied to the sequence.

Segment   Index
A         0
T         1
C         2
G         3
TGATAG    4
ATATGA    5
AAATTCG   6
GGTAA     7
GGCGC     8
RepC      9
Del       10
InsT      11
Rev       12
RepG      13
RepT      14
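As a data structure, the reference table is just an ordered list whose position gives the index. The sketch below uses the entries from the slides; the variable names are hypothetical.

```python
# Entries in the order used by the slides: bases, candidate segments, operations.
BASES = ["A", "T", "C", "G"]
SEGMENTS = ["TGATAG", "ATATGA", "AAATTCG", "GGTAA", "GGCGC"]
OPERATIONS = ["RepC", "Del", "InsT", "Rev", "RepG", "RepT"]

REFERENCE_TABLE = BASES + SEGMENTS + OPERATIONS

# Reverse lookup from entry to its integer index.
INDEX_OF = {entry: i for i, entry in enumerate(REFERENCE_TABLE)}

for entry in REFERENCE_TABLE:
    print(f"{entry:<8} {INDEX_OF[entry]}")
```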
4. Encoding of the repeat segments:
Initially, the repetitions of each candidate segment must be counted
in the same way as shown in step 1.

Segment   Index   Repetitions
A         0
T         1
C         2
G         3
TGATAG    4       14
ATATGA    5       1
AAATTCG   6       1
GGTAA     7       1
GGCGC     8       1
RepC      9
Del       10
InsT      11
Rev       12
RepG      13
RepT      14
5. Encoding of the non-repeat segments:
The approximate segments must be processed in the same way as shown
in step 2: each approximate repeat is counted under its reference segment
and under the operation that relates it to that segment. The bases that
are not covered by any segment are then counted individually.

Segment   Index   Repetitions
A         0       28
T         1       19
C         2       14
G         3       25
TGATAG    4       14
ATATGA    5       3
AAATTCG   6       4
GGTAA     7       2
GGCGC     8       2
RepC      9       1
Del       10      2
InsT      11      2
Rev       12      1
RepG      13      1
RepT      14      1
• First, find the probability of each segment:
Segment   Index   Repetitions   Probability
A         0       28            28/119
T         1       19            19/119
C         2       14            14/119
G         3       25            25/119
TGATAG    4       14            14/119
ATATGA    5       3             3/119
AAATTCG   6       4             4/119
GGTAA     7       2             2/119
GGCGC     8       2             2/119
RepC      9       1             1/119
Del       10      2             2/119
InsT      11      2             2/119
Rev       12      1             1/119
RepG      13      1             1/119
RepT      14      1             1/119
Total number of segments = 119
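The probability column follows directly from the counts. The sketch below recomputes it with exact fractions and also evaluates the Shannon entropy of the distribution, which is a lower bound on the average code length of any prefix code; the entropy step is an addition for illustration and does not appear in the slides. Note that `Fraction` prints values in lowest terms, so 14/119 appears as 2/17.

```python
from fractions import Fraction
from math import log2

# Counts copied from the table above.
COUNTS = {
    "A": 28, "T": 19, "C": 14, "G": 25,
    "TGATAG": 14, "ATATGA": 3, "AAATTCG": 4, "GGTAA": 2, "GGCGC": 2,
    "RepC": 1, "Del": 2, "InsT": 2, "Rev": 1, "RepG": 1, "RepT": 1,
}

total = sum(COUNTS.values())
PROBABILITIES = {seg: Fraction(n, total) for seg, n in COUNTS.items()}

# Shannon entropy in bits per segment.
entropy = -sum(float(p) * log2(float(p)) for p in PROBABILITIES.values())

print(total)
print(PROBABILITIES["A"], PROBABILITIES["TGATAG"])
print(entropy)
```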
• Arrange the segments in non-decreasing order of probability.
Index   Segment   Probability
14      RepT      1/119
13      RepG      1/119
12      Rev       1/119
9       RepC      1/119
11      InsT      2/119
10      Del       2/119
8       GGCGC     2/119
7       GGTAA     2/119
5       ATATGA    3/119
6       AAATTCG   4/119
4       TGATAG    14/119
2       C         14/119
1       T         19/119
3       G         25/119
0       A         28/119
 Build Huffman Coding Tree
Repeatedly merge the two lowest-probability nodes into a new parent node
whose probability is their sum, until a single root remains. The internal
nodes are created in the order below (all values are over 119; the
pairings are read off the final code table):

Step   Merged nodes                        New node
1      RepT (1) + RepG (1)                 2
2      Rev (1) + RepC (1)                  2
3      InsT (2) + Del (2)                  4
4      GGCGC (2) + GGTAA (2)               4
5      node 2 (step 1) + node 2 (step 2)   4
6      ATATGA (3) + AAATTCG (4)            7
7      node 4 (step 3) + node 4 (step 5)   8
8      node 4 (step 4) + node 7            11
9      node 8 + node 11                    19
10     TGATAG (14) + C (14)                28
11     T (19) + node 19                    38
12     G (25) + node 28                    53
13     A (28) + node 38                    66
14     node 53 + node 66                   119 (root)
Next, starting from the root and working down to the leaves, label each
pair of branches: the branch to the child with the smaller value gets (0)
and the branch to the child with the larger value gets (1).
Finally, encode each segment by reading its code along the path from the
root to its leaf. For example:
Code(0) = code(A) = 10
Code(3) = code(G) = 00
Code(9) = code(RepC) = 1100111
The final reference table is:

Segment   Index   Repetitions   Probability   Code
A         0       28            28/119        10
T         1       19            19/119        111
C         2       14            14/119        011
G         3       25            25/119        00
TGATAG    4       14            14/119        010
ATATGA    5       3             3/119         110110
AAATTCG   6       4             4/119         110111
GGTAA     7       2             2/119         110101
GGCGC     8       2             2/119         110100
RepC      9       1             1/119         1100111
Del       10      2             2/119         110001
InsT      11      2             2/119         110000
Rev       12      1             1/119         1100110
RepG      13      1             1/119         1100101
RepT      14      1             1/119         1100100

Keep in mind that only the segments and their codes are important for
the decoder.
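The tree construction can be sketched with a priority queue. This is a generic Huffman routine, not the author's implementation; it returns only the code length of each segment. Ties between equal counts may be broken differently from the slides, so individual lengths can differ from the table, but every Huffman code for these counts has the same total length.

```python
import heapq
from itertools import count

# Counts copied from the final reference table.
COUNTS = {
    "A": 28, "T": 19, "C": 14, "G": 25,
    "TGATAG": 14, "ATATGA": 3, "AAATTCG": 4, "GGTAA": 2, "GGCGC": 2,
    "RepC": 1, "Del": 2, "InsT": 2, "Rev": 1, "RepG": 1, "RepT": 1,
}

def huffman_code_lengths(counts):
    """Return {symbol: code length} for a Huffman code over the given counts."""
    tiebreak = count()  # unique sequence number so heap entries never compare lists
    heap = [(n, next(tiebreak), [sym]) for sym, n in counts.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    while len(heap) > 1:
        n1, _, syms1 = heapq.heappop(heap)
        n2, _, syms2 = heapq.heappop(heap)
        # Every symbol under the new parent moves one level deeper.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (n1 + n2, next(tiebreak), syms1 + syms2))
    return lengths

lengths = huffman_code_lengths(COUNTS)
total_bits = sum(COUNTS[sym] * lengths[sym] for sym in COUNTS)
print(lengths)
print(total_bits)
```

With the codes in the table above, the 119 segments take 28·2 + 25·2 + 19·3 + 14·3 + 14·3 + (3 + 4 + 2 + 2 + 2 + 2)·6 + 4·7 = 365 bits, and the sketch arrives at the same total.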
The previous coding satisfies both the prefix property and the
requirement from information theory, in that:
• No segment's code is a prefix of another segment's code.
• The shortest codes are given to the segments that are more
frequent, while the long ones are assigned to those that are
less frequent.
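Both points can be checked mechanically. The sketch below uses the codes from the final reference table; the helper names are hypothetical, and the token list is just the start of the example sequence (TGATAG, G, TGATAG, ATATGA), since the slides do not spell out the full token stream.

```python
# Codes copied from the final reference table.
CODES = {
    "A": "10", "T": "111", "C": "011", "G": "00",
    "TGATAG": "010", "ATATGA": "110110", "AAATTCG": "110111",
    "GGTAA": "110101", "GGCGC": "110100", "RepC": "1100111",
    "Del": "110001", "InsT": "110000", "Rev": "1100110",
    "RepG": "1100101", "RepT": "1100100",
}

def is_prefix_free(codes):
    """True if no codeword is a prefix of another (checked on sorted neighbours)."""
    words = sorted(codes.values())
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def encode(tokens, codes):
    """Concatenate the codewords of the given segments."""
    return "".join(codes[t] for t in tokens)

def decode(bits, codes):
    """Read bits left to right, emitting a segment whenever a codeword completes."""
    by_code = {code: seg for seg, code in codes.items()}
    tokens, current = [], ""
    for bit in bits:
        current += bit
        if current in by_code:
            tokens.append(by_code[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return tokens

tokens = ["TGATAG", "G", "TGATAG", "ATATGA"]
bits = encode(tokens, CODES)
print(is_prefix_free(CODES), bits, decode(bits, CODES))
```

Because the code is prefix-free, the decoder never needs separators or lookahead: the first codeword that matches is the only one that can match.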
PDF
A new revisited compression technique through innovative partition group binary
PDF
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
PDF
Comparative analysis of dynamic programming
PDF
Comparative analysis of dynamic programming algorithms to find similarity in ...
PPTX
20101209 dnaseq pevzner
PDF
Dna data compression algorithms based on redundancy
PPTX
Biological sequences analysis
PPTX
Bioinformatics life sciences_v2015
PDF
A Comparison of Computation Techniques for DNA Sequence Comparison
PDF
AJMS_476_23.pdf
PPTX
Next generation sequencing
PDF
BITS: Basics of sequence analysis
PDF
Paired-end alignments in sequence graphs
DOCX
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
PDF
PCB_Lect02_Pairwise_allign (1).pdf
PDF
Ch09 combinatorialpatternmatching
PDF
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
50320130403003 2
A Biological Sequence Compression Based on cross chromosomal similarities usi...
SBVRLDNACOMP:AN EFFECTIVE DNA SEQUENCE COMPRESSION ALGORITHM
A new revisited compression technique through innovative partition group binary
Design and Analysis of an Improved Nucleotide Sequences Compression Algorithm...
Comparative analysis of dynamic programming
Comparative analysis of dynamic programming algorithms to find similarity in ...
20101209 dnaseq pevzner
Dna data compression algorithms based on redundancy
Biological sequences analysis
Bioinformatics life sciences_v2015
A Comparison of Computation Techniques for DNA Sequence Comparison
AJMS_476_23.pdf
Next generation sequencing
BITS: Basics of sequence analysis
Paired-end alignments in sequence graphs
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
PCB_Lect02_Pairwise_allign (1).pdf
Ch09 combinatorialpatternmatching
Sequence Similarity between Genetic Codes using Improved Longest Common Subse...
Ad

Recently uploaded (20)

PPTX
POULTRY PRODUCTION AND MANAGEMENTNNN.pptx
PDF
Social preventive and pharmacy. Pdf
PPTX
gene cloning powerpoint for general biology 2
PDF
Warm, water-depleted rocky exoplanets with surfaceionic liquids: A proposed c...
PDF
CHAPTER 3 Cell Structures and Their Functions Lecture Outline.pdf
PPTX
SCIENCE 4 Q2W5 PPT.pptx Lesson About Plnts and animals and their habitat
PDF
Worlds Next Door: A Candidate Giant Planet Imaged in the Habitable Zone of ↵ ...
PPTX
limit test definition and all limit tests
PPTX
INTRODUCTION TO PAEDIATRICS AND PAEDIATRIC HISTORY TAKING-1.pptx
PDF
S2 SOIL BY TR. OKION.pdf based on the new lower secondary curriculum
PPTX
BODY FLUIDS AND CIRCULATION class 11 .pptx
PPTX
Seminar Hypertension and Kidney diseases.pptx
PDF
Is Earendel a Star Cluster?: Metal-poor Globular Cluster Progenitors at z ∼ 6
PPT
THE CELL THEORY AND ITS FUNDAMENTALS AND USE
PPT
LEC Synthetic Biology and its application.ppt
PPTX
ap-psych-ch-1-introduction-to-psychology-presentation.pptx
PPT
Heredity-grade-9 Heredity-grade-9. Heredity-grade-9.
PPTX
Understanding the Circulatory System……..
PPTX
Substance Disorders- part different drugs change body
PPTX
Hypertension_Training_materials_English_2024[1] (1).pptx
POULTRY PRODUCTION AND MANAGEMENTNNN.pptx
Social preventive and pharmacy. Pdf
gene cloning powerpoint for general biology 2
Warm, water-depleted rocky exoplanets with surfaceionic liquids: A proposed c...
CHAPTER 3 Cell Structures and Their Functions Lecture Outline.pdf
SCIENCE 4 Q2W5 PPT.pptx Lesson About Plnts and animals and their habitat
Worlds Next Door: A Candidate Giant Planet Imaged in the Habitable Zone of ↵ ...
limit test definition and all limit tests
INTRODUCTION TO PAEDIATRICS AND PAEDIATRIC HISTORY TAKING-1.pptx
S2 SOIL BY TR. OKION.pdf based on the new lower secondary curriculum
BODY FLUIDS AND CIRCULATION class 11 .pptx
Seminar Hypertension and Kidney diseases.pptx
Is Earendel a Star Cluster?: Metal-poor Globular Cluster Progenitors at z ∼ 6
THE CELL THEORY AND ITS FUNDAMENTALS AND USE
LEC Synthetic Biology and its application.ppt
ap-psych-ch-1-introduction-to-psychology-presentation.pptx
Heredity-grade-9 Heredity-grade-9. Heredity-grade-9.
Understanding the Circulatory System……..
Substance Disorders- part different drugs change body
Hypertension_Training_materials_English_2024[1] (1).pptx

DNA Compression (Encoded using Huffman Encoding Method)