SlideShare a Scribd company logo
GUIDE : MS. ANAGHA CHAUDHARI
Sequential pattern mining
Sequential pattern mining
Sequential pattern mining
Sequential pattern mining
A sequence : < (ef) (ab) (df) c b >
A sequence database
SID        sequence             An element may contain a set of items.
                                Items within an element are unordered
10     <a(abc)(ac)d(cf)>
                                and we list them alphabetically.
20      <(ad)c(bc)(ae)>
30      <(ef)(ab)(df)cb>        <a(bc)df> is a subsequence of
40        <eg(af)cbc>           <a(abc)(ac)d(cf)>

  Given support threshold min_sup =2, <(ab)c> is a sequential
  pattern                                                                6
CHALLENGES ON SEQUENTIAL
PATTERN MINING
 A huge number of possible sequential patterns are hidden in
  databases

 A mining algorithm should
    find the complete set of patterns, when possible, satisfying the
     minimum support (frequency) threshold
    be highly efficient, scalable, involving only a small number of
     database scans
    be able to incorporate various kinds of user-specific
     constraints

                                                        7
Sequential pattern mining
Sequential pattern mining
The Apriori Algorithm—An Example
                      Supmin = 2      Itemset       sup
                                                                     Itemset     sup
Database TDB                             {A}         2
 Tid        Items
                                                           L1          {A}         2
                               C1        {B}         3
                                                                       {B}         3
 10         A, C, D                      {C}         3
                          1st scan                                     {C}         3
 20         B, C, E                      {D}         1
                                                                       {E}         3
 30     A, B, C, E                       {E}         3
 40          B, E
                              C2     Itemset    sup               C2         Itemset
                                      {A, B}     1
 L2    Itemset        sup                                 2nd scan            {A, B}
                                      {A, C}     2
        {A, C}         2                                                      {A, C}
                                      {A, E}     1
        {B, C}         2
                                      {B, C}     2                            {A, E}
        {B, E}         3
                                      {B, E}     3                            {B, C}
        {C, E}         2
                                      {C, E}     2                            {B, E}
                                                                              {C, E}

              Itemset
                              3rd scan         L3   Itemset     sup
       C3
              {B, C, E}                             {B, C, E}    2
                                                                                       10
The Apriori Algorithm [Pseudo-Code]

Ck: Candidate itemset of size k
Lk : frequent itemset of size k

L1 = {frequent items};
for (k = 1; Lk != ; k++) do begin
  Ck+1 = candidates generated from Lk;
  for each transaction t in database do
    increment the count of all candidates in Ck+1 that are
     contained in t
  Lk+1 = candidates in Ck+1 with min_support
  end
return k Lk;
                                                             11
APRIORI ADV/DISADV

 Advantages:
   Uses large itemset property.
   Easily parallelized
   Easy to implement.

 Disadvantages:
   Assumes transaction database is memory resident.
   Requires up to m database scans.
   J. Han, J. Pei, and Y. Yin 2000
   Depth-first search
   Avoid explicit candidate generation
   Adopt divide-and-conquer strategy
   Two step approach
    Step1:Build a compact data
          structure called FP tree
    Step2:Extract frequent itemsets
           from FP tree.
Step 1: FP-Tree Construction
 FP-Tree is constructed using 2 passes over the data-set:

  Pass 1:
    Scan data and find support for each item.
    Discard infrequent items.
    Sort frequent items in decreasing order based on
      their support.
Pass 2:

Nodes correspond to items and have a counter

1.     FP-Growth reads 1 transaction at a time and maps it to a path

2.     Fixed order is used, so paths can overlap when transactions share items (when
       they have the same prfix ).
     – In this case, counters are incremented

3.      Pointers are maintained between nodes containing the same item, creating singly
       linked lists (dotted lines)
     – The more paths that overlap, the higher the compression. FP-tree may fit in
       memory.

4.     Frequent itemsets extracted from the FP-Tree.
 Start from each frequent length-1 pattern (as an initial suffix
  pattern)
 construct its conditional pattern base (a ―subdatabase,‖which
  consists of the set of prefix paths in the FP-tree co-occurring
  with the suffix pattern)
 Construct its (conditional) FP-tree, and perform mining
  recursively on such a tree.
 The pattern growth is achieved by the concatenation of the
  suffix pattern with the frequent patterns generated from a
  conditional FP-tree.
Table : Table after
                             first scan of database
Table : Transactional data
Fig . FP – Tree Construction
EXAMPLE CONT




Table:Mining FP Tree by creating conditional (sub)-pattern bases
EXAMPLE CONT




Fig.The conditional FP-tree associated with the conditiona node I3
FP-FROWTH ADV/DISADV

Advantages of FP-Growth
  • only 2 passes over data-set
  • ―compresses‖ data-set
  • no candidate generation
  • much faster than Apriori

Disadvantages of FP-Growth
  • FP-Tree may not fit in memory!!
  • FP-Tree is expensive to build
APPLICATIONS



Customer shopping sequences:
   First buy computer, then CD-ROM, and then digital camera, within 3
    months.

Medical treatments, natural disasters (e.g., earthquakes), science
 & eng. processes, stocks and markets, etc.
Telephone calling patterns, Weblog click streams
DNA sequences and gene structures


                                                                  22
Sequential pattern mining
Sequential pattern mining
Sequential pattern mining
THANK YOU
Sequential pattern mining

More Related Content

PPTX
Data Mining: Graph mining and social network analysis
PDF
Sequential Pattern Mining and GSP
PPSX
Frequent itemset mining methods
PPTX
Data mining techniques unit III
PPTX
SPADE -
PPTX
Density based methods
PDF
Information retrieval-systems notes
PPTX
Machine Learning PPT BY RAVINDRA SINGH KUSHWAHA B.TECH(IT) CHAUDHARY CHARAN S...
Data Mining: Graph mining and social network analysis
Sequential Pattern Mining and GSP
Frequent itemset mining methods
Data mining techniques unit III
SPADE -
Density based methods
Information retrieval-systems notes
Machine Learning PPT BY RAVINDRA SINGH KUSHWAHA B.TECH(IT) CHAUDHARY CHARAN S...

What's hot (20)

PPTX
Multidimensional data models
PPTX
Cluster Analysis
PPT
5.3 mining sequential patterns
PPTX
Graph mining ppt
PDF
Noise Models
PPTX
Supervised Machine Learning
PDF
I.BEST FIRST SEARCH IN AI
PPTX
Text MIning
PPT
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
PPTX
Semantic nets in artificial intelligence
PDF
Markov Models
PPTX
Data mining: Classification and prediction
PPTX
Inductive bias
PPT
Rule Based System
PPT
introduction to data mining tutorial
PPTX
Introduction to Data Mining
PDF
Lecture Notes-Finite State Automata for NLP.pdf
PPT
4.2 spatial data mining
Multidimensional data models
Cluster Analysis
5.3 mining sequential patterns
Graph mining ppt
Noise Models
Supervised Machine Learning
I.BEST FIRST SEARCH IN AI
Text MIning
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Semantic nets in artificial intelligence
Markov Models
Data mining: Classification and prediction
Inductive bias
Rule Based System
introduction to data mining tutorial
Introduction to Data Mining
Lecture Notes-Finite State Automata for NLP.pdf
4.2 spatial data mining
Ad

Similar to Sequential pattern mining (20)

PDF
Lecture14 - Advanced topics in association rules
PDF
Lecture 05 Association Rules Advanced Topics
PPT
The comparative study of apriori and FP-growth algorithm
PPT
Chapter - 8.3 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
PDF
Welcome to International Journal of Engineering Research and Development (IJERD)
PDF
Feequent Item Mining - Data Mining - Pattern Mining
PDF
An Improved Frequent Itemset Generation Algorithm Based On Correspondence
PPT
FP growth algorithm, data mining, data analystics
PPT
Tutorial on Frequent Pattern Mining Approach
PPT
Cs501 mining frequentpatterns
PPT
My6asso
PPT
Cs583 association-rules
PPT
Lecture20
PPTX
Set data structure 2
PDF
Assocrules
PDF
B0950814
PPT
Apriori algorithm
PDF
Chapter5 ML BASED FREQUENT ITEM SETS.pdf
PDF
FINDING FREQUENT SUBPATHS IN A GRAPH
Lecture14 - Advanced topics in association rules
Lecture 05 Association Rules Advanced Topics
The comparative study of apriori and FP-growth algorithm
Chapter - 8.3 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Welcome to International Journal of Engineering Research and Development (IJERD)
Feequent Item Mining - Data Mining - Pattern Mining
An Improved Frequent Itemset Generation Algorithm Based On Correspondence
FP growth algorithm, data mining, data analystics
Tutorial on Frequent Pattern Mining Approach
Cs501 mining frequentpatterns
My6asso
Cs583 association-rules
Lecture20
Set data structure 2
Assocrules
B0950814
Apriori algorithm
Chapter5 ML BASED FREQUENT ITEM SETS.pdf
FINDING FREQUENT SUBPATHS IN A GRAPH
Ad

Recently uploaded (20)

PDF
Pre independence Education in Inndia.pdf
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PDF
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
PDF
RMMM.pdf make it easy to upload and study
PPTX
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PPTX
Pharma ospi slides which help in ospi learning
PDF
O7-L3 Supply Chain Operations - ICLT Program
PPTX
Renaissance Architecture: A Journey from Faith to Humanism
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PPTX
PPH.pptx obstetrics and gynecology in nursing
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PPTX
Cell Structure & Organelles in detailed.
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PPTX
The Healthy Child – Unit II | Child Health Nursing I | B.Sc Nursing 5th Semester
PDF
Insiders guide to clinical Medicine.pdf
Pre independence Education in Inndia.pdf
Microbial diseases, their pathogenesis and prophylaxis
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
RMMM.pdf make it easy to upload and study
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
Pharma ospi slides which help in ospi learning
O7-L3 Supply Chain Operations - ICLT Program
Renaissance Architecture: A Journey from Faith to Humanism
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
human mycosis Human fungal infections are called human mycosis..pptx
Pharmacology of Heart Failure /Pharmacotherapy of CHF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PPH.pptx obstetrics and gynecology in nursing
Abdominal Access Techniques with Prof. Dr. R K Mishra
Cell Structure & Organelles in detailed.
Final Presentation General Medicine 03-08-2024.pptx
The Healthy Child – Unit II | Child Health Nursing I | B.Sc Nursing 5th Semester
Insiders guide to clinical Medicine.pdf

Sequential pattern mining

  • 1. GUIDE : MS. ANAGHA CHAUDHARI
  • 6. A sequence : < (ef) (ab) (df) c b > A sequence database SID sequence An element may contain a set of items. Items within an element are unordered 10 <a(abc)(ac)d(cf)> and we list them alphabetically. 20 <(ad)c(bc)(ae)> 30 <(ef)(ab)(df)cb> <a(bc)df> is a subsequence of 40 <eg(af)cbc> <a(abc)(ac)d(cf)> Given support threshold min_sup =2, <(ab)c> is a sequential pattern 6
  • 7. CHALLENGES ON SEQUENTIAL PATTERN MINING A huge number of possible sequential patterns are hidden in databases A mining algorithm should  find the complete set of patterns, when possible, satisfying the minimum support (frequency) threshold  be highly efficient, scalable, involving only a small number of database scans  be able to incorporate various kinds of user-specific constraints 7
  • 10. The Apriori Algorithm—An Example Supmin = 2 Itemset sup Itemset sup Database TDB {A} 2 Tid Items L1 {A} 2 C1 {B} 3 {B} 3 10 A, C, D {C} 3 1st scan {C} 3 20 B, C, E {D} 1 {E} 3 30 A, B, C, E {E} 3 40 B, E C2 Itemset sup C2 Itemset {A, B} 1 L2 Itemset sup 2nd scan {A, B} {A, C} 2 {A, C} 2 {A, C} {A, E} 1 {B, C} 2 {B, C} 2 {A, E} {B, E} 3 {B, E} 3 {B, C} {C, E} 2 {C, E} 2 {B, E} {C, E} Itemset 3rd scan L3 Itemset sup C3 {B, C, E} {B, C, E} 2 10
  • 11. The Apriori Algorithm [Pseudo-Code] Ck: Candidate itemset of size k Lk : frequent itemset of size k L1 = {frequent items}; for (k = 1; Lk != ; k++) do begin Ck+1 = candidates generated from Lk; for each transaction t in database do increment the count of all candidates in Ck+1 that are contained in t Lk+1 = candidates in Ck+1 with min_support end return k Lk; 11
  • 12. APRIORI ADV/DISADV  Advantages:  Uses large itemset property.  Easily parallelized  Easy to implement.  Disadvantages:  Assumes transaction database is memory resident.  Requires up to m database scans.
  • 13. J. Han, J. Pei, and Y. Yin 2000  Depth-first search  Avoid explicit candidate generation  Adopt divide-and-conquer strategy  Two step approach Step1:Build a compact data structure called FP tree Step2:Extract frequent itemsets from FP tree.
  • 14. Step 1: FP-Tree Construction  FP-Tree is constructed using 2 passes over the data-set: Pass 1:  Scan data and find support for each item.  Discard infrequent items.  Sort frequent items in decreasing order based on their support.
  • 15. Pass 2: Nodes correspond to items and have a counter 1. FP-Growth reads 1 transaction at a time and maps it to a path 2. Fixed order is used, so paths can overlap when transactions share items (when they have the same prfix ). – In this case, counters are incremented 3. Pointers are maintained between nodes containing the same item, creating singly linked lists (dotted lines) – The more paths that overlap, the higher the compression. FP-tree may fit in memory. 4. Frequent itemsets extracted from the FP-Tree.
  • 16.  Start from each frequent length-1 pattern (as an initial suffix pattern)  construct its conditional pattern base (a ―subdatabase,‖which consists of the set of prefix paths in the FP-tree co-occurring with the suffix pattern)  Construct its (conditional) FP-tree, and perform mining recursively on such a tree.  The pattern growth is achieved by the concatenation of the suffix pattern with the frequent patterns generated from a conditional FP-tree.
  • 17. Table : Table after first scan of database Table : Transactional data
  • 18. Fig . FP – Tree Construction
  • 19. EXAMPLE CONT Table:Mining FP Tree by creating conditional (sub)-pattern bases
  • 20. EXAMPLE CONT Fig.The conditional FP-tree associated with the conditiona node I3
  • 21. FP-FROWTH ADV/DISADV Advantages of FP-Growth • only 2 passes over data-set • ―compresses‖ data-set • no candidate generation • much faster than Apriori Disadvantages of FP-Growth • FP-Tree may not fit in memory!! • FP-Tree is expensive to build
  • 22. APPLICATIONS Customer shopping sequences:  First buy computer, then CD-ROM, and then digital camera, within 3 months. Medical treatments, natural disasters (e.g., earthquakes), science & eng. processes, stocks and markets, etc. Telephone calling patterns, Weblog click streams DNA sequences and gene structures 22