Datamining 9th Association Rule
(Association Rule)
•
    •    PC
•                        HDD
    •
•   30        500
    •
    •           30         500     2


•        ALDH


•               A    B                 A
         A=C∧D
•   Market Basket Analysis
    •   Frequent Pattern Mining
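•   Each rule is reported with two measures
    •   [support = 2%, confidence = 60%]
•   A ⇒ B
    •   A: antecedent, B: consequent
•   support
    •   the fraction of transactions that contain both A and B
•   confidence
    •   among the transactions that contain A, the fraction that also contain B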
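Notation (1/2)
•   I = {I1, I2, ..., Im}: the set of all items
•   D: the database, a set of transactions T
•   Each transaction T is a set of items, T ⊆ I
•   A transaction T contains a set of items A when A ⊆ T
•   itemset: a set of items
•   An itemset containing k items is a k-itemset

Example: I = {I1, I2, I3, I4, I5}, T100 : {I1, I2, I5}

TID     item
T100    I1, I2, I5
T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3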
Notation (2/2)
•   Association rule A ⇒ B
        A ⊂ I, B ⊂ I, A ∩ B = ∅
•   Support and confidence of A ⇒ B
        support(A ⇒ B) = P(A ∪ B)
        confidence(A ⇒ B) = P(B | A)
    Here P(A ∪ B) is the probability that a transaction contains every item of A ∪ B.
    Example on the nine transactions T100 to T900:
        A = {I1}, B = {I2}, A ∪ B = {I1, I2}
        P(A ∪ B) = 4/9, P(B | A) = 4/6
•   Confidence expressed with support counts
        confidence(A ⇒ B) = P(B | A) = support(A ∪ B) / support(A) = support_count(A ∪ B) / support_count(A)
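The support and confidence definitions above can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and T300 is assumed to be {I2, I3}, the value consistent with the support counts used in the Apriori slides.

```python
from fractions import Fraction

# Nine-transaction example; T300 = {I2, I3} is an assumption (see lead-in).
TRANSACTIONS = [
    {"I1", "I2", "I5"},        # T100
    {"I2", "I4"},              # T200
    {"I2", "I3"},              # T300 (assumed)
    {"I1", "I2", "I4"},        # T400
    {"I1", "I3"},              # T500
    {"I2", "I3"},              # T600
    {"I1", "I3"},              # T700
    {"I1", "I2", "I3", "I5"},  # T800
    {"I1", "I2", "I3"},        # T900
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(a, b, transactions):
    """support(A => B) = P(A ∪ B)."""
    return Fraction(support_count(a | b, transactions), len(transactions))


def confidence(a, b, transactions):
    """confidence(A => B) = support_count(A ∪ B) / support_count(A)."""
    return Fraction(support_count(a | b, transactions),
                    support_count(a, transactions))


A, B = {"I1"}, {"I2"}
print(support(A, B, TRANSACTIONS))     # 4/9
print(confidence(A, B, TRANSACTIONS))  # 2/3, i.e. 4/6 in lowest terms
```

Exact fractions are used so the results can be compared directly with 4/9 and 4/6.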
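•   From an itemset A ∪ B, the rules A ⇒ B and B ⇒ A can both be formed
•   Minimum support (min_sup)
    •    An itemset that satisfies min_sup is a frequent itemset
•   The number of itemsets is very large
    1. With 100 items there are 2^100 - 1 non-empty itemsets
    2. If a 9-itemset {a1, a2, ..., a9} satisfies min_sup, then {a1}, {a2}, {a1, a2}, {a1, a9}, {a1, a2, a3}, ... all satisfy min_sup as well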
         •                 itemset
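The two counting arguments above can be made concrete. The sketch below (helper names are illustrative, not from the slides) computes the number of non-empty itemsets over 100 items and enumerates the non-empty subsets of a 9-itemset, each of which must satisfy min_sup whenever the 9-itemset does.

```python
from itertools import combinations


def count_nonempty_itemsets(n_items):
    """Number of non-empty itemsets over n_items distinct items."""
    return 2 ** n_items - 1


def nonempty_subsets(itemset):
    """Yield every non-empty subset of `itemset` as a frozenset."""
    items = sorted(itemset)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


print(count_nonempty_itemsets(100))

nine = {f"a{i}" for i in range(1, 10)}   # {a1, ..., a9}
subsets = list(nonempty_subsets(nine))
print(len(subsets))                      # 511 = 2^9 - 1
```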
Apriori: Overview-1
•   Finds the itemsets that satisfy min_sup
    •   Agrawal & Srikant 1994
•   Example with min_sup = 2

1. Scan the database D and count the occurrences of each 1-itemset (candidate set C1)

TID       item
T100      I1, I2, I5
T200      I2, I4
T300      I2, I3
T400      I1, I2, I4
T500      I1, I3
T600      I2, I3
T700      I1, I3
T800      I1, I2, I3, I5
T900      I1, I2, I3

C1
Itemset    Sup. count
{I1}       6
{I2}       7
{I3}       6
{I4}       2
{I5}       2
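Step 1 amounts to one pass over the database with a counter per item. A minimal sketch, assuming T300 = {I2, I3} (the value that reproduces the C1 counts above); the function name is illustrative.

```python
from collections import Counter

# Nine-transaction example; T300 = {I2, I3} is an assumption (see lead-in).
TRANSACTIONS = [
    {"I1", "I2", "I5"},        # T100
    {"I2", "I4"},              # T200
    {"I2", "I3"},              # T300 (assumed)
    {"I1", "I2", "I4"},        # T400
    {"I1", "I3"},              # T500
    {"I2", "I3"},              # T600
    {"I1", "I3"},              # T700
    {"I1", "I2", "I3", "I5"},  # T800
    {"I1", "I2", "I3"},        # T900
]


def count_1_itemsets(transactions):
    """One scan of the database: support count of every single item."""
    counts = Counter()
    for t in transactions:
        counts.update(t)   # adds 1 for each item in the transaction
    return counts


c1 = count_1_itemsets(TRANSACTIONS)
for item in sorted(c1):
    print(item, c1[item])
# I1 6
# I2 7
# I3 6
# I4 2
# I5 2
```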
Apriori: Overview-2
        2. Keep the itemsets that satisfy min_sup (L1)
        3. From the k-itemsets, generate the candidate (k+1)-itemsets (here C2 from L1)

C1                          L1                          C2
Itemset    Sup. count       Itemset    Sup. count       Itemset
{I1}       6                {I1}       6                {I1, I2}
{I2}       7                {I2}       7                {I1, I3}
{I3}       6                {I3}       6                {I1, I4}
{I4}       2                {I4}       2                {I1, I5}
{I5}       2                {I5}       2                {I2, I3}
                                                        {I2, I4}
                                                        {I2, I5}
                                                        {I3, I4}
                                                        {I3, I5}
                                                        {I4, I5}
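Steps 2 and 3 for the first level can be sketched as a filter followed by pair generation. The C1 counts are copied from the table above; the variable names are illustrative.

```python
from itertools import combinations

MIN_SUP = 2
C1 = {"I1": 6, "I2": 7, "I3": 6, "I4": 2, "I5": 2}

# Step 2: keep the 1-itemsets whose support count reaches min_sup.
l1 = sorted(item for item, count in C1.items() if count >= MIN_SUP)

# Step 3: every pair of frequent items is a candidate 2-itemset.
c2 = [frozenset(pair) for pair in combinations(l1, 2)]

print(l1)        # ['I1', 'I2', 'I3', 'I4', 'I5']
print(len(c2))   # 10
```

With five frequent items, C2 contains all 5 × 4 / 2 = 10 pairs, matching the table.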
Apriori: Overview-3
 4. Count the occurrences of each candidate itemset
       •   This requires scanning the DB, which is stored on the HDD
 5. Keep the itemsets that satisfy min_sup

C2                             L2
Itemset      Sup. Count        Itemset      Sup. Count
{I1, I2}     4                 {I1, I2}     4
{I1, I3}     4                 {I1, I3}     4
{I1, I4}     1                 {I1, I5}     2
{I1, I5}     2                 {I2, I3}     4
{I2, I3}     4                 {I2, I4}     2
{I2, I4}     2                 {I2, I5}     2
{I2, I5}     2
{I3, I4}     0
{I3, I5}     1
{I4, I5}     0
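Steps 4 and 5 can be sketched as a single pass that increments a counter for every candidate contained in the current transaction. T300 is assumed to be {I2, I3}, which reproduces the C2 counts in the table; the function name is illustrative.

```python
from itertools import combinations

MIN_SUP = 2

# Nine-transaction example; T300 = {I2, I3} is an assumption (see lead-in).
TRANSACTIONS = [
    {"I1", "I2", "I5"},        # T100
    {"I2", "I4"},              # T200
    {"I2", "I3"},              # T300 (assumed)
    {"I1", "I2", "I4"},        # T400
    {"I1", "I3"},              # T500
    {"I2", "I3"},              # T600
    {"I1", "I3"},              # T700
    {"I1", "I2", "I3", "I5"},  # T800
    {"I1", "I2", "I3"},        # T900
]


def count_candidates(candidates, transactions):
    """Step 4: one scan of the database, counting each candidate itemset."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:          # candidate is contained in the transaction
                counts[c] += 1
    return counts


items = ["I1", "I2", "I3", "I4", "I5"]
c2 = [frozenset(pair) for pair in combinations(items, 2)]
c2_counts = count_candidates(c2, TRANSACTIONS)

# Step 5: keep the candidates that satisfy min_sup.
l2 = {c: n for c, n in c2_counts.items() if n >= MIN_SUP}
print(len(l2))   # 6
```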
Apriori: Overview-4
     •   L1, L2, L3: the itemsets that satisfy min_sup
     •   L1 → C2, L2 → C3, L3 → C4

L2                          C3                  L3
Itemset      Sup. Count     Itemset             Itemset          Sup. Count
{I1, I2}     4              {I1, I2, I3}        {I1, I2, I3}     2
{I1, I3}     4              {I1, I2, I5}        {I1, I2, I5}     2
{I1, I5}     2
{I2, I3}     4
{I2, I4}     2
{I2, I5}     2

C4
Itemset
(empty)
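Putting the five steps together gives a level-wise loop. The following is a compact sketch rather than the original formulation by Agrawal and Srikant: candidates are built from unions of frequent k-itemsets instead of a prefix join, T300 is assumed to be {I2, I3}, and all names are illustrative.

```python
from itertools import combinations

# Nine-transaction example; T300 = {I2, I3} is an assumption (see lead-in).
TRANSACTIONS = [
    {"I1", "I2", "I5"},        # T100
    {"I2", "I4"},              # T200
    {"I2", "I3"},              # T300 (assumed)
    {"I1", "I2", "I4"},        # T400
    {"I1", "I3"},              # T500
    {"I2", "I3"},              # T600
    {"I1", "I3"},              # T700
    {"I1", "I2", "I3", "I5"},  # T800
    {"I1", "I2", "I3"},        # T900
]


def apriori(transactions, min_sup):
    """Return [L1, L2, ...], each a dict mapping itemset -> support count."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    levels = []
    k = 1
    while candidates:
        # Count every candidate with one scan of the database.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_sup}
        if not frequent:
            break
        levels.append(frequent)
        # Build (k+1)-itemsets from pairs of frequent k-itemsets, then keep
        # only those whose k-item subsets are all frequent.
        prev = set(frequent)
        unions = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        candidates = [c for c in unions
                      if all(frozenset(s) in prev for s in combinations(c, k))]
        k += 1
    return levels


levels = apriori(TRANSACTIONS, min_sup=2)
for level in levels:
    print(len(level))
# 5
# 6
# 2
```

The loop stops after L3 because the only 4-item union, {I1, I2, I3, I5}, has non-frequent 3-item subsets and is never counted.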
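Apriori
[Figure: itemset lattice annotated with support counts, from {} at the bottom to {I1, I2, I3, I5} at the top]

4-itemset:    {I1, I2, I3, I5}
3-itemsets:   {I1, I2, I3}: 2   {I1, I2, I5}: 2   {I1, I3, I5}   {I2, I3, I4}   {I2, I3, I5}
2-itemsets:   {I1, I2}: 4   {I1, I3}: 4   {I1, I4}: 1   {I1, I5}: 2   {I2, I3}: 4   {I2, I4}: 2   {I2, I5}: 2   {I3, I4}: 0   {I3, I5}: 1   {I4, I5}: 0
1-itemsets:   {I1}: 6   {I2}: 7   {I3}: 6   {I4}: 2   {I5}: 2
              {}

           •        ×: itemset counted in the DB that does not satisfy min_sup
           •        ×: itemset that is not counted, because one of its k-itemset subsets is not a frequent itemset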
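Apriori: Pruning Phase
  • Generating (k+1)-itemsets from k-itemsets:
            two itemsets that share k-1 items are joined into one (k+1)-itemset
      •   {I1, I2}, {I1, I3} share I1 → {I1, I2, I3}
      •   {I1, I2, I3}, {I1, I2, I5} share I1, I2 → {I1, I2, I3, I5}
  • Every k-itemset contained in a candidate (k+1)-itemset (for {I1, I2, I3}:
            {I1, I2}, {I1, I3}, {I2, I3}) must itself be a frequent k-itemset
      • {I1, I3, I5} contains {I3, I5}, which does not satisfy min_sup, so {I1, I3, I5} cannot satisfy min_sup

[Figure: the 2-itemset and 3-itemset levels of the lattice]
3-itemsets:   {I1, I2, I3}   {I1, I2, I5}   {I1, I3, I5}   {I2, I3, I4}   {I2, I3, I5}
2-itemsets:   {I1, I2}: 4   {I1, I3}: 4   {I1, I4}: 1   {I1, I5}: 2   {I2, I3}: 4   {I2, I4}: 2   {I2, I5}: 2   {I3, I4}: 0   {I3, I5}: 1   {I4, I5}: 0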
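The join and prune steps can be written as two small functions. This is a sketch under one common convention, assumed here: itemsets are kept as sorted tuples and two k-itemsets are joined when their first k-1 items agree. The names `join` and `prune` are illustrative.

```python
from itertools import combinations


def join(frequent_k):
    """Join k-itemsets that share their first k-1 items (in sorted order)."""
    ordered = sorted(tuple(sorted(s)) for s in frequent_k)
    joined = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if a[:-1] == b[:-1]:          # same k-1 leading items
                joined.append(frozenset(a) | frozenset(b))
    return joined


def prune(candidates, frequent_k):
    """Drop candidates that contain a k-itemset which is not frequent."""
    frequent_k = set(frequent_k)
    if not frequent_k:
        return []
    k = len(next(iter(frequent_k)))
    return [c for c in candidates
            if all(frozenset(s) in frequent_k for s in combinations(c, k))]


L2 = [frozenset(p) for p in (("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
                             ("I2", "I3"), ("I2", "I4"), ("I2", "I5"))]
c3 = prune(join(L2), L2)
print(sorted(sorted(c) for c in c3))
# [['I1', 'I2', 'I3'], ['I1', 'I2', 'I5']]

L3 = c3                      # both candidates have support count 2
c4 = prune(join(L3), L3)
print(c4)                    # [] because {I1, I3, I5} is not frequent
```

Pruning happens before any database scan, so rejected candidates such as {I1, I3, I5} never need to be counted.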
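DIC
•   Dynamic Itemset Counting (DIC): S. Brin, R. Motwani, J. Ullman, and S. Tsur. 1997
•   Counting of 2-itemsets starts while the 1-itemsets are still being counted
    •   Once {I2} and {I4} are seen to satisfy min_sup, counting of {I2, I4} begins; {I2, I4} is then checked against min_sup in the same way
    •   This reduces the number of DB scans

[Figure: timeline over the transactions. Apriori starts the 1-itemset, 2-itemset, and 3-itemset counts in separate passes; DIC starts the 2-itemset and 3-itemset counts partway through the DB.]

TID      item
T100     I1, I2, I5
T200     I2, I4
T300     I2, I3
T400     I1, I2, I4
T500     I1, I3
T600     I2, I3
T700     I1, I3
T800     I1, I2, I3, I5
T900     I1, I2, I3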
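The idea of starting new counters partway through a scan can be sketched as follows. This is a simplified illustration, not the algorithm as published by Brin et al.: the `block_size` parameter, the bookkeeping dictionaries, and the wrap-around scan are assumptions chosen for clarity, and T300 is assumed to be {I2, I3}.

```python
from itertools import combinations

# Nine-transaction example; T300 = {I2, I3} is an assumption (see lead-in).
TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def dic(transactions, min_sup, block_size):
    """Simplified DIC sketch: returns {itemset: count} for frequent itemsets."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    n = len(transactions)
    if n == 0:
        return {}
    # count: occurrences so far; seen: transactions examined so far.
    count = {frozenset([i]): 0 for t in transactions for i in t}
    seen = {c: 0 for c in count}
    active = set(count)          # counters that have not yet seen all n rows
    pos = 0
    while active:
        # Read one block, wrapping around at the end of the database.
        for _ in range(block_size):
            t = transactions[pos % n]
            pos += 1
            for c in list(active):
                if c <= t:
                    count[c] += 1
                seen[c] += 1
                if seen[c] == n:         # counter has covered every row once
                    active.discard(c)
        # Block boundary: itemsets that already reached min_sup may spawn
        # counters for supersets whose smaller subsets all reached min_sup.
        hot = {c for c, v in count.items() if v >= min_sup}
        for a in hot:
            for b in hot:
                cand = a | b
                if (len(a) == len(b) and len(cand) == len(a) + 1
                        and cand not in count
                        and all(frozenset(s) in hot
                                for s in combinations(cand, len(a)))):
                    count[cand] = 0
                    seen[cand] = 0
                    active.add(cand)
    return {c: v for c, v in count.items() if v >= min_sup}


frequent = dic(TRANSACTIONS, min_sup=2, block_size=3)
print(len(frequent))   # 13
```

Each counter examines exactly n consecutive transactions (cyclically), so its final count equals the support count a full pass would give; only the starting point differs from Apriori.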
•
    •   Hash
        •   (k+1)-itemset             k-itemset
    •
        •           PC
    •   Heap                                itemset
        •   FP-tree (J. Han, J. Pei and Y. Yin. 2000)
•
    •
        •
            •   S. Brin, R. Motwani and C. Silverstein. 1997
            •   S. Morishita and J. Sese. 2000
