Datamining 8th hclustering
[Figure 4.1: four objects, labeled 1 to 4, with their distance values.]

• Top-down clustering (divisive clustering)
• Bottom-up clustering (agglomerative clustering)


[Figure: (A) seven data points A to G; (B) their dendrogram, with leaves A, B, C, D, E, F, G.]
[Figure: (C) the dendrogram over A to G cut into three clusters, labeled 1, 2 and 3; (D) the same three clusters marked on the data points.]
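• d(x, y): the distance between points x and y; Ci, Cj: clusters. The single-linkage distance between two clusters is

$$D_{\min}(C_i, C_j) = \min\{\, d(x, y) \mid x \in C_i,\ y \in C_j \,\}$$

1. Treat each data point as a cluster of its own.
2. Merge the two clusters with the smallest distance into 1 cluster.
3. Repeat step 2 until only 1 cluster remains.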
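As a small illustration of D_min, the sketch below assumes Euclidean distance for d(x, y); the slide leaves d generic. It uses two made-up 2-D clusters and only evaluates the definition, not the full merge loop. The function name `d_min` is illustrative.

```python
import math
from itertools import product


def d_min(ci, cj):
    """D_min(Ci, Cj): smallest Euclidean distance between a point of Ci and a point of Cj."""
    return min(math.dist(x, y) for x, y in product(ci, cj))


# Hypothetical 2-D clusters, made up only to illustrate the definition.
ci = [(0.0, 0.0), (1.0, 0.0)]
cj = [(4.0, 0.0), (4.0, 3.0)]
print(d_min(ci, cj))   # 3.0, from the pair (1, 0) and (4, 0)
```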




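[Figure: single-linkage clustering of points A to G. (A) {B,C} and {D,F} are merged (steps 1 and 2); (B) A and then E join (steps 3 and 4); (C) step 5; (D) the resulting dendrogram, leaf order A, B, C, E, D, F, G, with merge steps 1 to 6.]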
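(1/2)
• Algorithm 4.5 (P. 59)

  (1) Given N data points x1, ..., xN, make each point a cluster of 1 point: C1, ..., CN.
  (2) Set n = N (n is the current number of clusters).
  (3) Repeat the following until n = 1:
      (a) Among C1, ..., Cn, find the pair Ci, Cj (i < j) with the smallest distance.
      (b) Merge Cj into Ci.
      (c) Update the distances between the new Ci and the other clusters.
      (d) Move Cn into the slot of Cj (Cj = Cn) and set n = n − 1.

Figure 4.8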
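The steps of Algorithm 4.5 map almost line by line onto a short Python sketch. It assumes single linkage (D_min) and uses the seven-point distances from this deck's DIANA distance matrix. `DIST`, `d`, `d_min` and `agglomerate` are illustrative names, not from the source. Step (c) is handled by recomputing D_min from the member points, which is simple but slower than updating a matrix.

```python
from itertools import combinations

# Distances between the example points A..G (the DIANA slides' matrix;
# its single-linkage updates reproduce the tables in this deck).
DIST = {
    ("A", "B"): 1.2, ("A", "C"): 1.3, ("A", "D"): 3.0, ("A", "E"): 1.9,
    ("A", "F"): 2.3, ("A", "G"): 4.1, ("B", "C"): 1.0, ("B", "D"): 4.0,
    ("B", "E"): 2.0, ("B", "F"): 3.2, ("B", "G"): 4.0, ("C", "D"): 4.1,
    ("C", "E"): 2.5, ("C", "F"): 3.4, ("C", "G"): 4.5, ("D", "E"): 2.3,
    ("D", "F"): 1.1, ("D", "G"): 3.5, ("E", "F"): 2.2, ("E", "G"): 2.5,
    ("F", "G"): 4.0,
}


def d(x, y):
    """Symmetric lookup of d(x, y)."""
    return DIST[(x, y)] if (x, y) in DIST else DIST[(y, x)]


def d_min(ci, cj):
    """Single-linkage distance D_min(Ci, Cj)."""
    return min(d(x, y) for x in ci for y in cj)


def agglomerate(points):
    """Follow steps (1)-(3) of Algorithm 4.5; return (Ci, Cj, distance) per merge."""
    C = [[p] for p in points]          # (1) one cluster per data point
    n = len(C)                         # (2) n = N
    merges = []
    while n > 1:                       # (3) repeat until n = 1
        # (a) closest pair Ci, Cj with i < j among the first n slots
        i, j = min(combinations(range(n), 2),
                   key=lambda ij: d_min(C[ij[0]], C[ij[1]]))
        merges.append(("".join(C[i]), "".join(C[j]), d_min(C[i], C[j])))
        C[i] = C[i] + C[j]             # (b) merge Cj into Ci
        # (c) this sketch recomputes D_min from the points whenever needed
        C[j] = C[n - 1]                # (d) Cj = Cn (0-based: slot n - 1) ...
        n -= 1                         #     ... and n = n - 1
    return merges


for ci, cj, h in agglomerate("ABCDEFG"):
    print(ci, "+", cj, "at", h)
```

On these distances the merge heights come out as 1.0, 1.1, 1.2, 1.9, 2.2 and 2.5. This matches the single-linkage dendrogram with leaf order A, B, C, E, D, F, G.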

Algorithm 4.5 (2/2)

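[Figure 4.8: (a), (b), (d) the cluster slots C1 to C7 during steps (a), (b) and (d) of the algorithm.]

(A) [Figure: points A to G after {B,C} and {D,F} are merged]
(B) Single-linkage distances between the current clusters, and each cluster's nearest cluster:
             A      B,C    D,F    E      G        Nearest   Distance
    A        -      1.2    2.3    1.9    4.1      B,C       1.2
    B,C      1.2    -      3.2    2.0    4.0      A         1.2
    D,F      2.3    3.2    -      2.2    3.5      E         2.2
    E        1.9    2.0    2.2    -      2.5      A         1.9
    G        4.1    4.0    3.5    2.5    -        E         2.5

(C) [Figure: A merged with {B,C}]
(D) Distances after merging A and {B,C} (distance 1.2):
             A,B,C  D,F    E      G        Nearest   Distance
    A,B,C    -      2.3    1.9    4.0      E         1.9
    D,F      2.3    -      2.2    3.5      E         2.2
    E        1.9    2.2    -      2.5      A,B,C     1.9
    G        4.0    3.5    2.5    -        E         2.5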
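The table update needs only the two old rows. Under single linkage, the distance from the merged cluster to any other cluster Ck is the smaller of its two old distances to Ck. The minimal sketch below reproduces the second table from the first. The dictionary layout and the names `merge_single` and `nearest` are illustrative assumptions.

```python
# Single-linkage distances after {B,C} and {D,F} were merged (first table above).
M = {
    "A":   {"B,C": 1.2, "D,F": 2.3, "E": 1.9, "G": 4.1},
    "B,C": {"A": 1.2, "D,F": 3.2, "E": 2.0, "G": 4.0},
    "D,F": {"A": 2.3, "B,C": 3.2, "E": 2.2, "G": 3.5},
    "E":   {"A": 1.9, "B,C": 2.0, "D,F": 2.2, "G": 2.5},
    "G":   {"A": 4.1, "B,C": 4.0, "D,F": 3.5, "E": 2.5},
}


def merge_single(M, ci, cj):
    """Merge ci and cj; d(new, ck) = min(d(ci, ck), d(cj, ck)) for every other ck."""
    new = f"{ci},{cj}"
    others = [k for k in M if k not in (ci, cj)]
    row = {k: min(M[ci][k], M[cj][k]) for k in others}
    out = {k: {m: v for m, v in M[k].items() if m not in (ci, cj)} for k in others}
    for k in others:
        out[k][new] = row[k]
    out[new] = row
    return out


def nearest(M):
    """Nearest cluster and its distance, for every cluster."""
    return {k: min(M[k].items(), key=lambda kv: kv[1]) for k in M}


M2 = merge_single(M, "A", "B,C")
print(M2["A,B,C"])        # {'D,F': 2.3, 'E': 1.9, 'G': 4.0}
print(nearest(M2))
```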
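※
•   When Ci and Cj are merged into Ci':
    •   the distance between Ci' and another cluster Ck
        •   can be computed from d(Ci, Ck) and d(Cj, Ck) alone (for single linkage, the smaller of the two);
    •   updating the distances to all N clusters
        •   therefore takes O(N),
        •   rather than O(N^2) when recomputing them from the data points.
•   With a nearest-neighbor list,
    •   only the clusters whose nearest neighbor was Cj need to be redirected to Ci'.
•   Finding the closest pair then takes O(N) per step instead of O(N^2).
•   Over the N−1 merges of O(N) each, the total time is O(N^2), and storing the distance matrix takes O(N^2) space.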
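One way to achieve O(N) per merge is sketched below, assuming a full N × N distance matrix in memory (O(N^2) space). The closest pair is read off a nearest-neighbour list, and row i is updated with the single-linkage min rule. Because of that rule, entries that pointed at the absorbed cluster j can simply be redirected to i. Names such as `single_link_nn` and `full_matrix` are hypothetical.

```python
LABELS = "ABCDEFG"
# Lower triangle of the example matrix: LOWER[r][c] = d(LABELS[r], LABELS[c]) for c < r.
LOWER = [[], [1.2], [1.3, 1.0], [3.0, 4.0, 4.1], [1.9, 2.0, 2.5, 2.3],
         [2.3, 3.2, 3.4, 1.1, 2.2], [4.1, 4.0, 4.5, 3.5, 2.5, 4.0]]


def full_matrix(lower):
    """Expand the lower triangle into a symmetric N x N list of lists."""
    n = len(lower)
    M = [[0.0] * n for _ in range(n)]
    for r, row in enumerate(lower):
        for c, v in enumerate(row):
            M[r][c] = M[c][r] = v
    return M


def single_link_nn(M):
    """Single linkage driven by a nearest-neighbour list; returns merge heights."""
    M = [row[:] for row in M]
    active = set(range(len(M)))

    def nn_of(i):
        # O(N) scan of row i: (distance, index) of i's nearest active cluster
        return min((M[i][k], k) for k in active if k != i)

    nn = {i: nn_of(i) for i in active}
    heights = []
    while len(active) > 1:
        i = min(active, key=lambda k: nn[k])   # O(N): closest pair from the list
        dist, j = nn[i]
        heights.append(dist)
        active.remove(j)                       # merge cluster j into cluster i
        del nn[j]
        for k in active:
            if k != i:                         # O(N) row update (single linkage)
                M[i][k] = M[k][i] = min(M[i][k], M[j][k])
        if len(active) > 1:
            nn[i] = nn_of(i)
            for k in active:
                # Entries that pointed at j now point at i at the same distance;
                # under single linkage no other entry can change.
                if k != i and nn[k][1] == j:
                    nn[k] = (nn[k][0], i)
    return heights


print(single_link_nn(full_matrix(LOWER)))
```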
35, 48)




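• Kruskal’s Algorithm (Minimum Spanning Tree)

Definition 4.2: Let G = (V, E) be a graph with vertex set V and edge set E. A subgraph T ⊆ G is a spanning tree of G if T is a tree that contains every vertex in V. When each edge (u, v) ∈ E of G carries a weight w(u, v), a spanning tree T of G that minimizes the total weight

$$w(T) = \sum_{(u,v) \in T} w(u, v)$$

is called a minimum spanning tree of G.

Figure 4.13(A) corresponds to Figure 4.9 (P. 59). Figure 4.13(B) numbers the edges of the minimum spanning tree of G in the order they are added, from 1 (BC) to 6 (GE); compare Figure 4.9(D) (P. 59).

Here w(u, v) is the distance between u and v.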
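Kruskal's algorithm (P. 72)
  (1) Input: a graph G = (V, E), where V is the vertex set and E the edge set.

  (2) Initialize the edge set A as empty: A = {}.
  (3) Put each vertex of V into a set of its own (1 vertex per set).
  (4) Take the edges (u, v) ∈ E one at a time in increasing order of weight:
            (a) if u and v are in different sets, add the edge: A ← A ∪ {(u, v)},

            (b) and merge the set containing u with the set containing v.

  (5) Output A, the minimum spanning tree.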
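Below is a compact Kruskal sketch that uses a union-find structure with path halving. This is one standard way to implement the "different sets?" test in step (4), not necessarily the one the textbook uses. Run on the seven example points, the MST edge weights in order of addition equal the single-linkage merge heights. `EDGES` and `kruskal` are illustrative names.

```python
# Edges (u, v, w(u, v)) of the complete graph on the example points A..G.
EDGES = [
    ("A", "B", 1.2), ("A", "C", 1.3), ("A", "D", 3.0), ("A", "E", 1.9),
    ("A", "F", 2.3), ("A", "G", 4.1), ("B", "C", 1.0), ("B", "D", 4.0),
    ("B", "E", 2.0), ("B", "F", 3.2), ("B", "G", 4.0), ("C", "D", 4.1),
    ("C", "E", 2.5), ("C", "F", 3.4), ("C", "G", 4.5), ("D", "E", 2.3),
    ("D", "F", 1.1), ("D", "G", 3.5), ("E", "F", 2.2), ("E", "G", 2.5),
    ("F", "G", 4.0),
]


def kruskal(vertices, edges):
    """Kruskal's algorithm with a simple union-find; returns the MST edge list A."""
    parent = {v: v for v in vertices}        # (3) every vertex in its own set

    def find(v):
        # Walk up to the set representative, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    A = []                                   # (2) A = {}
    for u, v, w in sorted(edges, key=lambda e: e[2]):   # (4) increasing w(u, v)
        ru, rv = find(u), find(v)
        if ru != rv:                         # (a) u, v in different sets
            A.append((u, v, w))
            parent[ru] = rv                  # (b) merge the two sets
    return A                                 # (5)


mst = kruskal("ABCDEFG", EDGES)
for u, v, w in mst:
    print(f"{u}-{v}: {w}")
```

The first edge it adds is BC and the last is EG, as in Figure 4.13(B).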

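[Figure 4.14: Kruskal's algorithm on points A to G. Left: A = {}, sets {A}, {B}, {C}, {D}, {E}, {F}. Middle: A = {(B,C),(D,F)}, sets {A}, {B,C}, {D,F}, {E}; a cut between C and V−C is marked, with crossing edges e and e’. Right: A = {(B,C),(D,F),(A,B)}, sets {A,B,C}, {D,F}, {E}.]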
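[Figure: (A) points A to G; (B) their minimum spanning tree, with edges numbered 1 to 6 in the order they are added; (C) the corresponding dendrogram, leaf order A, B, C, E, D, F, G, with merge levels 1 to 6.]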
[Figure: panels (A) to (E) grow a tree T one vertex at a time starting from A, keeping the candidates in a priority queue Q. Running time: O(E + V log V).]
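• d(x, y): the distance between points x and y; Ci, Cj: clusters. The complete-linkage distance between two clusters is

$$D_{\max}(C_i, C_j) = \max\{\, d(x, y) \mid x \in C_i,\ y \in C_j \,\}$$

1. Treat each data point as a cluster of its own.
2. Merge the two clusters with the smallest distance into 1 cluster.
3. Repeat step 2 until only 1 cluster remains.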
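Swapping D_min for D_max in the naive loop gives complete linkage. The sketch below uses illustrative names and the same seven-point distances as the DIANA slides. It recomputes D_max from the member points at every step, so it favours clarity over speed.

```python
from itertools import combinations

LABELS = "ABCDEFG"
# Lower triangle of the example matrix: LOWER[r][c] = d(LABELS[r], LABELS[c]) for c < r.
LOWER = [[], [1.2], [1.3, 1.0], [3.0, 4.0, 4.1], [1.9, 2.0, 2.5, 2.3],
         [2.3, 3.2, 3.4, 1.1, 2.2], [4.1, 4.0, 4.5, 3.5, 2.5, 4.0]]


def dist(x, y):
    """d(x, y) looked up in the lower triangle."""
    r, c = LABELS.index(x), LABELS.index(y)
    return LOWER[max(r, c)][min(r, c)]


def d_max(ci, cj):
    """Complete-linkage distance D_max(Ci, Cj)."""
    return max(dist(x, y) for x in ci for y in cj)


def complete_link(labels):
    clusters = [frozenset(p) for p in labels]          # step 1
    history = []
    while len(clusters) > 1:                           # step 3
        # step 2: merge the pair with the smallest D_max
        a, b = min(combinations(clusters, 2), key=lambda ab: d_max(*ab))
        history.append(("".join(sorted(a)), "".join(sorted(b)), d_max(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history


for a, b, h in complete_link(LABELS):
    print(a, "+", b, "at", h)
```

The merge heights are 1.0, 1.1, 1.3, 2.3, 4.0 and 4.5. The merge order matches the dendrogram with leaf order A, B, C, D, F, E, G.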
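[Figure: complete-linkage clustering of points A to G. (A) {B,C} and {D,F} are merged (steps 1 and 2); (B) A joins {B,C} and E joins {D,F} (steps 3 and 4); (C) G joins {D,E,F} (step 5); (D) the resulting dendrogram, leaf order A, B, C, D, F, E, G.]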
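(A) Complete-linkage distances after merging {B,C} and {D,F}; (B) each cluster's nearest cluster:
             A      B,C    D,F    E      G        Nearest   Distance
    A        -      1.3    3.0    1.9    4.1      B,C       1.3
    B,C      1.3    -      4.1    2.5    4.5      A         1.3
    D,F      3.0    4.1    -      2.3    4.0      E         2.3
    E        1.9    2.5    2.3    -      2.5      A         1.9
    G        4.1    4.5    4.0    2.5    -        E         2.5

Merge A and {B,C} (distance 1.3).
(C) Distances after the merge; (D) nearest clusters:
             A,B,C  D,F    E      G        Nearest   Distance
    A,B,C    -      4.1    2.5    4.5      E         2.5
    D,F      4.1    -      2.3    4.0      E         2.3
    E        2.5    2.3    -      2.5      D,F       2.3
    G        4.5    4.0    2.5    -        E         2.5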
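[Figure: (A) E's nearest cluster is A; (B) A is merged with {B,C}; (C) E's distance to {A,B,C} is now d(C, E) and its distance to {D,F} is d(D, E), so its nearest cluster changes to {D,F}.]
•   Under complete linkage, a merge can make another cluster's nearest neighbor a different cluster, so the nearest-neighbor list cannot simply be redirected as in single linkage. Rescanning the affected clusters can cost O(N^2) per merge, giving O(N^3) in total.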
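The failure of the nearest-neighbour shortcut can be checked directly on the slide's complete-linkage table. In this sketch (names are illustrative), E's nearest cluster is A before A merges with {B,C}, and {D,F} afterwards. The change happens because complete linkage raises d(E, {A,B,C}) to max(1.9, 2.5) = 2.5.

```python
# Complete-linkage distances after {B,C} and {D,F} were merged (first table above).
M = {
    ("A", "B,C"): 1.3, ("A", "D,F"): 3.0, ("A", "E"): 1.9, ("A", "G"): 4.1,
    ("B,C", "D,F"): 4.1, ("B,C", "E"): 2.5, ("B,C", "G"): 4.5,
    ("D,F", "E"): 2.3, ("D,F", "G"): 4.0, ("E", "G"): 2.5,
}


def d(x, y):
    """Symmetric lookup of the current cluster distance."""
    return M[(x, y)] if (x, y) in M else M[(y, x)]


def nearest(x, clusters):
    """Nearest other cluster of x (an O(N) scan)."""
    return min((c for c in clusters if c != x), key=lambda c: d(x, c))


clusters = ["A", "B,C", "D,F", "E", "G"]
before = nearest("E", clusters)

# Merge A and {B,C}: complete linkage keeps the LARGER of the two old distances.
for k in ["D,F", "E", "G"]:
    M[("A,B,C", k)] = max(d("A", k), d("B,C", k))
clusters = ["A,B,C", "D,F", "E", "G"]
after = nearest("E", clusters)

print(before, "->", after)   # A -> D,F
```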
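[Figure: E's candidate clusters kept in a priority queue keyed by distance. (A) {A}, 1.9; {B,C}, 2.5; {D,F}, 2.3; {G}, 2.5. (B), (C) the 2 entries for {A} and {B,C} are deleted. (D), (E) the merged entry {A,B,C}, 2.5 is inserted.]
With a priority queue, each insertion or deletion costs O(log N), so the whole algorithm runs in O(N^2 log N).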
Data Clustering • 277

[Figure 10. The dendrogram obtained using the single-link algorithm. (Vertical axis: Similarity; leaves A, B, C, D, E, F, G.)]

[Figure 12. A single-link clustering of a pattern set containing two classes (1 and 2) connected by a chain of noisy patterns (*). (Axes X1, X2.)]

[Figure 13. A complete-link clustering of a pattern set containing two classes (1 and 2) connected by a chain of noisy patterns (*). (Axes X1, X2.)]

duces the clusters shown in Figure 12, whereas the complete-link algorithm obtains the clustering shown in Figure 13. The clusters obtained by the complete-link algorithm are more compact than those obtained by the single-link algorithm; the cluster labeled 1 obtained using the single-link algorithm is elongated because of the noisy patterns labeled “*”. The single-link algorithm is more versatile than the complete-link algorithm, otherwise. For example, the single-link algorithm can extract the concentric clusters shown in Figure 11, but the complete-link algorithm cannot. However, from a pragmatic viewpoint, it has been observed that the complete-link algorithm produces more useful hierarchies in many applications than the single-link algorithm [Jain and Dubes 1988].

Agglomerative Single-Link Clustering Algorithm
(1) Place each pattern in its own cluster. Construct a list of interpattern distances for all distinct unordered pairs of patterns, and sort this list

(3) The output of the algorithm is a nested hierarchy of graphs which can be cut at a desired dissimilarity level forming a partition (clustering) identified by simply connected components in the corresponding graph.

Agglomerative Complete-Link Clus-

well-separated, chain-like, and concentric clusters, whereas a typical partitional algorithm such as the k-means algorithm works well only on data sets having isotropic clusters [Nagy 1968]. On the other hand, the time and space complexities [Day 1992] of the partitional algorithms are typically lower
  •   Average Group Linkage

$$D(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{x_1 \in C_i} \sum_{x_2 \in C_j} D(x_1, x_2)$$
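To compare the linkages on one pair of clusters, the sketch below evaluates average group linkage alongside D_min and D_max for {A,B,C} and {D,F}. It uses the seven-point distances from the DIANA slides; the lookup layout and the names `dist` and `d_avg` are assumptions.

```python
LABELS = "ABCDEFG"
# Lower triangle of the example matrix: LOWER[r][c] = d(LABELS[r], LABELS[c]) for c < r.
LOWER = [[], [1.2], [1.3, 1.0], [3.0, 4.0, 4.1], [1.9, 2.0, 2.5, 2.3],
         [2.3, 3.2, 3.4, 1.1, 2.2], [4.1, 4.0, 4.5, 3.5, 2.5, 4.0]]


def dist(x, y):
    """d(x, y) looked up in the lower triangle."""
    r, c = LABELS.index(x), LABELS.index(y)
    return LOWER[max(r, c)][min(r, c)]


def d_avg(ci, cj):
    """Average group linkage: mean distance over all |Ci||Cj| cross-cluster pairs."""
    total = sum(dist(x1, x2) for x1 in ci for x2 in cj)
    return total / (len(ci) * len(cj))


ci, cj = "ABC", "DF"
print("single  :", min(dist(x, y) for x in ci for y in cj))
print("complete:", max(dist(x, y) for x in ci for y in cj))
print("average :", round(d_avg(ci, cj), 3))
```

For this pair the three values are 2.3 (single), 4.1 (complete) and about 3.33 (average). The first two match the entries in the single- and complete-linkage tables.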
  •   Ward’s Method

$$D(C_i, C_j) = E(C_i \cup C_j) - E(C_i) - E(C_j)$$

where

$$E(C_i) = \sum_{x \in C_i} \bigl(d(x, c_i)\bigr)^2, \qquad c_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$$
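Ward's criterion needs coordinates, not just distances, because E(Ci) uses the centroid ci. The sketch below assumes squared Euclidean distance on made-up 1-D data; `centroid`, `E` and `ward` are illustrative names.

```python
def centroid(c):
    """c_i = (1/|C_i|) * sum of the points in C_i (1-D points here)."""
    return sum(c) / len(c)


def E(c):
    """E(C_i): sum of squared distances from each point to the centroid."""
    m = centroid(c)
    return sum((x - m) ** 2 for x in c)


def ward(ci, cj):
    """Ward's distance: how much E grows when Ci and Cj are merged."""
    return E(ci + cj) - E(ci) - E(cj)


# Hypothetical 1-D clusters chosen only for illustration.
ci = [1.0, 2.0, 3.0]
cj = [7.0, 9.0]
print(ward(ci, cj))
```

For squared Euclidean distance, the same value also equals |Ci||Cj| / (|Ci| + |Cj|) times the squared distance between the centroids. Here that is 6/5 × 36 = 43.2, which gives a handy cross-check.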




[Figure panels: Average Group Linkage; Ward’s Method.]
(1)
•
    •
    •
•   n
    •             n*(n-1)


        •
    •       2^n
        •
(2)
•
    •
    •
    •
•
•   DIANA (DIvisive ANAlysis)
    •   DIANA


        •
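DIANA (1)
       •   V(i, S)

V: the set of points to be split
S (⊂ V): the points split off from V so far
d(i, j): the distance between i and j
For each i ∈ V − S, V(i, S) is defined as

$$V(i,S) = \begin{cases} \dfrac{1}{|V|-1} \displaystyle\sum_{j \in V - \{i\}} d(i,j) & \text{if } S = \emptyset \\[2ex] \dfrac{1}{|V-S|-1} \displaystyle\sum_{j \in V - (S \cup \{i\})} d(i,j) \;-\; \dfrac{1}{|S|} \displaystyle\sum_{j \in S} d(i,j) & \text{if } S \neq \emptyset \end{cases}$$

V(i, S) is the average distance from i to the other points of V − S minus the average distance from i to S. The larger it is, the more strongly i leans toward S rather than toward V − S.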
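V(i, S) translates directly into code. The sketch below uses illustrative names and the distances of the DIANA example matrix. It reproduces some of the values computed on the following slides, such as V(G, {}) ≈ 3.77 and V(A, {G}) = −2.16.

```python
LABELS = "ABCDEFG"
# Lower triangle of the example matrix: LOWER[r][c] = d(LABELS[r], LABELS[c]) for c < r.
LOWER = [[], [1.2], [1.3, 1.0], [3.0, 4.0, 4.1], [1.9, 2.0, 2.5, 2.3],
         [2.3, 3.2, 3.4, 1.1, 2.2], [4.1, 4.0, 4.5, 3.5, 2.5, 4.0]]


def dist(x, y):
    """d(x, y) looked up in the lower triangle."""
    r, c = LABELS.index(x), LABELS.index(y)
    return LOWER[max(r, c)][min(r, c)]


def V(i, S, group):
    """V(i, S) for i in group - S; `group` plays the role of V on the slides."""
    rest = [j for j in group if j not in S and j != i]   # V - (S ∪ {i})
    if not S:
        return sum(dist(i, j) for j in rest) / (len(group) - 1)
    return (sum(dist(i, j) for j in rest) / (len(group) - len(S) - 1)
            - sum(dist(i, j) for j in S) / len(S))


print(round(V("G", [], "ABCDEFG"), 2))      # 3.77
print(round(V("A", ["G"], "ABCDEFG"), 2))   # -2.16
```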
DIANA (2):
(A) [Figure: points A to G]
(B) Distance matrix
         A     B     C     D     E     F
    B    1.2
    C    1.3   1.0
    D    3.0   4.0   4.1
    E    1.9   2.0   2.5   2.3
    F    2.3   3.2   3.4   1.1   2.2
    G    4.1   4.0   4.5   3.5   2.5   4.0
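(1) Initialize: S = {}.
(2) Compute V(i, S) for every i ∈ V − S.
(3) If the largest V(i, S) is greater than 0, move that i into S and return to (2).
(4) If V(i, S) ≤ 0 for every i, go to (5).
(5) Split V into S and V − S.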
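Steps (1) to (5) can be sketched as a loop around V(i, S). The guard that stops when only one point is left in V − S is an added assumption, since the slides do not cover that case. All names are illustrative.

```python
LABELS = "ABCDEFG"
# Lower triangle of the example matrix: LOWER[r][c] = d(LABELS[r], LABELS[c]) for c < r.
LOWER = [[], [1.2], [1.3, 1.0], [3.0, 4.0, 4.1], [1.9, 2.0, 2.5, 2.3],
         [2.3, 3.2, 3.4, 1.1, 2.2], [4.1, 4.0, 4.5, 3.5, 2.5, 4.0]]


def dist(x, y):
    r, c = LABELS.index(x), LABELS.index(y)
    return LOWER[max(r, c)][min(r, c)]


def V(i, S, group):
    """V(i, S) for i in group - S (group plays the role of V)."""
    rest = [j for j in group if j not in S and j != i]
    if not S:
        return sum(dist(i, j) for j in rest) / (len(group) - 1)
    return (sum(dist(i, j) for j in rest) / (len(group) - len(S) - 1)
            - sum(dist(i, j) for j in S) / len(S))


def split(group):
    """One DIANA split of `group` into (S, group - S), following steps (1)-(5)."""
    S = []                                               # (1) S = {}
    while True:
        cands = [i for i in group if i not in S]
        if len(cands) <= 1:
            # Guard (an assumption, not on the slides): keep V - S non-empty.
            break
        scores = {i: V(i, S, group) for i in cands}      # (2)
        best = max(scores, key=scores.get)
        if scores[best] > 0:                             # (3) move best into S
            S.append(best)
        else:                                            # (4) all V(i, S) <= 0
            break
    return S, [i for i in group if i not in S]           # (5) V -> S, V - S


print(split("ABCDEFG"))   # (['G'], ['A', 'B', 'C', 'D', 'E', 'F'])
print(split("ABCDEF"))    # (['D', 'F'], ['A', 'B', 'C', 'E'])
```

The two calls reproduce the splits worked out on the DIANA slides: {G} versus {A, B, C, D, E, F}, and then {D, F} versus {A, B, C, E}.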


               4.24
DIANA (3): 1                                                   (1)
[Figure: (A) points A to G; (B) the distance matrix; (C) point G highlighted.]
V (G, {})
  = 1/6(d(G, A) + d(G, B) + d(G, C) + d(G, D) + d(G, E) + d(G, F))
  = 1/6(4.1 + 4.0 + 4.5 + 3.5 + 2.5 + 4.0) = 3.77
DIANA (4): 1                                                       (2)
[Figure: (B) the distance matrix; (D) point E highlighted.]
V (E, {})
   = 1/6(d(E, A) + d(E, B) + d(E, C) + d(E, D) + d(E, F) + d(E, G))
   = 1/6(1.9 + 2.0 + 2.5 + 2.3 + 2.2 + 2.5) = 2.23
DIANA (5): 1                              (3)
  •   V(i, {}) for every point:

V (A, {}) = 13.8/6 = 2.3, V (B, {}) = 15.4/6 = 2.57,
V (C, {}) = 16.8/6 = 2.8, V (D, {}) = 18.0/6 = 3.0,
V (E, {}) = 2.23, V (F, {}) = 16.2/6 = 2.7
V (G, {}) = 3.77

  •   The largest value is V(G, {}) = 3.77.
  •   Since V(G, {}) > 0, G is moved into S:
      •   S = {G}
  •                  S
DIANA (6): 2                                               (1)
[Figure: points A to G with G in S, and the distance matrix.]
V (A, {G})
   = 1/5(d(A, B) + d(A, C)+
       d(A, D) + d(A, E) + d(A, F)) − 1/1(d(A, G))
   = 1/5(1.2 + 1.3 + 3.0 + 1.9 + 2.3) − 4.1 = −2.16
DIANA (7): 2                                    (2)
      •
V (B, {G})   =   1/5(1.2 + 1.0 + 4.0 + 2.0 + 3.2) − 4.0 = −1.72
V (C, {G})   =   1/5(1.3 + 1.0 + 4.1 + 2.5 + 3.4) − 4.5 = −2.04
V (D, {G})   =   1/5(3.0 + 4.0 + 4.1 + 2.3 + 1.1) − 3.5 = −0.6
V (E, {G})   =   1/5(1.9 + 2.0 + 2.5 + 2.3 + 2.2) − 2.5 = −0.32
V (F, {G})   =   1/5(2.3 + 3.2 + 3.4 + 1.1 + 2.2) − 4.0 = −1.56
      •   The largest value is V(E, {G}) = −0.32.
      •   Since V(E, {G}) < 0, no further point is moved into S.
[Figure: points A to G.]
DIANA (8): 2                                               (1)
•   V = {A,B,C,D,E,F,G} is split into {A,B,C,D,E,F} and {G}.
•   V = {G}
•   V = {A,B,C,D,E,F}

    V (A, {})    =      1/5(1.2 + 1.3 + 3.0 + 1.9 + 2.3) = 1.94
    V (B, {})    =      1/5(1.2 + 1.0 + 4.0 + 2.0 + 3.2) = 2.28
    V (C, {})    =      1/5(1.3 + 1.0 + 4.1 + 2.5 + 3.4) = 2.46
    V (D, {})    =      1/5(3.0 + 4.0 + 4.1 + 2.3 + 1.1) = 2.9
    V (E, {})    =      1/5(1.9 + 2.0 + 2.5 + 2.3 + 2.2) = 2.18
    V (F, {})    =      1/5(2.3 + 3.2 + 3.4 + 1.1 + 2.2) = 2.44
    •   The largest value is V(D, {}) = 2.9, so S = {D}.
DIANA (9): 2                                               (2)
 •   With V = {A,B,C,D,E,F} and S = {D}, compute V(i, S):
V (A, {D})      =    1/4(1.2 + 1.3 + 1.9 + 2.3) − 3.0 = −1.325
V (B, {D})      =    1/4(1.2 + 1.0 + 2.0 + 3.2) − 4.0 = −2.15
V (C, {D})      =    1/4(1.3 + 1.0 + 2.5 + 3.4) − 4.1 = −2.05
V (E, {D})      =    1/4(1.9 + 2.0 + 2.5 + 2.2) − 2.3 = −0.15
V (F, {D})      =    1/4(2.3 + 3.2 + 3.4 + 2.2) − 1.1 = 1.675
     •   The largest value is V(F, {D}) = 1.675.
     •   Since V(F, {D}) > 0, F is moved into S: S = {D, F}.
[Figure: points A to G.]
DIANA (10): 2                                                            (3)
       • With V = {A,B,C,D,E,F} and S = {D,F}, compute V(i, S):
V (A, {D, F})      =       1/3(1.2 + 1.3 + 1.9) − 1/2(3.0 + 2.3) = −1.183
V (B, {D, F})      =       1/3(1.2 + 1.0 + 2.0) − 1/2(4.0 + 3.2) = −2.2
V (C, {D, F})      =       1/3(1.3 + 1.0 + 2.5) − 1/2(4.1 + 3.4) = −2.15
V (E, {D, F})      =       1/3(1.9 + 2.0 + 2.5) − 1/2(2.3 + 2.2) = −0.117
         •
                       C
               B

                       A
              E
                            F

         G
                       D                              G     D       F     E    A    B   C
DIANA
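Applying that split recursively to both halves, until only single points remain, gives a complete but simplified divisive clustering. The sketch below makes several assumptions. Every cluster with two or more points is split. The order in which clusters are split is not modeled. Split heights are not recorded either; a full DIANA implementation commonly uses the cluster diameter for these. `diana`, `leaves`, `splinter`, `v_score`, `d` and `PAIRS` are hypothetical names, and ties again break alphabetically.

```python
# Pairwise distances from the slides' example (symmetric; d(x, x) = 0).
PAIRS = {
    ("A", "B"): 1.2, ("A", "C"): 1.3, ("B", "C"): 1.0,
    ("A", "D"): 3.0, ("B", "D"): 4.0, ("C", "D"): 4.1,
    ("A", "E"): 1.9, ("B", "E"): 2.0, ("C", "E"): 2.5, ("D", "E"): 2.3,
    ("A", "F"): 2.3, ("B", "F"): 3.2, ("C", "F"): 3.4, ("D", "F"): 1.1,
    ("E", "F"): 2.2,
    ("A", "G"): 4.1, ("B", "G"): 4.0, ("C", "G"): 4.5, ("D", "G"): 3.5,
    ("E", "G"): 2.5, ("F", "G"): 4.0,
}


def d(x: str, y: str) -> float:
    """Look up the distance between two points in either key order."""
    if x == y:
        return 0.0
    return PAIRS[(x, y)] if (x, y) in PAIRS else PAIRS[(y, x)]


def v_score(i: str, V: set, S: set) -> float:
    """V(i, S) for a point i in V - S."""
    if not S:
        return sum(d(i, j) for j in V if j != i) / (len(V) - 1)
    rest = V - S
    outside = sum(d(i, j) for j in rest if j != i) / (len(rest) - 1)
    return outside - sum(d(i, j) for j in S) / len(S)


def splinter(V: set) -> tuple:
    """One divisive split of V into (S, V - S)."""
    S: set = set()
    while len(V - S) > 1:
        best = max(sorted(V - S), key=lambda i: v_score(i, V, S))
        if S and v_score(best, V, S) <= 0:
            break
        S.add(best)
    return S, V - S


def diana(V: set):
    """Split V recursively until every cluster is a single point.

    Returns a nested tuple (splinter_subtree, rest_subtree); leaves are point names.
    """
    if len(V) == 1:
        return next(iter(V))
    S, rest = splinter(V)
    return (diana(S), diana(rest))


def leaves(tree) -> list:
    """Left-to-right leaf order of the nested tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


tree = diana(set("ABCDEFG"))
print(tree)          # ('G', (('D', 'F'), ('E', ('A', ('B', 'C')))))
print(leaves(tree))  # ['G', 'D', 'F', 'E', 'A', 'B', 'C']
```

The printed leaf order matches the order G, D, F, E, A, B, C shown under the final dendrogram.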

Slides by sesejun.
