Datamining 8th Hclustering
[Table 4.1: numeric table with column headings 10, 20, 30, 40]
• Top-down clustering (divisive clustering)
• Bottom-up clustering (agglomerative clustering)


[Figure: (A) data points A to G; (B) dendrogram with leaves A, B, C, D, E, F, G]
(           )
•
•
•

[Figure: (C) dendrogram with leaves A to G and levels 1, 2, 3; (D) data points A to G with groups 1, 2, 3]
•
•
    •
    •
    •
    •                           2




[Figure: panels (A) and (B)]
•
     d(x, y): distance between data points x and y
     Ci, Cj: clusters
       D_min(Ci, Cj) = min { d(x, y) | x ∈ Ci, y ∈ Cj }


1.
2.                                   1
3. 2                       1
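The single-link distance D_min defined above can be sketched in a few lines. This is a minimal illustration, not the deck's own code: the names `LABELS`, `LOWER`, `D`, and `d_min` are made up for the example, and the distances are the seven-point table (A to G) that the DIANA slides of this deck use.

```python
# Distances between points A..G, copied from the distance table in the DIANA slides.
LABELS = "ABCDEFG"
LOWER = [  # lower triangle: row i holds d(LABELS[i], LABELS[j]) for j < i
    [],
    [1.2],
    [1.3, 1.0],
    [3.0, 4.0, 4.1],
    [1.9, 2.0, 2.5, 2.3],
    [2.3, 3.2, 3.4, 1.1, 2.2],
    [4.1, 4.0, 4.5, 3.5, 2.5, 4.0],
]
D = {}
for i, row in enumerate(LOWER):
    for j, value in enumerate(row):
        D[LABELS[i], LABELS[j]] = D[LABELS[j], LABELS[i]] = value


def d_min(ci, cj):
    """Single-link distance: the smallest d(x, y) with x in ci and y in cj."""
    return min(D[x, y] for x in ci for y in cj)


print(d_min(["A"], ["B", "C"]))       # 1.2
print(d_min(["B", "C"], ["D", "F"]))  # 3.2
```

Only the closest pair of points matters, which is why single-link clusters can grow into long chains.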




[Figure: single-link clustering of points A to G. (A) B,C and D,F; (B) A, E; (C); (D) dendrogram with leaf order A, B, C, E, D, F, G and merge levels 1 to 6]
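(1/2)
       • 4.5 (p. 59)

  (1) Given N data points x1, ..., xN, treat each point as a cluster of size 1:
      x1, ..., xN become the clusters C1, ..., CN
  (2) n = N (n is the current number of clusters)
  (3) Repeat until n = 1:
      (a) Among C1, ..., Cn, find the closest pair of clusters Ci, Cj (i < j)
      (b) Merge Ci and Cj into Ci
      (c)
      (d) Cj = Cn, n = n − 1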
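The steps above can be sketched as follows. This is a naive illustration under stated assumptions: the function names `single_link` and `agglomerate` are hypothetical, the distances are the A to G table from the DIANA slides of this deck, and the closest pair is found by brute force on every round rather than with any cached distances.

```python
LABELS = "ABCDEFG"
LOWER = [  # lower triangle of the A..G distance table from the DIANA slides
    [],
    [1.2],
    [1.3, 1.0],
    [3.0, 4.0, 4.1],
    [1.9, 2.0, 2.5, 2.3],
    [2.3, 3.2, 3.4, 1.1, 2.2],
    [4.1, 4.0, 4.5, 3.5, 2.5, 4.0],
]
D = {}
for i, row in enumerate(LOWER):
    for j, value in enumerate(row):
        D[LABELS[i], LABELS[j]] = D[LABELS[j], LABELS[i]] = value


def single_link(ci, cj):
    return min(D[x, y] for x in ci for y in cj)


def agglomerate(points, linkage):
    clusters = [[p] for p in points]   # step (1): one cluster per point
    n = len(clusters)                  # step (2)
    merges = []
    while n > 1:                       # step (3)
        # (a) closest pair Ci, Cj with i < j, found by brute force
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(n) for j in range(i + 1, n)
        )
        merges.append(("".join(clusters[i]), "".join(clusters[j]), dist))
        clusters[i] = sorted(clusters[i] + clusters[j])  # (b) merge into Ci
        clusters[j] = clusters[n - 1]                    # (d) Cj = Cn
        clusters.pop()
        n -= 1                                           # (d) n = n - 1
    return merges


for left, right, height in agglomerate(LABELS, single_link):
    print(left, "+", right, "at", height)
```

On this data the merge heights come out as 1.0, 1.1, 1.2, 1.9, 2.2, 2.5, which matches the six merge levels of the single-link dendrogram (leaf order A, B, C, E, D, F, G).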

(2/2)
[Figure 4.8: the cluster array 1 to 7 at steps (a), (b), and (d) of the algorithm]
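(A) Distances between the current clusters

                A       B,C     D,F     E       G
        A               1.2     2.3     1.9     4.1
        B,C     1.2             3.2     2.0     4.0
        D,F     2.3     3.2             2.2     3.5
        E       1.9     2.0     2.2             2.5
        G       4.1     4.0     3.5     2.5

(B) Nearest cluster and its distance

        B,C     1.2
        A       1.2
        E       2.2
        A       1.9
        E       2.5

A and B,C are merged.

(C) Distances after the merge

                A,B,C   D,F     E       G
        A,B,C           2.3     1.9     4.0
        D,F     2.3             2.2     3.5
        E       1.9     2.2             2.5
        G       4.0     3.5     2.5

(D) Nearest cluster and its distance

        E       1.9
        E       2.2
        A,B,C   1.9
        E       2.5

[Figure: data points A to G]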
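The step from table (A) to table (C) can be reproduced without going back to the individual points: under single link, the distance from the merged cluster to any other cluster Ck is the smaller of the two old distances. The sketch below assumes that rule; `table_a` is copied from table (A), and `merge_single_link` is a hypothetical helper name.

```python
# Table (A): distances between the clusters A, "B,C", "D,F", E, G.
table_a = {
    ("A", "B,C"): 1.2, ("A", "D,F"): 2.3, ("A", "E"): 1.9, ("A", "G"): 4.1,
    ("B,C", "D,F"): 3.2, ("B,C", "E"): 2.0, ("B,C", "G"): 4.0,
    ("D,F", "E"): 2.2, ("D,F", "G"): 3.5,
    ("E", "G"): 2.5,
}


def merge_single_link(table, ci, cj, merged):
    """Return a new table in which clusters ci and cj are replaced by `merged`."""

    def dist(a, b):
        return table[a, b] if (a, b) in table else table[b, a]

    names = {name for pair in table for name in pair}
    # Distances that involve neither ci nor cj are unchanged.
    updated = {pair: v for pair, v in table.items()
               if ci not in pair and cj not in pair}
    # Single link: the merged cluster is as close to ck as its closer half.
    for ck in sorted(names - {ci, cj}):
        updated[merged, ck] = min(dist(ci, ck), dist(cj, ck))
    return updated


table_c = merge_single_link(table_a, "A", "B,C", "A,B,C")
for pair, value in sorted(table_c.items()):
    print(pair, value)
```

Each merge touches only one row of the table, so the update costs time proportional to the number of remaining clusters instead of a full recomputation from the points.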


                                        11               ※
•
    •   Ci    Cj        Ci’
    •   Ci’             Ck
        •    Ci    Ck             Cj   Ck
    •
        •                     N                  O(N)
        •                     O(N^2)
•
    •                             Cj              Ci’
•                   1                              O(N)     O(N^2)


•                       N-1                                         O(N)
                                                 O(N^2),   O(N^2)
                                            12
35, 48)




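• Kruskal's algorithm and the minimum spanning tree
•
       4.2  Let G = (V, E) be a graph (V: set of vertices, E: set of edges).
       A subgraph T ⊆ G is a spanning tree of G if T is a tree and T contains
       every vertex in V.
       Suppose each edge (u, v) ∈ E has a weight w(u, v). A spanning tree T of G
       that minimises the total weight

       \[ \sum_{(u,v) \in T} w(u, v) \]

       is a minimum spanning tree of G.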

       4.13(A)                                                       4.9(P. 59)



                                               4.13(B)           4.13(B)
        G                                        1              BC                    6
             GE                                     13       4.9(D) P. 59
G




         w(u,v)   u,v




    14
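Kruskal(
 72                                                           )               4
  (1) Input: a graph G = (V, E) with vertex set V and edge set E

  (2) A ← {} (A holds the edges chosen so far)
  (3) Put every vertex of V into its own set (a cluster of size 1)
  (4) For each edge (u, v) ∈ E, in increasing order of weight w(u, v):
            (a) if u and v belong to different sets, A ← A ∪ {(u, v)}
            (b) merge the set containing u with the set containing v
  (5) Return A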
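A minimal sketch of these steps, assuming the complete graph over points A to G with the distances from the DIANA slides as edge weights. The disjoint sets of step (3) are kept in a small union-find structure; `kruskal`, `find`, and `parent` are illustrative names, not taken from the deck.

```python
LABELS = "ABCDEFG"
LOWER = [  # lower triangle of the A..G distance table from the DIANA slides
    [],
    [1.2],
    [1.3, 1.0],
    [3.0, 4.0, 4.1],
    [1.9, 2.0, 2.5, 2.3],
    [2.3, 3.2, 3.4, 1.1, 2.2],
    [4.1, 4.0, 4.5, 3.5, 2.5, 4.0],
]
# Every pair of points is an edge (w, u, v); sorting orders them by weight.
EDGES = sorted(
    (w, LABELS[j], LABELS[i])
    for i, row in enumerate(LOWER) for j, w in enumerate(row)
)


def kruskal(vertices, edges):
    parent = {v: v for v in vertices}      # step (3): every vertex alone

    def find(v):
        # Follow parent links to the representative, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    chosen = []                            # step (2): A = {}
    for w, u, v in edges:                  # step (4): increasing weight
        root_u, root_v = find(u), find(v)
        if root_u != root_v:               # (a) u and v are in different sets
            chosen.append((u, v, w))
            parent[root_v] = root_u        # (b) merge the two sets
    return chosen                          # step (5)


tree = kruskal(LABELS, EDGES)
print(tree)
```

The chosen edges start with (B,C), (D,F), (A,B), as in the figure for A, and their weights 1.0, 1.1, 1.2, 1.9, 2.2, 2.5 are exactly the single-link merge heights on the same data.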

4.14
[Figure: Kruskal's algorithm on points A to G in three stages; further labels: C, V-C, e, e']
  A = {}: sets {A}, {B}, {C}, {D}, {E}, {F}
  A = {(B,C),(D,F)}: sets {A}, {B,C}, {D,F}, {E}
  A = {(B,C),(D,F),(A,B)}: sets {A,B,C}, {D,F}, {E}
                                                      15
[Figure: (A) data points A to G; (B) tree over the points with edges numbered 1 to 6; (C) dendrogram with leaf order A, B, C, E, D, F, G and merge levels 1 to 6]



                                                  16
[Figure: panels (A) to (E) over points A to G; labels T and Q]
                                                                      O(E + V log V )

                                                                17
X

•
     d(x, y): distance between data points x and y
     Ci, Cj: clusters
        D_max(Ci, Cj) = max { d(x, y) | x ∈ Ci, y ∈ Cj }

1.
2.                                    1
3. 2                    1
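A short sketch of complete-link clustering with D_max, again on the A to G distance table from the DIANA slides. The loop is a brute-force illustration with hypothetical names (`d_max`, `heights`), not the deck's implementation.

```python
LABELS = "ABCDEFG"
LOWER = [  # lower triangle of the A..G distance table from the DIANA slides
    [],
    [1.2],
    [1.3, 1.0],
    [3.0, 4.0, 4.1],
    [1.9, 2.0, 2.5, 2.3],
    [2.3, 3.2, 3.4, 1.1, 2.2],
    [4.1, 4.0, 4.5, 3.5, 2.5, 4.0],
]
D = {}
for i, row in enumerate(LOWER):
    for j, value in enumerate(row):
        D[LABELS[i], LABELS[j]] = D[LABELS[j], LABELS[i]] = value


def d_max(ci, cj):
    """Complete-link distance: the largest d(x, y) with x in ci and y in cj."""
    return max(D[x, y] for x in ci for y in cj)


clusters = [frozenset(p) for p in LABELS]   # start with one cluster per point
heights = []
while len(clusters) > 1:
    # Pick the pair of clusters whose farthest members are closest.
    h, ci, cj = min(
        ((d_max(a, b), a, b)
         for k, a in enumerate(clusters) for b in clusters[k + 1:]),
        key=lambda t: t[0],
    )
    heights.append(h)
    clusters = [c for c in clusters if c not in (ci, cj)] + [ci | cj]

print(d_max("BC", "DF"))  # 4.1
print(heights)            # [1.0, 1.1, 1.3, 2.3, 4.0, 4.5]
```

Compared with single link on the same data, E now joins D,F (at 2.3) instead of A,B,C, which is the difference between the two dendrograms in the slides.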
                                18
[Figure: complete-link clustering of points A to G. (A) B,C and D,F; (B) A, E; (C) G; (D) dendrogram with leaf order A, B, C, D, F, E, G and numbered merge levels]


                                                               19
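(A) Distances between the current clusters

           A     B,C   D,F   E     G
A                1.3   3.0   1.9   4.1
B,C        1.3         4.1   2.5   4.5
D,F        3.0   4.1         2.3   4.0
E          1.9   2.5   2.3         2.5
G          4.1   4.5   4.0   2.5

(B) Nearest cluster and its distance

           B,C   1.3
           A     1.3
           E     2.3
           A     1.9
           E     2.5

A and B,C are merged.

(C) Distances after the merge

           A,B,C   D,F   E     G
A,B,C              4.1   2.5   4.5
D,F        4.1           2.3   4.0
E          2.5     2.3         2.5
G          4.5     4.0   2.5

(D) Nearest cluster and its distance

           E     2.5
           E     2.3
           D,F   2.3
           E     2.5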
C
                C
        B                                             B

                A                                                          A
            E                                             E

                        F                                                      F

    G                                        G
                                                                       D
                    D
(A) E                         A      (B) A B,C                                 E
                                                          A


                C
        B
                            (C) A B,C                              E
                    A          D,F           CE               DE
                                                  E
            E           F

    G
                D

                            O(N^2)                                                 O(N^3)
                                        21
{A},1.9
                                                                O(N^2 log N)
    {B,C}, 2.5             {D,F}, 2.3            (                             O(log N) )

{G}, 2.5
    (A)      E

                                             {A},1.9            {B,C}, 2.5

                                           (B)              2
                  {D,F}, 2.3


           {G}, 2.5
           (C)

                                                     {A, B,C}, 2.5
                                           (D)
                 {D,F}, 2.3


          {G}, 2.5         {A, B,C}, 2.5

           (E)
                                             22
[Excerpt from a page headed "Data Clustering • 277"]

... produces the clusters shown in Figure 12, whereas the complete-link algorithm obtains the clustering shown in Figure 13. The clusters obtained by the complete-link algorithm are more compact than those obtained by the single-link algorithm; the cluster labeled 1 obtained using the single-link algorithm is elongated because of the noisy patterns labeled "*". The single-link algorithm is more versatile than the complete-link algorithm, otherwise. For example, the single-link algorithm can extract the concentric clusters shown in Figure 11, but the complete-link algorithm cannot. However, from a pragmatic viewpoint, it has been observed that the complete-link algorithm produces more useful hierarchies in many applications than the single-link algorithm [Jain and Dubes 1988].

Figure 10. The dendrogram obtained using the single-link algorithm. (Vertical axis: Similarity; leaves A to G.)
Figure 12. A single-link clustering of a pattern set containing two classes (1 and 2) connected by a chain of noisy patterns (*). (Axes: X1, X2.)
Figure 13. A complete-link clustering of a pattern set containing two classes (1 and 2) connected by a chain of noisy patterns (*). (Axes: X1, X2.)

(3) The output of the algorithm is a nested hierarchy of graphs which can be cut at a desired dissimilarity level forming a partition (clustering) identified by simply connected components in the corresponding graph.

... well-separated, chain-like, and concentric clusters, whereas a typical partitional algorithm such as the k-means algorithm works well only on data sets having isotropic clusters [Nagy 1968]. On the other hand, the time and space complexities [Day 1992] of the partitional algorithms are typically lower ...

Agglomerative Single-Link Clustering Algorithm
(1) Place each pattern in its own cluster. Construct a list of interpattern distances for all distinct unordered pairs of patterns, and sort this list ...

Agglomerative Complete-Link Clus-
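•
  •   Average Group Linkage
      •
\[
D(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{x_1 \in C_i} \sum_{x_2 \in C_j} D(x_1, x_2)
\]

  •   Ward's Method
      •
\[
D(C_i, C_j) = E(C_i \cup C_j) - E(C_i) - E(C_j),
\quad \text{where } E(C_i) = \sum_{x \in C_i} \bigl(d(x, c_i)\bigr)^2,
\quad c_i = \frac{1}{|C_i|} \sum_{x \in C_i} x
\]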
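Both criteria can be written directly from their definitions. The sketch below makes two assumptions that the slide does not state: points are coordinate tuples and d is the Euclidean distance (`math.dist`). The two small clusters `ci` and `cj` are made-up illustration data, not taken from the deck, and the function names are hypothetical.

```python
import math


def average_linkage(ci, cj, dist=math.dist):
    """Mean of the pairwise distances between members of ci and cj."""
    return sum(dist(x1, x2) for x1 in ci for x2 in cj) / (len(ci) * len(cj))


def centroid(cluster):
    """c_i: the coordinate-wise mean of the points in the cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def sse(cluster):
    """E(C): sum of squared distances from each point to the centroid."""
    c = centroid(cluster)
    return sum(math.dist(x, c) ** 2 for x in cluster)


def ward(ci, cj):
    """Ward's criterion: the increase in E caused by merging ci and cj."""
    return sse(ci + cj) - sse(ci) - sse(cj)


# Hypothetical data: two horizontal pairs of points, two units apart.
ci = [(0.0, 0.0), (2.0, 0.0)]
cj = [(0.0, 2.0), (2.0, 2.0)]
print(average_linkage(ci, cj))  # about 2.414, i.e. 1 + sqrt(2)
print(ward(ci, cj))             # 4.0 up to floating-point rounding
```

For these points E(ci) = E(cj) = 2 and E(ci ∪ cj) = 8, so merging raises the total squared error by 4. Ward's method picks the merge that raises it least.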




[Figures: Average Group Linkage; Ward's Method]
(1)
•
    •
    •
•   n
    •             n*(n-1)


        •
    •       2^n
        •
(2)
•
    •
    •
    •
•
•   DIANA (DIvisive ANAlysis)
    •   DIANA


        •
DIANA (1)
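       • V(i, S)

V: the set of data points to be split
S (⊂ V): the subset being split off from V
d(i, j): distance between i and j
For S and a point i ∈ V − S, V(i, S) is defined as

\[
V(i, S) =
\begin{cases}
\dfrac{1}{|V| - 1} \displaystyle\sum_{j \in V - \{i\}} d(i, j) & \text{if } S = \emptyset \\[2ex]
\dfrac{1}{|V - S| - 1} \displaystyle\sum_{j \in V - (S \cup \{i\})} d(i, j) \;-\; \dfrac{1}{|S|} \displaystyle\sum_{j \in S} d(i, j) & \text{if } S \neq \emptyset
\end{cases}
\]

    V(i, S): the average distance from i to the other points of V − S, minus the average distance from i to S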
DIANA (2):
(A) [Figure: data points A to G]

(B) Distance matrix

             A       B       C       D       E       F
        B    1.2
        C    1.3     1.0
        D    3.0     4.0     4.1
        E    1.9     2.0     2.5     2.3
        F    2.3     3.2     3.4     1.1     2.2
        G    4.1     4.0     4.5     3.5     2.5     4.0

6                                                             4
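(1) Initialise: S = {} (S collects the points split off from V)
(2) Compute V(i, S) for every i ∈ V − S and take the i with the largest value
(3) If V(i, S) > 0, move i into S and go back to (2)
(4) If V(i, S) ≤ 0, move nothing more into S and go to (5)
(5) Split V into S and V − S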
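Steps (1) to (5) can be sketched as a single function. This is an illustration under stated assumptions: `v` and `split` are hypothetical names, the distances are the A to G matrix from the slides, ties are broken by taking the first candidate, and the loop also stops once only one point would remain outside S (otherwise the first average would divide by zero).

```python
LABELS = "ABCDEFG"
LOWER = [  # lower triangle of the A..G distance matrix from the slides
    [],
    [1.2],
    [1.3, 1.0],
    [3.0, 4.0, 4.1],
    [1.9, 2.0, 2.5, 2.3],
    [2.3, 3.2, 3.4, 1.1, 2.2],
    [4.1, 4.0, 4.5, 3.5, 2.5, 4.0],
]
D = {}
for i, row in enumerate(LOWER):
    for j, value in enumerate(row):
        D[LABELS[i], LABELS[j]] = D[LABELS[j], LABELS[i]] = value


def v(i, members, s):
    """V(i, S): mean distance to the rest of V - S minus mean distance to S."""
    rest = [j for j in members if j not in s and j != i]
    stay = sum(D[i, j] for j in rest) / len(rest)
    if not s:
        return stay
    return stay - sum(D[i, j] for j in s) / len(s)


def split(members):
    s = []                                              # step (1)
    while len(members) - len(s) > 1:                    # keep V - S non-empty
        candidates = [i for i in members if i not in s]
        best = max(candidates, key=lambda i: v(i, members, s))  # step (2)
        if v(best, members, s) <= 0:                    # step (4)
            break
        s.append(best)                                  # step (3)
    return s, [i for i in members if i not in s]        # step (5)


print(split(list("ABCDEFG")))  # (['G'], ['A', 'B', 'C', 'D', 'E', 'F'])
print(split(list("ABCDEF")))   # (['D', 'F'], ['A', 'B', 'C', 'E'])
```

The two calls reproduce the first and second splits worked out by hand in the following slides.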


               4.24
DIANA (3): 1                                                   (1)
[Figure: (A) data points A to G; (B) the distance matrix; (C) the points with G marked]
V (G, {})
  = 1/6(d(G, A) + d(G, B) + d(G, C) + d(G, D) + d(G, E) + d(G, F))
  = 1/6(4.1 + 4.0 + 4.5 + 3.5 + 2.5 + 4.0) = 3.77
DIANA (4): 1                                                       (2)
[Figure: (B) the distance matrix; (D) the points with E marked]

V (E, {})
   = 1/6(d(E, A) + d(E, B) + d(E, C) + d(E, D) + d(E, F) + d(E, G))
   = 1/6(1.9 + 2.0 + 2.5 + 2.3 + 2.2 + 2.5) = 2.23
DIANA (5): 1                              (3)
  •   V(i, {})

V (A, {}) = 13.8/6 = 2.3, V (B, {}) = 15.4/6 = 2.57,
V (C, {}) = 16.8/6 = 2.8, V (D, {}) = 18.0/6 = 3.0,
V (E, {}) = 2.23, V (F, {}) = 16.2/6 = 2.7
V (G, {}) = 3.77
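The seven values above are the row averages of the distance matrix, so they can be checked mechanically. A small sketch, assuming the A to G matrix from the slides; `v0` is an illustrative name.

```python
LABELS = "ABCDEFG"
LOWER = [  # lower triangle of the A..G distance matrix from the slides
    [],
    [1.2],
    [1.3, 1.0],
    [3.0, 4.0, 4.1],
    [1.9, 2.0, 2.5, 2.3],
    [2.3, 3.2, 3.4, 1.1, 2.2],
    [4.1, 4.0, 4.5, 3.5, 2.5, 4.0],
]
D = {}
for i, row in enumerate(LOWER):
    for j, value in enumerate(row):
        D[LABELS[i], LABELS[j]] = D[LABELS[j], LABELS[i]] = value

# V(i, {}) = average distance from i to the other |V| - 1 = 6 points.
v0 = {
    i: round(sum(D[i, j] for j in LABELS if j != i) / (len(LABELS) - 1), 2)
    for i in LABELS
}
print(v0)
print(max(v0, key=v0.get))  # G
```

G has the largest average distance to everything else, which is why it is the first point to leave.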

  • The largest value is V(G, {}) = 3.77
  • V(G, {}) > 0, so G is moved into S
      •   S = {G}
  •                  S
DIANA (6): 2                                              (1)
[Figure: data points A to G and the distance matrix]
V (A, {G})
   = 1/5(d(A, B) + d(A, C)+
       d(A, D) + d(A, E) + d(A, F)) − 1/1(d(A, G))
   = 1/5(1.2 + 1.3 + 3.0 + 1.9 + 2.3) − 4.1 = −2.16
DIANA (7): 2                                    (2)
      •
V (B, {G})   =   1/5(1.2 + 1.0 + 4.0 + 2.0 + 3.2) − 4.0 = −1.72
V (C, {G})   =   1/5(1.3 + 1.0 + 4.1 + 2.5 + 3.4) − 4.5 = −2.04
V (D, {G})   =   1/5(3.0 + 4.0 + 4.1 + 2.3 + 1.1) − 3.5 = −0.6
V (E, {G})   =   1/5(1.9 + 2.0 + 2.5 + 2.3 + 2.2) − 2.5 = −0.32
V (F, {G})   =   1/5(2.3 + 3.2 + 3.4 + 1.1 + 2.2) − 4.0 = −1.56
      • The largest value is V(E, {G}) = −0.32
      • Every V(i, {G}) < 0, so no further point is moved into S
[Figure: data points A to G]
DIANA (8): 2                                               (1)
•   V = {A,B,C,D,E,F,G} has been split into {A,B,C,D,E,F} and {G}
•   V = {G}: a single point, nothing left to split
•   V = {A,B,C,D,E,F}: split next, starting again from S = {}

    V (A, {})   =       1/5(1.2 + 1.3 + 3.0 + 1.9 + 2.3) = 1.94
    V (B, {})   =       1/5(1.2 + 1.0 + 4.0 + 2.0 + 3.2) = 2.28
    V (C, {})   =       1/5(1.3 + 1.0 + 4.1 + 2.5 + 3.4) = 2.46
    V (D, {})   =       1/5(3.0 + 4.0 + 4.1 + 2.3 + 1.1) = 2.9
    V (E, {})   =       1/5(1.9 + 2.0 + 2.5 + 2.3 + 2.2) = 2.18
    V (F, {})   =       1/5(2.3 + 3.2 + 3.4 + 1.1 + 2.2) = 2.44
    • The largest value is V(D, {}) = 2.9, so S = {D}
DIANA (9): 2                                              (2)
 •   V={A,B,C,D,E,F}, S={D}              V(i, S)
V (A, {D})      =    1/4(1.2 + 1.3 + 1.9 + 2.3) − 3.0 = −1.325
V (B, {D})      =    1/4(1.2 + 1.0 + 2.0 + 3.2) − 4.0 = −2.15
V (C, {D})      =    1/4(1.3 + 1.0 + 2.5 + 3.4) − 4.1 = −2.05
V (E, {D})      =    1/4(1.9 + 2.0 + 2.5 + 2.2) − 2.3 = −0.15
V (F, {D})      =    1/4(2.3 + 3.2 + 3.4 + 2.2) − 1.1 = 1.675
     • The largest value is V(F, {D}) = 1.675
     • V(F, {D}) > 0, so F is moved into S: S = {D, F}
[Figure: data points A to G]
DIANA (10): 2                                                      (3)
       • V={A,B,C,D,E,F}, S={D,F}
V (A, {D, F})    =       1/3(1.2 + 1.3 + 1.9) − 1/2(3.0 + 2.3) = −1.183
V (B, {D, F})    =       1/3(1.2 + 1.0 + 2.0) − 1/2(4.0 + 3.2) = −2.2
V (C, {D, F})    =       1/3(1.3 + 1.0 + 2.5) − 1/2(4.1 + 3.4) = −2.15
V (E, {D, F})    =       1/3(1.9 + 2.0 + 2.5) − 1/2(2.3 + 2.2) = −0.117
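     • Every V(i, {D, F}) < 0, so {A,B,C,D,E,F} is split into {D, F} and {A, B, C, E}
[Figure: data points A to G and the resulting dendrogram, leaf order G, D, F, E, A, B, C]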
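Applying the same split recursively to every part until only single points remain gives the whole hierarchy. The sketch below is a simplification with hypothetical names (`v`, `split`, `leaf_order`): DIANA as usually described picks the cluster with the largest diameter to split next, whereas this version just recurses into both halves, which yields the same tree. Ties are broken by taking the first candidate.

```python
LABELS = "ABCDEFG"
LOWER = [  # lower triangle of the A..G distance matrix from the slides
    [],
    [1.2],
    [1.3, 1.0],
    [3.0, 4.0, 4.1],
    [1.9, 2.0, 2.5, 2.3],
    [2.3, 3.2, 3.4, 1.1, 2.2],
    [4.1, 4.0, 4.5, 3.5, 2.5, 4.0],
]
D = {}
for i, row in enumerate(LOWER):
    for j, value in enumerate(row):
        D[LABELS[i], LABELS[j]] = D[LABELS[j], LABELS[i]] = value


def v(i, members, s):
    rest = [j for j in members if j not in s and j != i]
    stay = sum(D[i, j] for j in rest) / len(rest)
    return stay - (sum(D[i, j] for j in s) / len(s) if s else 0.0)


def split(members):
    s = []
    while len(members) - len(s) > 1:
        best = max((i for i in members if i not in s),
                   key=lambda i: v(i, members, s))
        if v(best, members, s) <= 0:
            break
        s.append(best)
    return s, [i for i in members if i not in s]


def leaf_order(members):
    """Left-to-right order of the leaves, with the split-off part S first."""
    if len(members) == 1:
        return list(members)
    s, rest = split(members)
    return leaf_order(s) + leaf_order(rest)


print(" ".join(leaf_order(list(LABELS))))  # G D F E A B C
```

The printed order matches the leaves of the final dendrogram: G leaves first, then {D, F}, then E, then A, leaving {B, C}.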
DIANA
