Adaptive XML Tree Mining on Evolving Data Streams


                                 Albert Bifet

       Laboratory for Relational Algorithmics, Complexity and Learning LARCA
                Departament de Llenguatges i Sistemes Informàtics
                         Universitat Politècnica de Catalunya




                           Porto, 21 May 2009
Mining Evolving Massive Structured Data


                                    The basic problem
                                    Finding interesting structure
                                    in data
                                        Mining massive data
                                        Mining time-varying data
                                        Mining in real time
                                        Mining XML data


The Disintegration of Persistence
      of Memory 1952-54

         Salvador Dalí


                                                                   2 / 30
XML Tree Classification on evolving data streams

[Figure: A dataset example. Four trees with node labels A, B, C and D, labeled CLASS 1, CLASS 2, CLASS 1 and CLASS 2.]


                                                          3 / 30
Tree Pattern Mining

Given a dataset of trees, find the complete set of frequent subtrees.

    Frequent Tree Pattern (FT):
        Include all the trees whose support is no less than min_sup

    Closed Frequent Tree Pattern (CT):
        Include no tree which has a super-tree with the same support

    CT ⊆ FT

"Trees are sanctuaries. Whoever knows how to listen to them, can learn the truth."
    Hermann Hesse

                                                               4 / 30
Mining Closed Frequent Trees

Our trees are:                   Our subtrees are:
    Labeled and Unlabeled             Induced
    Ordered and Unordered             Top-down

                  Two different ordered trees
                 but the same unordered tree




                                                     5 / 30
A tale of two trees

Consider D = {A, B} and let min_sup = 2.

[Figure: the trees A and B, shown first with their frequent subtrees and then with their closed subtrees.]

                                                        6 / 30
XML Tree Classification on evolving data streams

[Figure: drawings of the closed trees c1 and c2 next to the frequent trees that are not closed.]

Closed tree   Tree transactions
              1   2   3   4
c1            1   0   1   0
c2            1   0   0   1

                                                           8 / 30
XML Tree Classification on evolving data streams

Frequent Trees
       c1          c2                    c3          c4
Id   c1  f1^1    c2  f2^1 f2^2 f2^3    c3  f3^1    c4  f4^1 f4^2 f4^3 f4^4 f4^5
1    1   1       1   1    1    1       0   0       1   1    1    1    1    1
2    0   0       0   0    0    0       1   1       1   1    1    1    1    1
3    1   1       0   0    0    0       1   1       1   1    1    1    1    1
4    0   0       1   1    1    1       1   1       1   1    1    1    1    1

           Closed Trees        Maximal Trees
Id Tree    c1  c2  c3  c4      c1  c2  c3      Class
1          1   1   0   1       1   1   0       CLASS 1
2          0   0   1   1       0   0   1       CLASS 2
3          1   0   1   1       1   0   1       CLASS 1
4          0   1   1   1       0   1   1       CLASS 2

                                                        9 / 30
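The second table can be read as a recipe for building classifier input: each tree becomes one tuple with a 0/1 attribute per pattern, plus its class label. A minimal sketch of that mapping follows, with the occurrence sets copied from the slide. The names `occurrences`, `labels` and `to_tuples` are illustrative only and are not part of the original system.

```python
# Which trees (by Id) contain each closed pattern, copied from the table.
occurrences = {
    "c1": {1, 3},
    "c2": {1, 4},
    "c3": {2, 3, 4},
    "c4": {1, 2, 3, 4},
}
labels = {1: "CLASS 1", 2: "CLASS 2", 3: "CLASS 1", 4: "CLASS 2"}


def to_tuples(patterns):
    """Map each tree Id to (0/1 attribute vector, class label)."""
    return {
        tree_id: ([int(tree_id in occurrences[p]) for p in patterns], label)
        for tree_id, label in labels.items()
    }


closed_tuples = to_tuples(["c1", "c2", "c3", "c4"])   # closed-tree attributes
maximal_tuples = to_tuples(["c1", "c2", "c3"])        # maximal-tree attributes
```

Using only the maximal trees drops the c4 column, which is 1 for every tree in this example and so carries no information for the classifier.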
XML Tree Framework on evolving data
                streams


XML Tree Classification Framework Components
   An XML closed frequent tree miner
   A Data stream classifier algorithm, which we will feed with tuples
   to be classified online.




                                                                       10 / 30
Mining Evolving Tree Data Streams


Problem
Given a data stream D of rooted and unordered trees, find
frequent closed trees.


                                 We provide three algorithms,
                                 of increasing power
                                     Incremental
                                     Sliding Window
                                     Adaptive




                                                                11 / 30
Mining Closed Unordered Subtrees



CLOSED_SUBTREES(t, D, min_sup, T)
    if not CANONICAL_REPRESENTATIVE(t)
        then return T
    for every t′ that can be extended from t in one step
        do if Support(t′) ≥ min_sup
               then T ← CLOSED_SUBTREES(t′, D, min_sup, T)
        do if Support(t′) = Support(t)
               then t is not closed
    if t is closed
        then insert t into T
    return T



                                                                12 / 30
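The recursion can be sketched in Python for unlabeled, unordered trees and top-down subtrees. This is an illustration under several assumptions that the slide does not state: trees are tuples of node depths in preorder, the canonical representative is the reordering with the lexicographically largest depth sequence, candidates are generated by appending one node on the rightmost path, and closedness is checked against every supertree with one extra leaf. All helper names are hypothetical.

```python
def children(seq):
    """Split a depth sequence into the depth sequences of the root's child subtrees."""
    kids = []
    for d in seq[1:]:
        if d == seq[0] + 1:
            kids.append([d])        # a new child of the root starts here
        else:
            kids[-1].append(d)      # deeper node, belongs to the current child
    return [tuple(k) for k in kids]


def canonical(seq):
    """Reorder siblings so that the depth sequence is lexicographically largest."""
    kids = sorted((canonical(k) for k in children(seq)), reverse=True)
    return (seq[0],) + tuple(d for k in kids for d in k)


def included(t, u):
    """True if t is a top-down unordered subtree of u (roots aligned)."""
    tc, uc = children(t), children(u)

    def match(i, used):
        # Assign child i of t to a distinct, not yet used child of u (backtracking).
        if i == len(tc):
            return True
        return any(
            j not in used and included(tc[i], c) and match(i + 1, used | {j})
            for j, c in enumerate(uc)
        )

    return match(0, frozenset())


def support(t, dataset):
    return sum(included(t, u) for u in dataset)


def rightmost_extensions(t):
    """Append one node at every depth allowed on the rightmost path."""
    return [t + (d,) for d in range(1, t[-1] + 2)]


def one_step_supertrees(t):
    """All unordered trees obtained by adding one leaf under some node of t."""
    return {canonical(t[:i + 1] + (t[i] + 1,) + t[i + 1:]) for i in range(len(t))}


def closed_subtrees(t, dataset, min_sup, found):
    if canonical(t) != t:
        return found
    for t2 in rightmost_extensions(t):
        if support(t2, dataset) >= min_sup:
            found = closed_subtrees(t2, dataset, min_sup, found)
    sup_t = support(t, dataset)
    # t is closed when no one-step supertree keeps the same support.
    if all(support(s, dataset) != sup_t for s in one_step_supertrees(t)):
        found.add(t)
    return found


A = (0, 1, 2, 3, 2, 1)
B = (0, 1, 2, 3, 1, 2, 2)
closed = closed_subtrees((0,), [A, B], 2, set())
```

With the two trees A and B used in this talk's example and min_sup = 2, the sketch returns the two closed trees (0, 1, 2, 2, 1) and (0, 1, 2, 3, 1). Support is recomputed from scratch on every call, which is fine for a toy dataset but is not how an efficient miner would do it.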
Example
D = {A, B}            A = (0, 1, 2, 3, 2, 1)        B = (0, 1, 2, 3, 1, 2, 2)

min_sup = 2.


                                          (0, 1, 2, 1)

                                                            (0, 1, 2, 2, 1)
                          (0, 1, 1)
                                          (0, 1, 2, 2)
  (0)        (0, 1)
                          (0, 1, 2)                         (0, 1, 2, 3, 1)

                                          (0, 1, 2, 3)




                                                                              13 / 30
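The tuples in this example are easier to read once decoded. Assuming each tuple lists node depths in preorder (the slide does not say so explicitly), the following sketch recovers the parent of every node. The function name `parents` is an illustrative choice.

```python
def parents(depths):
    """Return the parent index of every node (None for the root)."""
    last_at_depth = {}   # depth -> index of the most recent node seen at that depth
    result = []
    for i, d in enumerate(depths):
        # The parent is the latest node one level above the current one.
        result.append(last_at_depth.get(d - 1))
        last_at_depth[d] = i
    return result


A = (0, 1, 2, 3, 2, 1)
B = (0, 1, 2, 3, 1, 2, 2)
parents_a = parents(A)   # [None, 0, 1, 2, 1, 0]
parents_b = parents(B)   # [None, 0, 1, 2, 0, 4, 4]
```

Read this way, A has a root with two children, one of which has two children of its own, while B has a root whose children are a chain of length two and a node with two leaves.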
Experimental results




TreeNat                   CMTreeMiner
   Unlabeled Trees           Labeled Trees
   Top-Down Subtrees         Induced Subtrees
   No Occurrences            Occurrences

                                                14 / 30
Closure Operator on Trees
    D: the finite input dataset of trees
    T : the (infinite) set of all trees

Definition
We define the following Galois connection pair:
    For finite A ⊆ D
         σ(A) is the set of trees in T that are subtrees of every tree in A

             $\sigma(A) = \{\, t \in T \mid \forall t' \in A \; (t \preceq t') \,\}$

    For finite B ⊂ T
         τ_D(B) is the set of trees in D that are supertrees of every tree in B

             $\tau_D(B) = \{\, t' \in D \mid \forall t \in B \; (t \preceq t') \,\}$

Closure Operator
The composition Γ_D = σ ◦ τ_D is a closure operator.
                                                                   15 / 30
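For reference, the slide does not spell out what "closure operator" means. The block below gives the standard textbook properties, stated for finite sets of trees X and Y, and is not taken from the talk.

```latex
\begin{align*}
  X &\subseteq \Gamma_D(X)
      && \text{(extensive)} \\
  X \subseteq Y &\implies \Gamma_D(X) \subseteq \Gamma_D(Y)
      && \text{(monotone)} \\
  \Gamma_D(\Gamma_D(X)) &= \Gamma_D(X)
      && \text{(idempotent)}
\end{align*}
```

A set of trees X is then called closed when $\Gamma_D(X) = X$.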
Galois Lattice of closed set of trees

[Figure: Galois lattice with nodes 1, 2, 3, 12, 13, 23 and 123. Successive builds of the slide mark a set of trees B, then τD(B) (two trees of D), then ΓD(B) = σ ◦ τD(B) (a tree and its subtrees).]

                                                                17 / 30
Algorithms

Algorithms
    Incremental: IncTreeNat
    Sliding Window: WinTreeNat
    Adaptive: AdaTreeNat, which uses ADWIN to monitor change

ADWIN
An adaptive sliding window whose size is recomputed online
according to the rate of change observed.

ADWIN has rigorous guarantees (theorems)
    On ratio of false positives and false negatives
    On the relation of the size of the current window and change
    rates


                                                                   18 / 30
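The idea behind ADWIN can be sketched as follows. This is a deliberately naive version that rescans every split of the window on each insertion, whereas the real algorithm uses a compressed bucket structure. The cut threshold is the Hoeffding-style bound as usually stated for ADWIN, inputs are assumed to lie in [0, 1], and the class name `NaiveAdwin` is hypothetical.

```python
import math
from collections import deque


class NaiveAdwin:
    """Exhaustive-scan sketch of an adaptive window; values must lie in [0, 1]."""

    def __init__(self, delta=0.01):
        self.delta = delta
        self.window = deque()

    def _has_change(self):
        n = len(self.window)
        if n < 2:
            return False
        total = sum(self.window)
        delta_prime = self.delta / n
        head_sum = 0.0
        for i, x in enumerate(self.window, start=1):
            if i == n:
                break
            head_sum += x
            n0, n1 = i, n - i                      # older part W0, newer part W1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            if abs(mean0 - mean1) > eps_cut:
                return True
        return False

    def add(self, x):
        """Insert x; drop the oldest items while some split looks inconsistent."""
        self.window.append(x)
        shrunk = False
        while self._has_change():
            self.window.popleft()
            shrunk = True
        return shrunk


detector = NaiveAdwin(delta=0.01)
for _ in range(200):
    detector.add(0.0)                 # stationary phase: the window keeps growing
changes = [detector.add(1.0) for _ in range(200)]   # abrupt change in the mean
```

After the change, the window shrinks so that it holds mostly recent values, which is the behaviour an adaptive miner relies on to forget outdated trees.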
Experimental Validation: TN1



[Figure: Experiments on ordered trees with TN1 dataset. Time (sec.) plotted against Size (millions) for CMTreeMiner and IncTreeNat.]




                                                           19 / 30
What is MOA?

Massive Online Analysis (MOA) is a framework for online learning
from data streams.

    It is closely related to WEKA
    It includes a collection of offline and online methods, as well as
    tools for evaluation:
         boosting and bagging
         Hoeffding Trees
    with and without Naïve Bayes classifiers at the leaves.




                                                                         20 / 30
WEKA: the bird




                 21 / 30
MOA: the bird

The Moa (another native NZ bird) is not only flightless, like the
                  Weka, but also extinct.




                                                                   22 / 30
Data stream classification cycle

1   Process an example at a
    time, and inspect it only
    once (at most)
2   Use a limited amount of
    memory
3   Work in a limited amount
    of time
4   Be ready to predict at any
    point




                                           23 / 30
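One common way to meet these four requirements is an interleaved test-then-train loop: each example is first used to test the model and then to update it, and is never stored. The sketch below uses a majority-class learner as a stand-in for a real stream classifier. The names `MajorityClass` and `prequential_accuracy` are illustrative and are not MOA's API.

```python
from collections import Counter


class MajorityClass:
    """Predicts the most frequent label seen so far; memory grows only with the number of classes."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Ready to predict at any point, even before any training example.
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, learner):
    """Test on each example first, then train on it; every example is seen once."""
    correct = seen = 0
    for x, y in stream:
        if learner.predict(x) == y:
            correct += 1
        learner.learn(x, y)      # constant-time update, the example is then discarded
        seen += 1
    return correct / seen if seen else 0.0


stream = [((), "a"), ((), "a"), ((), "b"), ((), "a")]
accuracy = prequential_accuracy(stream, MajorityClass())
```

On this four-example stream the learner is wrong on the first example (nothing learned yet) and on the "b" example, so the accuracy is 0.5.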
Environments and Data Sources


   Environments
      Sensor Network: 100 KB
      Handheld Computer: 32 MB
      Server: 400 MB

   Data Sources
      Random Tree Generator
      Random RBF Generator
      LED Generator
      Waveform Generator
      Function Generator


                                 24 / 30
Algorithms




Naive Bayes               Prediction strategies
Decision stumps               Majority class
Hoeffding Tree                Naive Bayes Leaves
Hoeffding Option Tree         Adaptive Hybrid
Bagging and Boosting




                                                   25 / 30
Hoeffding Option Tree
Hoeffding Option Trees
Regular Hoeffding tree containing additional option nodes that
allow several tests to be applied, leading to multiple Hoeffding
trees as separate paths.




                                                                   26 / 30
GUI
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.gui.TaskLauncher




                                               27 / 30
Ensemble Methods
   http://www.cs.waikato.ac.nz/~abifet/MOA/




New ensemble methods:
   ADWIN bagging: When a change is detected, the worst classifier
   is removed and a new classifier is added.
   Adaptive-Size Hoeffding Tree bagging

                                                                   28 / 30
XML Tree Framework on evolving data
               streams



                            Maximal                 Closed
           # Trees   Att.   Acc.    Mem.     Att.   Acc.     Mem.
CSLOG12     15483    84     79.64      1.2   228    78.12    2.54
CSLOG23     15037    88     79.81     1.21   243    78.77    2.75
CSLOG31     15702    86     79.94     1.25   243    77.60    2.73
CSLOG123    23111    84     80.02      1.7   228    78.91    4.18

           Table: BAGGING on unordered trees.




                                                              29 / 30
Conclusions

XML tree stream classifier system.


    Using Galois Lattice Theory, we present methods for mining
    closed trees
        Incremental
        Sliding Window
        Adaptive: using ADWIN to monitor change

    We use MOA data stream classifiers.




                                                                30 / 30
