BACKGROUND    PARSE TREE KERNELS    APPROXIMATE TREE KERNELS    RESULTS    Conclusion




                      Approximate Tree Kernels
         Konrad Rieck, Tammo Krueger, Ulf Brefeld, Klaus-Robert Müller


                                   Presented By
                               Niharjyoti Sarangi
                      Indian Institute of Technology Madras




                                      April 21, 2012



     OUTLINE OF THE PRESENTATION
     1   BACKGROUND
             Learning from tree-structured data
             Application Domains
     2   PARSE TREE KERNELS
             Computing PTK
             Computational constraints
     3   APPROXIMATE TREE KERNELS
             Computing ATK
             Validity of ATK
             Types of learning
     4   RESULTS
             Performance
             Time
             Memory
     5   Conclusion



     TREE-STRUCTURED DATA


              Trees: carry hierarchical information
              Flat feature vectors: fail to capture the underlying
              dependency structure

     Parse Tree
     An ordered, rooted tree that represents the syntactic structure
     of a string according to some formal grammar.


     A tree X is called a parse tree of G = (S, P, s) if X is derived by
     assembling productions p ∈ P such that every node x ∈ X is
     labeled with a symbol l(x) ∈ S.
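For concreteness, an ordered, rooted parse tree of this kind can be modeled with a small tree structure. The following minimal Python sketch is illustrative only (the `Node` class and `size` helper are assumptions for this presentation, not part of the paper); `label` plays the role of l(x):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """One node of an ordered, rooted parse tree."""
    label: str                                    # grammar symbol l(x)
    children: List["Node"] = field(default_factory=list)  # ordered children

def size(x: Node) -> int:
    """Number of nodes |X| in the subtree rooted at x."""
    return 1 + sum(size(c) for c in x.children)

# A tiny parse tree for the production S -> NP VP:
tree = Node("S", [Node("NP", [Node("N")]), Node("VP", [Node("V")])])
```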



     EXAMPLES




     Figure: Parse trees for natural language text and the HTTP network
     protocol.



     LEARNING FROM TREES


              Kernel functions for structured data
              Convolution of local kernels
              Parse tree kernel proposed by Collins and Duffy (2002)




     Kernel Functions
     k : X × X → R is a symmetric and positive semi-definite
     function, which implicitly computes an inner product in a
     reproducing kernel Hilbert space.



     APPLICATION DOMAINS




         Natural Language Processing
         Web Spam Detection
         Network Intrusion Detection
         Information Retrieval from structured
         documents
         ...



     COMPUTING PTK


     A generic technique for defining kernel functions over
     structured data is the convolution of local kernels defined over
     sub-structures.
     Parse tree kernel

        k(X, Z) = Σ_{x ∈ X} Σ_{z ∈ Z} c(x, z),   where X and Z are two parse trees.

     Notations
     xi : i-th child of a node x
     |X|: Number of nodes in X
     χ: Set of all possible trees



     ILLUSTRATION




                  Figure: Shared subtrees in two parse trees.



     COUNTING FUNCTION

     c(x, z) is known as the counting function which recursively
     determines the number of shared subtrees rooted in the tree
     nodes x and z.
     Defining c(x, z)

        c(x, z) = 0                                    if x, z are not derived from the same production p ∈ P
                = λ                                    if x, z are leaf nodes
                = λ ∏_{i=1}^{|x|} (1 + c(x_i, z_i))    otherwise



     0 ≤ λ ≤ 1 balances the contribution of subtrees: small values
     of λ decay the contribution of lower nodes in large subtrees.
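Putting the kernel and the counting function together, the computation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Node` class, the `same_production` test (same symbol, same ordered child symbols), and the value of λ are choices made here, not the authors' implementation:

```python
from dataclasses import dataclass, field
from typing import List

LAMBDA = 0.5  # decay factor, 0 <= lambda <= 1 (illustrative choice)

@dataclass
class Node:
    label: str                                    # grammar symbol l(x)
    children: List["Node"] = field(default_factory=list)

def same_production(x: Node, z: Node) -> bool:
    # Hypothetical test that x and z are derived from the same
    # production: same symbol and the same ordered child symbols.
    return (x.label == z.label
            and [c.label for c in x.children] == [c.label for c in z.children])

def count(x: Node, z: Node) -> float:
    """Counting function c(x, z): decayed count of shared subtrees
    rooted at the nodes x and z."""
    if not same_production(x, z):
        return 0.0
    if not x.children:                            # x and z are leaf nodes
        return LAMBDA
    result = LAMBDA
    for xi, zi in zip(x.children, z.children):    # pair up the i-th children
        result *= 1.0 + count(xi, zi)
    return result

def ptk(X: Node, Z: Node) -> float:
    """Parse tree kernel: convolution of c over all node pairs."""
    def nodes(t: Node):
        yield t
        for c in t.children:
            yield from nodes(c)
    return sum(count(x, z) for x in nodes(X) for z in nodes(Z))
```

The double loop over all node pairs is exactly what makes the exact kernel quadratic in the number of nodes, as the next slide quantifies.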



     COMPUTATIONAL COMPLEXITY


              The complexity is O(n²), where n is the number of nodes
              in each parse tree.

     Experimental data
     The computation of a parse tree kernel for two HTML documents
     comprising 10,000 nodes each, requires about 1 gigabyte of memory
     and takes over 100 seconds on a recent computer system.

              In practice we need to compare a large number of parse
              trees. Given the statistics above, the computing resources
              required render exact PTKs impractical at this scale.



     ATTEMPTED IMPROVEMENTS




              A feature selection procedure based on statistical tests
              (Suzuki et al.)
              Limiting computation to node pairs with matching grammar
              symbols (Moschitti)



     COMPUTING ATK


     Approximation of tree kernels is based on the observation that
     trees often contain redundant parts that are not only irrelevant
     for the learning task but also slow down the kernel computation
     unnecessarily.

     Approximate tree kernel

        k̂(X, Z) = Σ_{s ∈ S} w(s) Σ_{x ∈ X : l(x)=s} Σ_{z ∈ Z : l(z)=s} c̃(x, z),   where X and Z are two parse trees.


     Selection function: w : S → {0, 1}
     Controls whether subtrees rooted in nodes with the symbol
     s ∈ S contribute to the convolution (w(s) = 1) or not (w(s) = 0).



     APPROXIMATE COUNTING FUNCTION

     c̃(x, z) is the approximate counting function.

     Defining c̃(x, z)

        c̃(x, z) = 0                                    if x, z are not derived from the same production p ∈ P
                 = 0                                    if x or z is not selected
                 = λ                                    if x, z are leaf nodes
                 = λ ∏_{i=1}^{|x|} (1 + c̃(x_i, z_i))   otherwise



     The selection function w(s) is chosen based on the domain and
     data. The exact parse tree kernel is obtained as a special case
     of the ATK if w(s) = 1 for all symbols s ∈ S.
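The approximate kernel only differs from the exact one in the extra selection check. A self-contained Python sketch follows (the `Node` class, `same_production` test, and λ are the same illustrative assumptions as before; `w` is a plain dict mapping symbols to 0 or 1, with unlisted symbols treated as deselected):

```python
from dataclasses import dataclass, field
from typing import Dict, List

LAMBDA = 0.5  # decay factor (illustrative)

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

def same_production(x: Node, z: Node) -> bool:
    # Hypothetical test: same symbol and same ordered child symbols.
    return (x.label == z.label
            and [c.label for c in x.children] == [c.label for c in z.children])

def count_approx(x: Node, z: Node, w: Dict[str, int]) -> float:
    """Approximate counting function c~(x, z): zero whenever x or z
    carries a deselected symbol, otherwise identical to c(x, z)."""
    if w.get(x.label, 0) == 0 or w.get(z.label, 0) == 0:
        return 0.0                    # x or z not selected
    if not same_production(x, z):
        return 0.0
    if not x.children:                # x and z are leaf nodes
        return LAMBDA
    result = LAMBDA
    for xi, zi in zip(x.children, z.children):
        result *= 1.0 + count_approx(xi, zi, w)
    return result

def atk(X: Node, Z: Node, w: Dict[str, int]) -> float:
    """Approximate tree kernel: convolution restricted to node pairs
    sharing a selected symbol s with w(s) = 1."""
    def nodes(t: Node):
        yield t
        for c in t.children:
            yield from nodes(c)
    return sum(count_approx(x, z, w)
               for x in nodes(X) for z in nodes(Z)
               if x.label == z.label and w.get(x.label, 0) == 1)
```

With w(s) = 1 for every symbol, `atk` reproduces the exact PTK value, matching the special-case remark above.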



     ATK IS A VALID KERNEL




     Proof
     Let Φ(X) be the vector of frequencies of all subtrees occurring
     in X. Then, by definition, K̂_w can always be written as

        K̂_w = ⟨P_w Φ(X), P_w Φ(Z)⟩,

     where P_w projects out the dimensions belonging to deselected
     symbols. For any w, the projection P_w is independent of the
     actual X and Z, and hence K̂_w is a valid kernel.



     ATK IS FASTER THAN PTK



     Speed-up factor q_w

        q_w = ( Σ_{s ∈ S} #s(X) #s(Z) ) / ( Σ_{s ∈ S} w(s) #s(X) #s(Z) )

     where #s(X) denotes the number of nodes x ∈ X labeled with the symbol s.

     From this expression we always have q_w ≥ 1, and rejecting even a
     single occurring symbol in the approximate tree kernel already
     yields a strict speed-up q_w > 1.
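Because q_w depends only on per-symbol node counts, it can be computed from symbol histograms without touching the trees themselves. A small sketch with hypothetical counts (the symbol names and frequencies below are made up for illustration):

```python
from collections import Counter

def speedup(hist_X: Counter, hist_Z: Counter, w: dict) -> float:
    """Speed-up factor q_w: ratio of all node comparisons to the
    comparisons remaining after selection by w."""
    total = sum(hist_X[s] * hist_Z[s] for s in hist_X)
    kept = sum(w.get(s, 0) * hist_X[s] * hist_Z[s] for s in hist_X)
    # kept > 0 as long as at least one shared symbol is selected
    return total / kept

# Hypothetical histograms #s(X), #s(Z); rejecting the frequent
# symbol "det" removes the bulk of the comparisons:
hx = Counter({"s": 1, "np": 40, "det": 120})
hz = Counter({"s": 1, "np": 35, "det": 100})
w = {"s": 1, "np": 1, "det": 0}
```

Rejecting only "det" here already shrinks the denominator from 13,401 to 1,401 comparisons, i.e. q_w ≈ 9.6.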



     SUPERVISED SETTING


     Given n labeled parse trees (X1 , y1 ), · · · , (Xn , yn ), where yi are
     the class labels.
     An ideal kernel Gram matrix Y is given as follows:

                          Y_ij = [|y_i = y_j|] − [|y_i ≠ y_j|]

     Kernel target alignment

        ⟨Y, K̂_w⟩_F = Σ_{y_i = y_j} (K̂_w)_ij − Σ_{y_i ≠ y_j} (K̂_w)_ij

     Our target now is to maximize this term with respect to w.
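The alignment objective is straightforward to evaluate for a given kernel matrix and label vector: it is the within-class kernel mass minus the between-class mass. A short sketch (the 4×4 kernel values below are invented purely for illustration):

```python
import numpy as np

def alignment(K: np.ndarray, y: np.ndarray) -> float:
    """Kernel-target alignment <Y, K>_F with Y_ij = +1 if y_i == y_j
    and -1 otherwise."""
    Y = np.where(y[:, None] == y[None, :], 1.0, -1.0)  # ideal Gram matrix
    return float((Y * K).sum())

# Hypothetical kernel matrix over four trees with labels 0, 0, 1, 1:
y = np.array([0, 0, 1, 1])
K = np.array([[2.0, 1.0, 0.1, 0.0],
              [1.0, 2.0, 0.0, 0.2],
              [0.1, 0.0, 2.0, 1.5],
              [0.0, 0.2, 1.5, 2.0]])
```

Here the within-class blocks contribute 13.0 and the between-class entries subtract 0.6, so the alignment is 12.4; a selection w that concentrates kernel mass within classes raises this value.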



     SUPERVISED SETTING (CONTD.)



     Optimization Problem

        w* = argmax_{w ∈ [0,1]^|S|}  Σ_{i,j=1, i≠j}^{n}  Σ_{s ∈ S} w(s)  Σ_{x ∈ X_i : l(x)=s}  Σ_{z ∈ X_j : l(z)=s}  c̃(x, z)

     subject to

        Σ_{s ∈ S} w(s) ≤ N,    N ∈ ℕ



     UNSUPERVISED SETTING



     Average frequency of node comparisons

        f(s) = (1/n²) Σ_{i,j=1}^{n} #s(X_i) #s(X_j)

     Comparison ratio

        ρ = (expected node comparisons) / (actual number of comparisons in the PTK)
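Both quantities depend only on the per-tree symbol histograms, and the double sum over i, j factorizes into a square of per-symbol totals. A minimal sketch (helper names are my own, not the paper's):

```python
from collections import Counter
from typing import Dict, List

def avg_frequency(hists: List[Counter]) -> Dict[str, float]:
    """f(s) = (1/n^2) * sum_{i,j} #s(X_i) #s(X_j), computed from the
    symbol histograms of the n trees."""
    n = len(hists)
    symbols = set().union(*hists)
    totals = {s: sum(h[s] for h in hists) for s in symbols}
    # The double sum factorizes: sum_{i,j} #s(X_i)#s(X_j) = (sum_i #s(X_i))^2
    return {s: totals[s] ** 2 / n ** 2 for s in symbols}

def comparison_ratio(f: Dict[str, float], w: Dict[str, int]) -> float:
    """rho: expected node comparisons under selection w, relative to
    the comparisons performed by the exact PTK."""
    return sum(w.get(s, 0) * f[s] for s in f) / sum(f.values())
```

The factorization keeps the cost linear in the number of trees rather than quadratic, which is what makes screening candidate selections cheap.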



     UNSUPERVISED SETTING (CONTD.)



     Optimization Problem

        w* = argmax_{w ∈ [0,1]^|S|}  Σ_{i,j=1, i≠j}^{n}  Σ_{s ∈ S} w(s)  Σ_{x ∈ X_i : l(x)=s}  Σ_{z ∈ X_j : l(z)=s}  c̃(x, z)

     subject to

        ( Σ_{s ∈ S} w(s) f(s) ) / ( Σ_{s ∈ S} f(s) ) ≤ ρ



     SYNTHETIC DATA




     Figure: Classification performance for the supervised synthetic data.
     Figure: Detection performance for the unsupervised synthetic data.



     REAL DATA




     Figure: Classification performance for the question classification task.
     Figure: Detection performance for the intrusion detection task (FTP).



     TIME




     Figure: Training and testing time of SVMs using the exact and the
     approximate tree kernel.



     TIME (CONTD.)




       Figure: Run-times for web spam (WS) and intrusion detection (ID).



     MEMORY




     Figure: Memory requirements for web spam (WS) and intrusion
     detection (ID).



     CONCLUSION



              Approximate tree kernels give us a fast and efficient
              way to work with parse trees.
              Improvements in both run-time and memory: for large
              trees, the approximation reduces a single kernel
              computation from about 1 gigabyte to less than 800
              kilobytes of memory, accompanied by run-time
              improvements of up to three orders of magnitude.
              The best results were obtained for network intrusion
              detection.



     QUESTIONS




                                     Any questions?
