Web Page Clustering Using a Fuzzy Logic Based
   Representation and Self-organizing Maps

    Alberto P. García-Plaza, Víctor Fresno, Raquel Martínez
                     NLP & IR Group, UNED

                       December 12, 2008
Web Page Clustering Using a Fuzzy Logic Based Representation and Self-organizing Maps
Objectives               Our Approach                Experiment Description               Results          Conclusion


                                          Table of Contents



             1   Objectives
             2   Our Approach: Extended Fuzzy Combination of Criteria
                 (EFCC)
             3   Experiment Description
             4   Results
             5   Conclusion




Alberto P. García-Plaza, Víctor Fresno, Raquel Martínez, NLP & IR Group, UNED                                  slide 2


                                                  Objectives


              Group HTML documents by content similarity.
              Self-Organizing Maps (SOM) to organize, visualize and
              navigate through the collection.
              Term weighting function taking advantage of HTML tags
                      Combining, by means of fuzzy logic, heuristic criteria based on
                      the inherent semantics of some HTML tags and word positions
                      in the document.

       Hypothesis
       An improvement in document representation will lead to an
       increase in map quality.





                                          Table of Contents


             1   Objectives
             2   Our Approach: Extended Fuzzy Combination of Criteria
                 (EFCC)
                   1   Fuzzy Logic
                   2   EFCC
                   3   Linguistic Variables
                   4   Knowledge Base
             3   Experiment Description
             4   Results
             5   Conclusion






                                                 Fuzzy logic



              Capturing human expert knowledge.
              Close to natural language.
              Knowledge base: defined by a set of IF-THEN rules.
              Linguistic variables
                      Defined using natural language words and fuzzy sets.
                      These sets allow the description of the membership degree of
                      an object to a particular class.
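
A minimal sketch of the ideas on this slide, assuming a made-up linguistic variable ("frequency", with the fuzzy sets low, medium and high over a normalised value in [0, 1]) and the minimum operator as the fuzzy AND. The set shapes, breakpoints and the example rule are illustrative assumptions, not the ones used in EFCC.

```python
def falling(a, b):
    """Membership that is 1 up to a and falls linearly to 0 at b."""
    def mu(x):
        if x <= a:
            return 1.0
        if x >= b:
            return 0.0
        return (b - x) / (b - a)
    return mu


def rising(a, b):
    """Membership that is 0 up to a and rises linearly to 1 at b."""
    def mu(x):
        if x <= a:
            return 0.0
        if x >= b:
            return 1.0
        return (x - a) / (b - a)
    return mu


def triangular(a, b, c):
    """Membership that peaks (value 1) at b and is 0 at or outside a and c."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


# Hypothetical linguistic variable: each natural-language label is a fuzzy set.
frequency = {
    "low": falling(0.0, 0.5),
    "medium": triangular(0.0, 0.5, 1.0),
    "high": rising(0.5, 1.0),
}


def fuzzify(variable, x):
    """Return the membership degree of x in every fuzzy set of the variable."""
    return {label: mu(x) for label, mu in variable.items()}


def rule_strength(*degrees):
    """Firing strength of an IF-THEN rule whose antecedents are joined by AND (minimum)."""
    return min(degrees)


# Illustrative rule: IF frequency IS medium AND title IS high THEN importance IS high.
# The degree 0.8 for "title IS high" is an arbitrary example value.
degrees = fuzzify(frequency, 0.25)
strength = rule_strength(degrees["medium"], 0.8)
```

A value can belong to several sets at once (here 0.25 is partly "low" and partly "medium"), which is what lets the IF-THEN rules stay close to natural language.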






                   Extended Fuzzy Combination of Criteria

              [Slides 8 to 18 consist of figures; no text survives in this transcript.]


                                        Linguistic Variables

              [Slides 20 to 25 consist of figures; no text survives in this transcript.]


                                           Knowledge Base

              [Slides 27 to 30 consist of figures; no text survives in this transcript.]


                                          Table of Contents


             1   Objectives
             2   Our Approach: Extended Fuzzy Combination of Criteria
                 (EFCC)
             3   Experiment Description
                   1   Dimensionality Reduction
                   2   Document Map
                   3   Evaluation Methods
             4   Results
             5   Conclusion






                                  Dimensionality Reduction


              Input vector dimensions range from 100 to 5000.
              Stopwords, punctuation marks, suffixes, and words occurring
              fewer than 50 times in the whole corpus were removed.
              Two well-known methods:
                      Document frequency reduction.
                      Random projection method.
              Three proposed rank-based methods:
                      Most Valued Terms.
                      Fixed reduction method.
                      More Frequent Terms until n level (MFTn).
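
The two well-known methods listed above could be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: document frequency reduction is taken to mean "keep the k terms that occur in the most documents", and random projection is taken to use a Gaussian random matrix. The function names and the tie-breaking rule are hypothetical.

```python
import random
from collections import Counter


def document_frequency_reduction(docs, k):
    """Return the k terms that appear in the largest number of documents.

    docs is a list of token lists. Ties are broken alphabetically so the
    result is deterministic (an assumption made for this sketch).
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    ranked = sorted(df, key=lambda term: (-df[term], term))
    return ranked[:k]


def random_projection(vectors, k, seed=0):
    """Project d-dimensional vectors onto k dimensions with a random Gaussian matrix."""
    rng = random.Random(seed)
    d = len(vectors[0])
    scale = 1.0 / k ** 0.5
    # k rows of d entries drawn from N(0, 1) and scaled by 1/sqrt(k).
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]
    return [[sum(row[j] * v[j] for j in range(d)) for row in matrix] for v in vectors]


docs = [["web", "page", "fuzzy"], ["web", "map"], ["web", "fuzzy", "som"]]
vocabulary = document_frequency_reduction(docs, 2)
reduced = random_projection([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], 2)
```

Document frequency reduction keeps interpretable terms as features, while random projection produces dense features that no longer correspond to individual words.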






                              Document Map Construction



              Benchmark dataset for clustering: Banksearch [1]
                      10,000 documents
                      10 classes
              SOM size was set equal to the number of classes of input
              documents, i.e. 5x2, in order to compare clustering results.

       [1] M. P. Sinka and D. W. Corne. A large benchmark dataset for web document clustering. Soft Computing
       Systems: Design, Management, and Applications, 2002.
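
As a rough sketch of how a 5x2 map like the one on this slide might be trained, the following pure-Python code implements a basic online SOM with a Gaussian neighbourhood. The learning-rate and neighbourhood schedules, the random initialisation and every parameter value are assumptions chosen for illustration; the slides do not state the settings actually used.

```python
import math
import random


def best_matching_unit(weights, v):
    """Return the grid position of the unit whose weight vector is closest to v."""
    return min(weights, key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], v)))


def train_som(vectors, rows=5, cols=2, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols SOM and return {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One randomly initialised weight vector per map unit.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    order = list(vectors)  # copy, so the caller's list is not reordered
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
        rng.shuffle(order)
        for v in order:
            bmu = best_matching_unit(weights, v)
            for unit, w in weights.items():
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Move the unit towards the input, scaled by its grid distance to the BMU.
                for j in range(dim):
                    w[j] += lr * h * (v[j] - w[j])
    return weights


# Toy input: two small groups of 2-dimensional vectors.
data = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
som = train_som(data)
units = [best_matching_unit(som, v) for v in data]
```

With ten units and ten classes, each unit can be read as one cluster, which is what makes the map comparable with other clustering results.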


                                        Evaluation Methods



              Map quality is measured as the weighted average of the
              F-measure of each class.
              After mapping the collection onto the trained map, the class
              with the greatest number of documents mapped to a neuron
              is selected to label that unit.
              All document vectors in a neuron whose class differs
              from the neuron label are counted as errors.
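
The evaluation described above might be computed along these lines. The sketch assumes that each document is assigned the label of the unit it maps to, and that per-class F-measures are weighted by class size; the exact weighting used by the authors is not given on the slide, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def label_units(assignments, classes):
    """Label each unit with the majority class of the documents mapped to it."""
    per_unit = defaultdict(Counter)
    for unit, cls in zip(assignments, classes):
        per_unit[unit][cls] += 1
    return {unit: counts.most_common(1)[0][0] for unit, counts in per_unit.items()}


def weighted_f_measure(assignments, classes):
    """Average the per-class F-measure, weighting each class by its size."""
    labels = label_units(assignments, classes)
    predicted = [labels[unit] for unit in assignments]
    total = len(classes)
    score = 0.0
    for cls, size in Counter(classes).items():
        true_pos = sum(1 for p, t in zip(predicted, classes) if p == cls and t == cls)
        n_predicted = predicted.count(cls)
        precision = true_pos / n_predicted if n_predicted else 0.0
        recall = true_pos / size
        if precision + recall:
            f = 2.0 * precision * recall / (precision + recall)
        else:
            f = 0.0
        score += (size / total) * f
    return score


# Six documents mapped to two units; one "b" document lands in the "a" unit.
assignments = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "b"]
quality = weighted_f_measure(assignments, classes)
```

In this toy case unit 0 is labelled "a" and unit 1 is labelled "b", so the single "b" document in unit 0 is the only error.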






             Best reduction for each term weighting function






                         MFTn reduction provides stability






             EFCC+MFTn obtains its best results with the
                   smallest number of features






                                                 Conclusion


              An unsupervised document representation method, based on
              fuzzy logic, for clustering HTML documents by means
              of self-organizing maps.
              MFTn is the most stable reduction method in all cases.
              The EFCC representation obtains better results with a
              smaller vocabulary.
              Fewer features are needed to represent the input
              documents and SOM unit vectors, which reduces the
              computational cost.






                                            Thank You!






                                                 Related Work

         Work                               VSM   Topic         Document   Weighting             Modifies
                                                  Information   Type       Function              SOM
         Self organization of a Massive     Yes   Yes           Text       Shannon's Entropy     No
         Document Collection [2]
         Document Clustering using          Yes   No            Text       Binary, TF, TF-IDF    No
         Phrases [3]
         Document Clustering using          Yes   Yes           Text       ESVM, HSVM, HyM       No
         WordNet [4]
         Conceptional SOM [5]               Yes   No            Text       TF                    Yes

       [2] T. Kohonen, S. Kaski, K. Lagus, J. Salojarvi, J. Honkela, V. Paatero, and A. Saarela. Self organization of a
       massive document collection. IEEE Trans. on Neural Networks, 2000.
       [3] J. Bakus, M. Hussin, and M. Kamel. A SOM-based document clustering using phrases. In ICONIP, 2002.
       [4] C. Hung and S. Wermter. Neural network based document clustering using WordNet ontologies. Int. J.
       Hybrid Intell. Syst., 2004.
       [5] Y. Liu, X. Wang, and C. Wu. ConSOM: A conceptional SOM model for text clustering. Neurocomputing,
       2008.
Alberto P. Garc´
               ıa-Plaza, V´
                          ıctor Fresno, Raquel Mart´
                                                   ınez, NLP & IR Group, UNED                                                slide 44

More Related Content

KEY
Study proposal: Dohorap
PPT
24 poster
PDF
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
PDF
OwlOntDB: A Scalable Reasoning System for OWL 2 RL Ontologies with Large ABoxes
PDF
PDF
The Semantic Web #8 - Ontology
PDF
Prerequisites of AI Techniques Making Robot To Perform Task With Human (autos...
PDF
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
Study proposal: Dohorap
24 poster
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
OwlOntDB: A Scalable Reasoning System for OWL 2 RL Ontologies with Large ABoxes
The Semantic Web #8 - Ontology
Prerequisites of AI Techniques Making Robot To Perform Task With Human (autos...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...

Viewers also liked (20)

PPT
Fuzzy logic
PDF
Analysing Web GIS apps
PPTX
Developing Efficient Web-based GIS Applications
PDF
Introduction to sar-marjolaine_rouault
PPTX
Synthetic aperture radar
PPT
Web Page Clustering Using a Fuzzy Logic Based Representation and Self-Organizing Maps

  • 1. Web Page Clustering Using a Fuzzy Logic Based Representation and Self-organizing Maps. Alberto P. García-Plaza, Víctor Fresno, Raquel Martínez. NLP & IR Group, UNED. December 12, 2008
  • 2. Table of Contents: 1 Objectives; 2 Our Approach: Extended Fuzzy Combination of Criteria (EFCC); 3 Experiment Description; 4 Results; 5 Conclusion
  • 4. Objectives: Group HTML documents by content similarity. Use Self-Organizing Maps (SOM) to organize, visualize and navigate through the collection. Define a term weighting function that takes advantage of HTML tags by combining, by means of fuzzy logic, heuristic criteria based on the inherent semantics of some HTML tags and on word positions in the document. Hypothesis: an improvement in document representation will lead to an increase in map quality.
  • 5. Table of Contents: 1 Objectives; 2 Our Approach: Extended Fuzzy Combination of Criteria (EFCC) (2.1 Fuzzy Logic, 2.2 EFCC, 2.3 Linguistic Variables, 2.4 Knowledge Base); 3 Experiment Description; 4 Results; 5 Conclusion
  • 6. Fuzzy logic: Captures human expert knowledge. Close to natural language. Knowledge base: defined by a set of IF-THEN rules. Linguistic variables: defined using natural language words and fuzzy sets. These sets describe the degree of membership of an object in a particular class.
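The ideas on this slide (linguistic variables as fuzzy sets, a knowledge base of IF-THEN rules) can be sketched in a few lines of Python. This is only an illustration: the input names `frequency` and `title`, the linear `low`/`high` membership functions, the four rules, and the weighted-average defuzzification are all hypothetical choices made for this example, not the EFCC rule base presented in the talk.

```python
def low(x):
    """Membership degree of x (in [0, 1]) in the fuzzy set 'low' (assumed linear)."""
    return 1.0 - x


def high(x):
    """Membership degree of x (in [0, 1]) in the fuzzy set 'high' (assumed linear)."""
    return x


# Hypothetical knowledge base. Each rule reads:
# IF frequency IS <set> AND title IS <set> THEN importance = <value>
RULES = [
    (high, high, 1.0),
    (high, low, 0.6),
    (low, high, 0.6),
    (low, low, 0.0),
]


def term_importance(frequency, title):
    """Combine two criteria in [0, 1] into one importance score in [0, 1]."""
    frequency = min(max(frequency, 0.0), 1.0)
    title = min(max(title, 0.0), 1.0)
    # AND is modelled with min; each rule fires with some strength.
    strengths = [min(f(frequency), t(title)) for f, t, _ in RULES]
    total = sum(strengths)
    if total == 0.0:
        return 0.0
    # Defuzzify as the strength-weighted average of the rule outputs.
    weighted = sum(s * out for s, (_, _, out) in zip(strengths, RULES))
    return weighted / total


print(term_importance(0.9, 0.8))
print(term_importance(0.1, 0.2))
```

A term that is frequent and also appears in the title scores higher than one that is rare and absent from the title, which is the kind of expert heuristic the rules are meant to encode.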
  • 8-18. Extended Fuzzy Combination of Criteria
  • 20-25. Linguistic Variables
  • 27-30. Knowledge Base
  • 31. Table of Contents: 1 Objectives; 2 Our Approach: Extended Fuzzy Combination of Criteria (EFCC); 3 Experiment Description (3.1 Dimensionality Reduction, 3.2 Document Map, 3.3 Evaluation Methods); 4 Results; 5 Conclusion
  • 32. Dimensionality Reduction: Input vector dimension ranging from 100 to 5000. Stopwords, punctuation marks, suffixes, and words occurring fewer than 50 times in the whole corpus were removed. Two well-known methods: document frequency reduction; random projection. Three proposed rank-based methods: Most Valued Terms; fixed reduction; More Frequent Terms until n level.
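The two well-known reduction methods named on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: document frequency reduction is read here as "keep the terms that occur in the most documents", and the random projection uses a Gaussian matrix scaled by 1/sqrt(target_dim). The function names, the tie-breaking rule, and the seed are hypothetical, and the three rank-based methods proposed in the talk are not reproduced because the slides do not define them.

```python
import random
from collections import Counter


def document_frequency_reduction(docs, n_terms):
    """Return the n_terms terms that occur in the largest number of documents.

    docs is a list of token lists, one list per document.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    # Sort by descending document frequency, then alphabetically so ties are stable.
    ranked = sorted(df, key=lambda term: (-df[term], term))
    return ranked[:n_terms]


def random_projection(vectors, target_dim, seed=0):
    """Project each vector to target_dim dimensions with a random Gaussian matrix."""
    rng = random.Random(seed)
    source_dim = len(vectors[0])
    scale = 1.0 / target_dim ** 0.5
    matrix = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
        for _ in range(target_dim)
    ]
    return [
        [sum(w * x for w, x in zip(row, vector)) for row in matrix]
        for vector in vectors
    ]


docs = [["som", "fuzzy", "web"], ["som", "web"], ["som", "html"]]
print(document_frequency_reduction(docs, 2))  # ['som', 'web']
print(random_projection([[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]], 2))
```

Document frequency reduction keeps an interpretable vocabulary, while random projection mixes all terms into new axes, so only the first lets you inspect which words survived.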
  • 34. Document Map Construction: Benchmark dataset for clustering: Banksearch [1], with 10000 documents in 10 classes. SOM size was set equal to the number of classes of input documents, i.e. 5x2, in order to compare clustering results. [1] M. P. Sinka and D. W. Corne. A large benchmark dataset for web document clustering. Soft Computing Systems: Design, Management, and Applications, 2002.
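For readers unfamiliar with self-organizing maps, a 5x2 map like the one described on this slide could be trained roughly as below. The slide fixes only the map size; everything else here (Euclidean distance, a Gaussian neighbourhood, linearly decaying learning rate and radius, 20 epochs, random initialisation, and the names `train_som` and `best_matching_unit`) is an assumption made for the sketch, not the configuration used in the experiments.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the grid position of the unit whose weights are closest to vector."""
    return min(
        weights,
        key=lambda unit: sum((w - x) ** 2 for w, x in zip(weights[unit], vector)),
    )


def train_som(vectors, rows=2, cols=5, epochs=20, seed=0):
    """Train a small rectangular SOM and return {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    units = [(r, c) for r in range(rows) for c in range(cols)]
    weights = {unit: [rng.random() for _ in range(dim)] for unit in units}
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for vector in vectors:
            progress = step / total_steps
            learning_rate = 0.5 * (1.0 - progress)  # assumed linear decay
            radius = max(rows, cols) / 2.0 * (1.0 - progress) + 0.5
            bmu = best_matching_unit(weights, vector)
            for unit in units:
                grid_dist2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                influence = math.exp(-grid_dist2 / (2.0 * radius ** 2))
                w = weights[unit]
                # Pull the unit towards the input, more strongly near the BMU.
                for i in range(dim):
                    w[i] += learning_rate * influence * (vector[i] - w[i])
            step += 1
    return weights


data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
som = train_som(data)
for vector in data:
    print(vector, "->", best_matching_unit(som, vector))
```

With ten units and ten classes, each unit can in principle specialise on one class, which is what makes the map directly comparable with a ten-cluster clustering.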
  • 36. Evaluation Methods: Weighted average of the F-measure for each class. After mapping the collection onto the trained map, each neuron (unit) is labeled with the class that has the greatest number of documents mapped to it. All document vectors in a neuron whose class differs from the neuron label are counted as errors.
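The evaluation described on this slide can be sketched as below. This is one plausible reading rather than the authors' exact procedure: each neuron is labeled with its majority class, every document is treated as predicted to be of its neuron's label, per-class precision and recall give an F-measure, and the classes are weighted by their share of the collection. The function name and the input format (a list of `(neuron_id, true_class)` pairs) are assumptions.

```python
from collections import Counter, defaultdict


def weighted_f_measure(assignments):
    """Compute the class-size-weighted average F-measure.

    assignments is a list of (neuron_id, true_class) pairs, one per document.
    """
    by_neuron = defaultdict(Counter)
    for neuron, cls in assignments:
        by_neuron[neuron][cls] += 1
    # Label each neuron with the class that has the most documents mapped to it.
    labels = {n: counts.most_common(1)[0][0] for n, counts in by_neuron.items()}

    class_sizes = Counter(cls for _, cls in assignments)
    predicted = Counter(labels[n] for n, _ in assignments)
    # Documents whose class differs from their neuron label are errors.
    correct = Counter(cls for n, cls in assignments if labels[n] == cls)

    total = len(assignments)
    score = 0.0
    for cls, size in class_sizes.items():
        precision = correct[cls] / predicted[cls] if predicted[cls] else 0.0
        recall = correct[cls] / size
        if precision + recall:
            f_measure = 2 * precision * recall / (precision + recall)
        else:
            f_measure = 0.0
        score += (size / total) * f_measure
    return score


example = [(0, "a"), (0, "a"), (0, "b"), (1, "b"), (1, "b")]
print(round(weighted_f_measure(example), 3))  # 0.8
```

In the example, neuron 0 is labeled "a" and neuron 1 is labeled "b"; the single "b" document on neuron 0 is the only error, and both classes end up with an F-measure of 0.8.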
  • 38. Best reduction for each term weighting function
  • 39. MFTn reduction provides stability
  • 40. EFCC+MFTn obtains its best results with the smallest number of features
  • 42. Conclusion: An unsupervised document representation method, based on fuzzy logic, focused on clustering HTML documents by means of self-organizing maps. MFTn reduction is the most stable reduction in all cases. The EFCC representation obtains better results using a smaller vocabulary. A smaller number of features is needed to represent the input documents and SOM unit vectors, which implies a lower computational cost.
  • 43. Thank You!
  • 44. Related Work:
      Work                                                    VSM  Topic Information  Document Type  Weighting Function  Modifies SOM
      Self organization of a Massive Document Collection [2]  Yes  Yes                Text           Shannon’s Entropy   No
      Document Clustering using Phrases [3]                   Yes  No                 Text           Binary, TF, TF-IDF  No
      Document Clustering using WordNet [4]                   Yes  Yes                Text           ESVM, HSVM, HyM     No
      Conceptional SOM [5]                                    Yes  No                 Text           TF                  Yes
      [2] T. Kohonen, S. Kaski, K. Lagus, J. Salojarvi, J. Honkela, V. Paatero, and A. Saarela. Self organization of a massive document collection. IEEE Trans. on Neural Networks, 2000.
      [3] J. Bakus, M. Hussin, and M. Kamel. A som-based document clustering using phrases. In ICONIP, 2002.
      [4] C. Hung and S. Wermter. Neural network based document clustering using wordnet ontologies. Int. J. Hybrid Intell. Syst., 2004.
      [5] Y. Liu, X. Wang, and C. Wu. Consom: A conceptional som model for text clustering. In Neurocomputing, 2008.