Towards an Empirical Semantic
Web Science: Knowledge Pattern
Extraction and Usage
           Andrea Nuzzolese
                    Ph.D. Student
            Università di Bologna
               STLab, ISTC-CNR
Outline




•   Empirical Semantic Web Science and Knowledge Patterns (KPs)

•   A possible methodology for making KPs emerge from the Web of
    Data

•   The work done so far in KP extraction

•   Evaluating KPs' efficacy through Exploratory Search




Does a Web science exist?


•   A science is usually applied to clearly defined research objects
    ✦   The physical and biological sciences analyze the natural world and try to find
        microscopic laws that, extrapolated to the macroscopic realm, would
        generate the observed behavior

•   The Web is an engineered space created through formally
    specified languages and protocols

•   Web pages, with their content and links, are created by humans
    for a particular task, governed by social conventions and laws

•   A Web science exists [Berners-Lee et al., 2006] and is oriented
    to:
    ✦   Growth of the engineered space
    ✦   Human-Web interaction patterns
What about a Web of Data science?

•   Linked Data offers a huge amount of data for empirical research




What are the research objects of the empirical
                SW science?




 •   The Semantic Web and Linked Data give us the chance to
     empirically study the patterns used in organizing and
     representing knowledge

 •   The research objects of the Semantic Web as an empirical science
     are Knowledge Patterns (KPs)




Knowledge Patterns




•   KPs are small, well-connected units of meaning, which are
    ✦   task-based
    ✦   well-grounded
    ✦   cognitively sound

•   KPs find their theoretical grounding in frames
    ✦   “… a frame is a data-structure for representing a stereotyped
        situation.” [Minsky 1975]
    ✦   “...the availability of global patterns of knowledge cuts down on non-determinacy
        enough to offset idiosyncratic bottom-up input that might otherwise be
        confusing.” [Beaugrande 1980]



An example of KP




Empirical Semantic Web and KPs




•   KPs emerge from the knowledge soup deriving from the Web

•   A methodology for KP extraction from the Web




KP extraction



•   The Web is populated by heterogeneous sources

•   We can classify sources into two categories
    ✦   Formal and semi-formal sources modeled by adopting a top-down approach
        ✴   e.g., foundational ontologies, frames, thesauri, etc.
    ✦   Non-formal sources modeled by adopting a bottom-up approach
        ✴   e.g., RDBs, Linked Data, Web pages, XML documents, etc.

•   Our KP extraction methodology is based on two complementary
    approaches
    ✦   A top-down approach
    ✦   A bottom-up approach


KP boundary




KP detection and discovery




•   The top-down approach aims to extract KPs that already
    exist in a formal or semi-formal structure
    ✦   Possible techniques: reengineering, refactoring based on association rules,
        key concept identification, ontology mapping, etc.

•   The bottom-up approach aims to discover or detect
    KPs from data
    ✦   Possible techniques: inductive techniques, machine learning, data mining,
        ontology mining, etc.




KP validation



•   The top-down and the bottom-up approaches both contribute to the
    validation of KPs

•   KP extraction is a matter of understanding how the world or
    specific domains have been described from different perspectives
    ✦   The perspective of domain experts, ontologists, etc., who try to
        formalize either the world or specific domains
    ✦   The perspective of users, data-entry staff, etc., who actually populate and
        manage data that report facts about the world

•   For example, it would be cognitively relevant if the same KP
    emerged from both the top-down and the bottom-up
    approach

KP extraction methodology




KP reengineering from FrameNet’s frames




•   FrameNet is a cognitively sound lexical knowledge base, which is
    grounded in a large corpus

•   FrameNet consists of a set of frames, which have frame elements;
    lexical units, which pair words (lexemes) to frames; and relations
    to corpus elements
    ✦   Each frame can be interpreted as a class of situations
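
As a rough illustration of the structure described above, a frame can be modelled as a record holding its frame elements and lexical units. The sketch below is in Python; the class names, the `evoked_by` helper and the heavily abbreviated `Commerce_buy` content are assumptions made for illustration, not FrameNet's actual data model or the output of any tool mentioned in these slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LexicalUnit:
    """Pairs a word (lexeme) with the frame it evokes."""
    lemma: str
    pos: str  # part of speech, e.g. "v" for verb


@dataclass
class Frame:
    """A frame, read as a class of situations."""
    name: str
    frame_elements: list[str] = field(default_factory=list)
    lexical_units: list[LexicalUnit] = field(default_factory=list)

    def evoked_by(self, lemma: str) -> bool:
        """True if some lexical unit of this frame has the given lemma."""
        return any(lu.lemma == lemma for lu in self.lexical_units)


# Hypothetical, abbreviated rendering of a commercial-purchase frame.
commerce_buy = Frame(
    name="Commerce_buy",
    frame_elements=["Buyer", "Goods", "Seller", "Money"],
    lexical_units=[LexicalUnit("buy", "v"), LexicalUnit("purchase", "v")],
)

print(commerce_buy.evoked_by("purchase"))
```

Relations to corpus elements are left out to keep the sketch short.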




An example of frame




Using Semion for reengineering and
                refactoring FrameNet’s frames

[Figure labels: Structural Schema, Linguistic Schema, Linguistic Data, Corpus Data, Referential Data]



FrameNet as LOD




FrameNet as KPs




KP discovery from Wikipedia links




•   Hypothesis
    ✦   the types of linked resources that occur most often for a certain type of
        resource constitute its KP
    ✦   since we expect that any cognitive invariance in explaining/describing things
        is reflected in the wikilink graph, discovered KPs are cognitively sound

•   Contribution
    ✦   a procedure for discovering Encyclopedic Knowledge Patterns (EKPs)
    ✦   184 EKPs published in OWL2




Collecting paths from wikilinks

[Figure: paths collected from wikilinks. Resources: db:Mickey_Mouse, db:Minnie_Mouse, db:The_Walt_Disney_Company. Types: dbpo:FictionalCharacter, dbpo:Person, dbpedia:Company, dbpedia:Organisation, owl:Thing. Edge legend: dbpo:wikiPageWikiLink, rdf:type, rdfs:subClassOf, Path.]
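
The idea of this slide, turning wikilinks between resources into paths between their types, can be sketched in Python. This is only a sketch under assumptions: `collect_paths` is a hypothetical helper, and the exact links between the three resources are invented for illustration, since the slide's edges are not recoverable here.

```python
from collections import defaultdict

# Assumed toy data: resource -> most specific type.
TYPES = {
    "db:Mickey_Mouse": "dbpo:FictionalCharacter",
    "db:Minnie_Mouse": "dbpo:FictionalCharacter",
    "db:The_Walt_Disney_Company": "dbpedia:Company",
}

# Assumed wikilinks as (subject resource, object resource) pairs.
WIKILINKS = [
    ("db:Mickey_Mouse", "db:Minnie_Mouse"),
    ("db:Mickey_Mouse", "db:The_Walt_Disney_Company"),
    ("db:Minnie_Mouse", "db:Mickey_Mouse"),
]


def collect_paths(wikilinks, types):
    """Map each type-level path (S_i, O_j) to the subject resources using it."""
    paths = defaultdict(set)
    for subject, obj in wikilinks:
        # Only typed resources can contribute to a path between types.
        if subject in types and obj in types:
            paths[(types[subject], types[obj])].add(subject)
    return dict(paths)


paths = collect_paths(WIKILINKS, TYPES)
for (subject_type, object_type), subjects in sorted(paths.items()):
    print(subject_type, "->", object_type, sorted(subjects))
```

The sketch uses only the most specific type of each resource; lifting paths to superclasses via rdfs:subClassOf (e.g. dbpo:Person or owl:Thing) is omitted.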
Path popularity


[Figure: graph of wikilinked resources: Dave_Grohl, Nirvana, Foo Fighters, Michael_Jackson, Jackson_5, Jackie_Jackson, Madonna, Prince, Charlie_Parker, Keith_Jarrett, Beatles, John_Lennon, Paul_McCartney]

pathPopularity(Pi,j, Si) = nSubjectRes(Pi,j)/nRes(Si)
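
The ratio nSubjectRes(Pi,j)/nRes(Si) can be computed with a few lines of Python. The slide does not spell out the two counts, so this sketch assumes that nRes(Si) is the number of resources of type Si and that nSubjectRes(Pi,j) is the number of distinct resources of type Si that link to at least one resource of type Oj. The type names and the links are invented toy data, only loosely echoing the resources named on the slide.

```python
def path_popularity(wikilinks, types, subject_type, object_type):
    """nSubjectRes(P_ij) / nRes(S_i), under the assumptions stated above."""
    resources = {r for r, t in types.items() if t == subject_type}
    if not resources:
        return 0.0
    # Distinct subjects of type S_i with at least one link to a type O_j resource.
    subjects = {
        s for s, o in wikilinks
        if s in resources and types.get(o) == object_type
    }
    return len(subjects) / len(resources)


# Invented toy graph (not real wikilink data).
TYPES = {
    "Michael_Jackson": "MusicalArtist",
    "Madonna": "MusicalArtist",
    "Prince": "MusicalArtist",
    "Keith_Jarrett": "MusicalArtist",
    "Jackson_5": "Band",
    "Beatles": "Band",
}
WIKILINKS = [
    ("Michael_Jackson", "Jackson_5"),
    ("Michael_Jackson", "Beatles"),
    ("Madonna", "Michael_Jackson"),
    ("Madonna", "Beatles"),
    ("Prince", "Madonna"),
    ("Prince", "Jackson_5"),
]

to_band = path_popularity(WIKILINKS, TYPES, "MusicalArtist", "Band")
to_artist = path_popularity(WIKILINKS, TYPES, "MusicalArtist", "MusicalArtist")
print(to_band, to_artist)
```

Note that a subject with several links to the same object type (here Michael_Jackson to two bands) is counted once.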



Boundaries of KPs




•   A KP(Si) is a set of paths, such that


                  Pi,j ∈ KP(Si) ⟺ pathPopularity(Pi,j, Si) ≥ t



•   t is a threshold, under which a path is not included in a KP

•   How to get a good value for t?
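
Read as code, the membership rule is a simple filter over path popularity values. In the sketch below the function name `knowledge_pattern`, the popularity values and the threshold are all made up for illustration; choosing a good t is exactly the open question raised on this slide.

```python
def knowledge_pattern(popularity_by_path, t):
    """Return the paths whose popularity is at least the threshold t."""
    return {path for path, pop in popularity_by_path.items() if pop >= t}


# Made-up popularity values for paths starting from one subject type.
popularity = {
    ("MusicalArtist", "Band"): 0.75,
    ("MusicalArtist", "MusicalArtist"): 0.50,
    ("MusicalArtist", "City"): 0.08,
}

# Arbitrary threshold, chosen only to show the effect of the boundary.
kp = knowledge_pattern(popularity, t=0.18)
print(sorted(kp))
```

Raising t shrinks the pattern: with t=0.75 only the path to Band would remain.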



Boundary induction


Step   Description

 1     For each path, calculate the path popularity
 2     For each subject type, get the 40 top-ranked path popularity values*
 3     Apply multiple correlation (Pearson ρ) between the paths of all subject types by rank, and check for homogeneity of ranks across subject types
 4     For each of the 40 path popularity ranks, calculate its mean across all subject types
 5     Apply k-means clustering on the 40 ranks
 6     Decide threshold(s) based on k-means as well as other indicators (e.g. FrameNet roles distribution)
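
Steps 4 and 5 can be sketched in plain Python as follows. Several things here are assumptions rather than the authors' procedure: the clustering is run on the mean popularity value of each rank, k-means is a bare-bones one-dimensional Lloyd's algorithm with evenly spread initial centroids, only 6 ranks are used instead of 40, and the popularity values are invented. The correlation check of step 3 and the extra indicators of step 6 are not covered.

```python
from statistics import mean


def mean_by_rank(ranked_by_type):
    """Step 4: mean popularity at each rank across all subject types."""
    return [mean(values) for values in zip(*ranked_by_type.values())]


def kmeans_1d(values, k, iterations=100):
    """Plain Lloyd's algorithm on scalars (k >= 2, evenly spread start)."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assign every value to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


# Invented top-ranked path popularity values, sorted by rank, per subject type.
RANKED = {
    "Philosopher": [0.60, 0.40, 0.30, 0.10, 0.06, 0.02],
    "Album": [0.80, 0.50, 0.30, 0.08, 0.04, 0.02],
}

means = mean_by_rank(RANKED)
labels, centroids = kmeans_1d(means, k=2)

# Candidate threshold: lowest mean popularity in the cluster of the top rank.
candidate_t = min(v for v, lab in zip(means, labels) if lab == labels[0])
print(labels, round(candidate_t, 2))
```

The printed candidate is only a starting point; step 6 leaves the final decision to a combination of the clustering and other indicators.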
Boundary induction




How can KPs be evaluated and used?




•   The evaluation of KPs should be performed in terms of their
    capability to be cognitively sound in capturing and representing
    knowledge

•   One scenario for evaluating the efficacy of KPs
    is exploratory search combined with user studies




Why exploratory search?



•   Exploratory search is characterized “by uncertainty about the space
    being searched and the nature of the problem that motivates the
    search” [White et al., 2005]

•   KPs can be used for supporting exploratory search
    ✦   They can be used in order to filter knowledge by drawing a meaningful
        boundary around the retrieved data
    ✦   They make it possible to suggest exploratory paths based on cognitive criteria of
        relevance

•   We can investigate how KPs help users in exploratory search
    tasks
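
Using a KP to draw a boundary around retrieved data can be pictured as a filter over the resources linked from an entity. The Python sketch below is an assumption-laden illustration, not the behaviour of any particular tool: `apply_lens` is a hypothetical helper, and the resources, types and KP content are invented.

```python
from collections import defaultdict


def apply_lens(linked_resources, types, kp_object_types):
    """Keep linked resources whose type is in the KP, grouped by type."""
    grouped = defaultdict(list)
    for resource in linked_resources:
        resource_type = types.get(resource)
        if resource_type in kp_object_types:
            grouped[resource_type].append(resource)
    return dict(grouped)


# Invented data: resources linked from one entity, with their types.
TYPES = {
    "Jackson_5": "Band",
    "Madonna": "MusicalArtist",
    "Some_City": "City",
}
linked = ["Jackson_5", "Madonna", "Some_City", "Untyped_Page"]

# Hypothetical KP for the entity's type: the object types inside the boundary.
kp_for_musical_artist = {"Band", "MusicalArtist"}

view = apply_lens(linked, TYPES, kp_for_musical_artist)
print(view)
```

Resources outside the boundary (here the city and the untyped page) are dropped, and the remaining ones are grouped by type, which gives one exploratory direction per type in the KP.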


Aemoo: KP-based exploratory search




•   A Web application that supports exploratory search on the Web
    based on KPs extracted from Wikipedia links

•   It aggregates knowledge from Linked Data, Wikipedia, Twitter and
    Google News by applying KPs as knowledge lenses over data

•   It provides an effective summary of knowledge about an entity,
    including explanations




Exploring knowledge with Aemoo (1)




Exploring knowledge with Aemoo (2)




Conclusions


•   We want to contribute to the realization of the Semantic Web as
    an empirical science by providing a methodology for KP
    extraction

•   Our methodology for extracting KPs is based on two approaches
    ✦   a top-down approach
    ✦   a bottom-up approach

•   We have presented our experience in KP extraction so far
    ✦   KPs from FrameNet’s frames
    ✦   KPs from Wikipedia links

•   The evaluation we have in mind should be performed by means of
    exploratory search tasks
    ✦   Aemoo
Thanks



