Semantic Interoperability Methodologies
Johann Höchtl, Danube University Krems, Center for E-Government, Austria
Bridges: Bridging Concepts
  • Corn vs. Rice
  • Red Wine vs. Sake
  • Surströmming vs. thousand-year egg
  • Lederhosen vs. Sari
  • Bach flower remedies (Bachblüten) vs. Reiki
  • Map: the Istanbul bridges between Europe and Asia (OpenStreetMap.org)
Super- and Sub-Concepts
  • Food: Corn vs. Rice (protein), Red Wine vs. Sake (alcohol), Surströmming vs. thousand-year egg (natural preservative)
  • Clothing: Lederhosen vs. Sari (natural materials)
  • Medicine: Bach flower remedies (Bachblüten) vs. Reiki (alternative medicine)
  • Horizontal concepts: Finance (buy), Logistics (store)
  • Superconcept / higher ontology vs. sub-concept / lower ontology
  • Know everything vs. domain-specific knowledge?
Who We Are and What We Do
  • Danube University Krems, Austria: the only state-owned postgraduate university
  • Center for E-Government: e-inclusion and e-participation and their impact on the electronic society
  • Blog: http://digitalgovernment.wordpress.com
  • Journal of E-Democracy and Open Government: http://www.jedem.org
  • About the presenter: doctoral thesis "E-Participative Processes and Models of Incorporation", University of Vienna and Technical University of Vienna, Business Informatics, Vienna
The Problem: State of Computer-Automated Semantic Understanding
“The tragi-comic failure of Netbase can teach a lot to every company in the Semantic space.”
  • “Lesson 1: Don’t even try to boil the ocean of the WWW with these technologies. [The] Internet is full of valuable information but crap (or opinions) is 90% [of it], the cost of getting rid of this crap and save only the good stuff is very high.
  • Lesson 2: Linguistic approaches are likely going to fail because search engines (and machines) can’t distinguish joke/seriousness, sarcasm/shame and sentiments in general. The semantic meaning is right there not in the words of a text.
  • Lesson 3: If you choose to apply such approaches to one specific topic like Medicine (good choice) then stick to that topic, that means accept as INPUT only medical terms and provide as OUTPUTS only medical terms. This last point requires human intervention and predefined taxonomies/ontologies but Netbase claims that they don’t need them both, [i.e., that] their engine is fully automatic the failure too.”
Source: http://marklogic.blogspot.com/2009/09/netbase-tragicomedy-perils-of-magic-and.html (via Reddit)
Notions of Similarity
  • How can a computer system declare two data fragments similar, and to what extent?
  • Starting point: transform the data into a computable representation
  • The canonical data structure is a matrix: the X and Y dimensions contain the identified terms, and each cell holds their respective similarity
  • Visualization as a tree or directed graph
  • The required computational effort is very high
  • Brute-force approach: compare every identified term of document instance A with every identified term of document instance B
  • Recent approaches: genetic algorithms, with the “90:9:1 syndrome”: 90% of results are very good, 9% are acceptable and 1% degenerate
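The brute-force approach from this slide can be sketched as follows. This is a minimal illustration, not the presented system. It assumes that terms have already been extracted from both documents, and it uses Python's `difflib.SequenceMatcher` ratio as a stand-in similarity function; any measure returning values in [0, 1] would do. The helper names, the sample terms and the threshold of 0.5 are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """Character-based similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_matrix(terms_a: List[str], terms_b: List[str]) -> List[List[float]]:
    """Brute force: compare every term of A with every term of B (|A| * |B| comparisons).
    Rows correspond to terms of A, columns to terms of B."""
    return [[similarity(a, b) for b in terms_b] for a in terms_a]


def best_matches(terms_a: List[str], terms_b: List[str],
                 matrix: List[List[float]], threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """For each term of A, keep the most similar term of B if it reaches the threshold."""
    matches = []
    for i, row in enumerate(matrix):
        if not row:
            continue
        j = max(range(len(row)), key=row.__getitem__)  # column with the highest score
        if row[j] >= threshold:
            matches.append((terms_a[i], terms_b[j], round(row[j], 2)))
    return matches


if __name__ == "__main__":
    # Hypothetical element names from two document instances.
    a = ["BuyerParty", "InvoiceDate", "TotalAmount"]
    b = ["Buyer", "DateOfInvoice", "Amount", "Seller"]
    m = similarity_matrix(a, b)
    for term_a, term_b, score in best_matches(a, b, m):
        print(f"{term_a} -> {term_b} ({score})")
```

The nested comprehension makes the |A| * |B| cost explicit. That cost is why the slide flags the computational effort as very high once documents contain thousands of terms.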
Similarities
  • Structural similarity: number of edit operations needed to transform tree A (document artifact A) into tree B; maximum common subgraph, minimum common supergraph
  • Similarity flooding: two graphs are similar if the neighborhoods of every node are similar
  • Element-based similarity: element names, data types
  • Similarity algorithms for strings: Levenshtein distance, linguistic similarity (Soundex)
  • Similarity algorithms for logical structure: Jaccard index, Dice coefficient, cosine similarity
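The element-based measures named on this slide are small enough to sketch directly. The following is an illustrative Python version, not the implementation used in the project:

- `levenshtein` counts single-character edits.
- `soundex` follows the common American Soundex coding.
- The set measures are applied to tokens of element names.

`split_name` is a hypothetical helper that splits camel-case names such as `ArrivalAirportIn`. Real schemas would also need abbreviation handling.

```python
import math
import re
from collections import Counter


def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


# American Soundex digit classes; vowels, H, W and Y have no code.
_CODES = {ch: digit for digit, letters in
          {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}.items()
          for ch in letters}


def soundex(word: str) -> str:
    """Four-character phonetic code; e.g. 'Robert' and 'Rupert' share one code."""
    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""
    digits, prev = [], _CODES.get(word[0], "")
    for ch in word[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = code
    return (word[0] + "".join(digits) + "000")[:4]


def split_name(name: str) -> list:
    """Hypothetical helper: split names like 'ArrivalAirportIn' or 'hasName' into lowercase tokens."""
    return [t.lower() for t in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)]


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union; two empty sets score 0.0 here."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a: set, b: set) -> float:
    """Twice the size of the intersection divided by the sum of the set sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two token-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    x, y = split_name("ArrivalAirportIn"), split_name("AirportOfArrival")
    print(x, y)
    print(jaccard(set(x), set(y)), dice(set(x), set(y)), round(cosine(x, y), 3))
    print(levenshtein("InvoiceDate", "InvoiceDt"), soundex("Adress"), soundex("Address"))
```

Tokenizing first lets the set-based measures recognize `ArrivalAirportIn` and `AirportOfArrival` as related despite the different word order. Levenshtein and Soundex, by contrast, operate on the raw strings and tolerate typos.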
Ontology Similarity: Approaches
  • What is the most specific common ancestor of a pair of concepts in an ontology, i.e. what is the distance between the concepts?
  • Assign different similarity weights according to the relationship (synonyms, hypernyms, antonyms, meronyms, …)
  • Measure through an ontology collection (OpenCyc, WordNet, Wikipedia, DMOZ) or a knowledge base
  • Measures: EDGE, LEACOCK, RESNIK, LIN, JIANG
  • BUT: abbreviations, compound words (ArrivalAirportIn), suffixes/prefixes (hasName), misspellings, freely invented words, …
  • Human interaction (interactive mapping, enriched logic) is necessary
  • Solution: a knowledge base, but with high computational overhead!
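To make the ontology-based measures concrete, here is a sketch on a tiny taxonomy modeled loosely on the food and clothing examples from the introduction. The taxonomy and the concept counts are invented for illustration. The formulas follow the commonly cited forms:

- edge counting
- Leacock-Chodorow: -log of the path length over twice the depth
- Resnik: information content of the most specific common ancestor
- Lin: 2 * IC(lcs) / (IC(a) + IC(b))
- Jiang-Conrath: the inverse of IC(a) + IC(b) - 2 * IC(lcs)

Counting path and depth in nodes for Leacock-Chodorow is an assumption that mirrors common implementations.

```python
import math

# Hypothetical taxonomy (child -> parent), loosely based on the slide examples.
PARENT = {
    "Food": "Thing", "Clothing": "Thing",
    "Staple": "Food", "Alcohol": "Food",
    "Corn": "Staple", "Rice": "Staple",
    "RedWine": "Alcohol", "Sake": "Alcohol",
    "Lederhosen": "Clothing", "Sari": "Clothing",
}
# Hypothetical corpus frequencies of each concept (own occurrences only).
COUNTS = {"Thing": 1, "Food": 2, "Clothing": 2, "Staple": 1, "Alcohol": 1,
          "Corn": 10, "Rice": 20, "RedWine": 5, "Sake": 3, "Lederhosen": 2, "Sari": 3}


def ancestors(c):
    """Concepts on the path from c up to the root, starting with c itself."""
    path = [c]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lcs(a, b):
    """Most specific common ancestor (lowest common subsumer) of a and b."""
    anc_b = set(ancestors(b))
    return next(x for x in ancestors(a) if x in anc_b)


def depth(c):
    return len(ancestors(c)) - 1  # the root has depth 0


def path_length(a, b):
    """EDGE-style distance: number of edges between a and b via their common ancestor."""
    return depth(a) + depth(b) - 2 * depth(lcs(a, b))


MAX_DEPTH = max(depth(c) for c in COUNTS)


def leacock_chodorow(a, b):
    # Path and depth counted in nodes (edges + 1), as many implementations do.
    return -math.log((path_length(a, b) + 1) / (2 * (MAX_DEPTH + 1)))


def subtree_count(c):
    """Frequency of c including all of its descendants."""
    return COUNTS[c] + sum(subtree_count(k) for k, p in PARENT.items() if p == c)


TOTAL = subtree_count("Thing")


def ic(c):
    """Information content -log p(c): the higher a concept, the less information it carries."""
    return -math.log(subtree_count(c) / TOTAL)


def resnik(a, b):
    return ic(lcs(a, b))


def lin(a, b):
    denom = ic(a) + ic(b)
    return 2 * ic(lcs(a, b)) / denom if denom else 1.0


def jiang_conrath(a, b):
    dist = ic(a) + ic(b) - 2 * ic(lcs(a, b))
    return 1.0 / dist if dist else float("inf")


if __name__ == "__main__":
    for a, b in [("Corn", "Rice"), ("Corn", "RedWine"), ("Corn", "Sari")]:
        print(a, b, lcs(a, b), path_length(a, b),
              round(leacock_chodorow(a, b), 3), round(resnik(a, b), 3),
              round(lin(a, b), 3), round(jiang_conrath(a, b), 3))
```

Note that Resnik scores any pair whose only common ancestor is the root as 0. This reflects the observation that a node carries less information the higher it sits in the hierarchy.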
Research Approach
  • Domain focus: mapping of document standards derived from UN/CEFACT CCTS (Core Component Technical Specification)
  • Supported: OAGIS 9.1, GS1 XML and UBL 2.0, with UN/CEFACT CCL 07B as the common ancestor
  • Specialist approach: reuse existing semantic knowledge instead of re-creating it
  • Automatically inferable semantic knowledge lies in data types, structural similarity and element names
  • Feed an inference engine to create the upper ontology
  • The “harmonized” upper ontology contains the relationships between document artifacts
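The idea of reusing existing knowledge can be illustrated with a single forward-chaining rule. The rule traces schema elements back to their common ancestor in the UN/CEFACT Core Component Library. This sketch is not the RacerPro-based inference used in the project. The element paths and core component identifiers in `BASED_ON` are illustrative placeholders, not taken from the actual OAGIS, UBL or GS1 schemas.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative facts: (standard, element) -> UN/CEFACT CCL core component it is based on.
# Element names and component identifiers are placeholders, not copied from the real libraries.
BASED_ON = {
    ("UBL", "cac:AccountingSupplierParty"): "CCL:Party",
    ("GS1", "seller"): "CCL:Party",
    ("OAGIS", "SupplierParty"): "CCL:Party",
    ("UBL", "cbc:IssueDate"): "CCL:Document.Issue.Date",
    ("GS1", "documentDate"): "CCL:Document.Issue.Date",
}


def infer_equivalences(based_on):
    """Rule: equivalent(x, y) if x and y come from different standards
    and are based on the same core component (their common ancestor)."""
    by_component = defaultdict(list)
    for artifact, component in based_on.items():
        by_component[component].append(artifact)
    pairs = []
    for component, artifacts in sorted(by_component.items()):
        for x, y in combinations(sorted(artifacts), 2):
            if x[0] != y[0]:  # only propose mappings across standards
                pairs.append((x, y, component))
    return pairs


if __name__ == "__main__":
    for x, y, via in infer_equivalences(BASED_ON):
        print(f"{x[0]}:{x[1]}  <->  {y[0]}:{y[1]}   (via {via})")
```

Grouping by core component yields, in miniature, the kind of harmonized upper ontology the slide describes. In practice, structurally different artifacts would need additional explicit rules on top of this single derivation rule.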
Results
  • Explicit rules incorporated in the inference process
  • Heuristics to discover structurally different Association Document Component and Basic Document Component pairs, and different Basic Document Component / Association Document Components
  • Recall for a domain-specific mapper is higher than for one relying on automatic inference: success rate in identifying UBL ABIEs to GS1 XML ABIEs: 88.1%; false positive hits: ~10%
  • Repository of XSLT mappings as a cache
  • GUI: Semantic Interoperability Service Utility (“ISU”), tryout at http://144.122.230.66:9090/ISU/web
  • OASIS SET TC: http://www.oasis-open.org/committees/set/
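Matching results such as the figures on this slide are usually assessed against a reference alignment. The slide does not say exactly how the 88.1% success rate and the ~10% false positive rate were computed. The sketch below therefore only shows the standard precision/recall computation under that assumption, using made-up mapping pairs.

```python
def evaluate(proposed: set, reference: set) -> dict:
    """Compare proposed mappings with a reference alignment (sets of (source, target) pairs)."""
    true_pos = len(proposed & reference)
    precision = true_pos / len(proposed) if proposed else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positives": len(proposed - reference),
    }


if __name__ == "__main__":
    # Made-up example pairs; not the project's evaluation data.
    reference = {("ubl:IssueDate", "gs1:date"), ("ubl:Party", "gs1:seller"),
                 ("ubl:Amount", "gs1:amount"), ("ubl:Note", "gs1:comment")}
    proposed = {("ubl:IssueDate", "gs1:date"), ("ubl:Party", "gs1:seller"),
                ("ubl:Note", "gs1:amount")}
    print(evaluate(proposed, reference))
```

Recall rewards finding all correct mappings, while the false-positive count exposes wrong proposals. This is why both numbers are reported on the slide.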
Conclusion and Outlook
  • Targeted towards SMEs, to overcome different communication standards in different domains: RosettaNet vs. OAGIS vs. HL7 vs. …
  • The current implementation focuses on CCL 07B derivatives, but the model is expandable!
  • Applicability beyond SMEs and supply chain / invoicing
  • Northern European Subset of UBL (NES): cooperation on e-commerce and e-procurement. Its purpose is to facilitate the harmonization of different types of e-procurement documents in countries that already use UBL
  • Consequence: data exchange between NES and OAGIS, UBL, HL7, … is a use case for SET!
THANK YOU! Questions?
Links:
  • http://www.srdc.metu.edu.tr/iSURF/OASIS-SET-TC/tools/ISU-latest.zip
  • http://144.122.230.66:9090/ISU/web
  • http://www.oasis-open.org/committees/set/
  • http://www.oasis-open.org/committees/download.php/32369/20090504SemanticRepresentationOfDocumentArtifacts.pdf
  • http://www.oasis-open.org/committees/download.php/33577/SET-TC.odp
References:
  • Gerti Kappel, Horst Kargl, Gerhard Kramler, Andrea Schauerhuber, Martina Seidl, Michael Strommer, and Manuel Wimmer, “Matching Metamodels with Semantic Systems - An Experience Report,” Mainz, 2007, pp. 38-52.
  • Fabien Duchateau and Zohra Bellahsène, “Designing a Benchmark for the Assessment of XML Schema Matching Tools,” Vienna, Austria: ACM, 2007.
  • Hong-Hai Do and Erhard Rahm, “Matching large schemas: Approaches and evaluation,” Science Direct, 2007, pp. 857-885.

E Challenges 2009 Workshop 10b Semantic Interoperability Methodologies


Editor's Notes

  • #3: My name is Johann Höchtl. I am from Danube University Krems, Austria, and I will present some challenges of semantic interoperability and recent research to overcome them. Semantic interoperability is very much about connecting concepts, hence the term semantic “bridging”. Istanbul would not be the important metropolis it is without the two big bridges connecting Europe and Asia. When thinking about Europe and Asia, certain associations arise: both have a characteristic food culture, traditional clothing and distinct medical traditions. Terms such as corn and rice, red wine and sake, or Bach flower remedies (Bachblüten) and Reiki have something in common, a relationship which can be modeled on a higher level.
  • #4: The first three pairs fall into the food domain, with corn and rice being an important protein source. Lederhosen and sari are both sub-concepts of the concept Clothing and share the property Natural Material. Bach flower remedies and Reiki are alternative medical treatments. To complicate things further, you can identify horizontal properties: all of these items can be bought, which belongs to the Finance domain. What we can identify here are relationships, properties and hierarchy attributes. In knowledge engineering, these are termed superconcepts and sub-concepts, or higher ontology vs. lower ontology. As a knowledge worker you may ask yourself whether you are a generalist or a specialist.
  • #5: After this short introduction to what semantic bridging is about, here is some more information about my workplace. I work for Danube University Krems, the only publicly owned university for continuing education in Austria. The research focus of the Center for E-Government is e-democracy and the impact of electronic participation on society. You can find out more about what we do by browsing to and participating in our public blog. If you are interested, you may submit a paper to our e-journal, the Journal of E-Democracy and Open Government.
  • #6: So why are we, as a center for e-government, interested in ontology-driven semantic data exchange? Because the current state of semantic technology does not permit unguided exchange on the semantic level. As long as only technical interoperability is concerned, things are fine, for example when you can strictly follow an XML schema specification. That no longer holds for semantic systems without enriched domain knowledge. We carried out research together with the CIO section of the Austrian Federal Chancellery. We found that semantic bridging systems which focus on domain knowledge achieve a higher recall rate than systems which try to extract or reconstruct that knowledge through dictionary lookups, word frequency analysis or stemming. Three months ago NetBase made a new service publicly available, a content intelligence platform for healthcare. Based on their input, users get treatment advice and possible causes and cures for diseases. While some of the results may be funny, taken too seriously such advice can do more harm than good. Here are some funny assertions made by the system. Since its release the system has improved, and those assertions are no longer returned.
  • #7: Some fundamental properties of semantics. First and foremost, semantic bridging is about detecting similarity in a computerized manner. Semantic information available, for example, in OWL-DL format first has to be converted into a machine-processable representation, which usually is a matrix. The two dimensions of the matrix hold the identified concepts of each side. Each cell holds their similarity, expressed between 0 and 1: 0 means no similarity, 1 means identical or a full semantic match. A matrix is not the most intuitive way to visualize semantic information for the human eye. Directed acyclic graphs are sensible graphical representations, or trees for pure inheritance relationships. The naïve approach to computing similarity is to enumerate all concepts completely and compare them pairwise. The processing power this would require for tasks such as complete DNA analysis or Internet data mining led to new comparison algorithms. These are typically heuristics that find good approximate solutions to otherwise intractable (NP-hard) matching problems. A prominent early example was ant colony optimization, which finds good solutions to the traveling salesman problem in reasonable time.
  • #8: Many semantic similarity problems have their origins in detecting structural similarity, for example comparing the similarity between graphs. Especially in the realm of graph similarity, semantic similarity research has produced new approaches and algorithms. Counting the edit operations needed to transform a tree A into a structurally equivalent tree B is a rather old approach, whereas similarity flooding is a fairly new methodology. Similarity flooding rests on the fundamental assumption that two concepts are similar if their neighbors are similar. The algorithm traverses the graph iteratively, at least twice, and has poor runtime complexity. Additional sensible constraints help to improve performance: for example, a maximum depth up to which a node's similarity is propagated to its surrounding nodes, or pruning of branches that are unlikely to match given a certain threshold. Besides the structural similarity of graphs, element names and their assigned data types also contain semantic information. Dictionary-based algorithms calculate the relatedness of words, and similar words may be identified with the Soundex or Levenshtein algorithm. Combining multiple similarity measures into one is another challenge, e.g. the structural similarity between two nodes and their Soundex similarity. Once the similarity matrix has been established, the most likely matching pairs have to be determined. Based on the similarity values in the matrix, concepts of A can be seen as feature vectors. These are compared to the feature vectors of the concepts of B using the Euclidean distance, the well-known cosine similarity or the Jaccard coefficient. The Jaccard coefficient measures similarity between sample sets and is defined as the size of the intersection divided by the size of the union of the sample sets.
  • #9: The previous slide presented algorithms from schema matching that are also applicable to ontology matching. However, these algorithms do not sufficiently account for the semantics in an ontology. A frequent problem is to identify the most specific common ancestor of two concepts in an ontology. The EDGE and LEACOCK algorithms, for example, measure the relatedness of concepts entirely by the number of edges between them in the ontology, represented as a directed graph. In 1995 RESNIK proposed a similarity approach which accounts for the position of the concepts in the graph: a node carries less information the higher it is found along the inheritance line. Dekang Lin refined this concept in 1998 into a universally applicable, domain- and resource-neutral measure. He defines similarity as the amount of information the two concepts share, i.e. the information content of their most specific common ancestor, in relation to the information needed to describe both concepts. To give you an idea of how complex this is: in 2005 a paper was presented at the WWW Conference in Chiba, Japan. In it, the Department of Computer Science of Indiana University, US, compared a traditional tree-based approach to a graph-based analysis of similarity between all concepts available on DMOZ.org, excluding the World and Regional categories. In 2005 DMOZ.org had 150,000 pages. The graph-based similarity covered the hierarchical component and the two non-hierarchical components, symbolic and related cross-links. Its calculation required a total of 5,000 CPU hours on a massively parallel cluster of 416 Prestonia (Intel Xeon) cores. But abbreviations or compound words add a level of complexity that prevents automatic inference of concepts. In these cases, either custom dictionary knowledge represented as SWRL rules or simply a human-made mapping can solve the mapping problem.
  • #10: Therefore the work in the SET TC deliberately limits itself to mapping document standards derived from UN/CEFACT's Core Component Technical Specification. This specification has some challenging properties: document artifacts are described as Basic Business Information Entities, which are derived from Core Components. These business entities may in turn be composed into Aggregate Business Information Entities. Because directly including business elements may not be feasible, an Association type exists. This type carries semantics which have to be preserved while deriving the ontology from the Excel files provided by UN/CEFACT. The schema information of OAGIS 9.1, UBL 2.0 and GS1 XML can be traced back automatically to its origin in the UN/CEFACT Core Component Library. The RacerPro inference engine, enriched with rules defined in predicate logic, then computes the higher ontology. This inferred semantic knowledge is used to automatically create XPath and XSLT expressions that map document artifacts between these document standards.
  • #11: To further improve match results, the semantics automatically derived from these document standards have to be enriched with explicit rules. On the one hand, these rules add the information needed to match semantically equivalent yet structurally different document artifacts. Other rules deal with the fact that some parts of the document standards have an equivalent in UN/CEFACT's Core Component Technical Specification but are used differently in practice. Structural differences are recovered in Association Document Components and their relation to Basic Document Component pairs, and in Basic Business Document Information Entities. In tests the success rate is higher than that of semantic mapping frameworks relying on general inference techniques such as association lists, dictionaries, stemming or word distance algorithms. OASIS SET TC is still an ongoing effort, but you may point your browser to the address in this presentation. Documentation is available in the installation package linked in the link section on the last slide, as well as on the OASIS SET TC homepage.
  • #12: The OASIS SET TC framework is one deliverable of the iSURF project, which will create an Interoperability Service Utility for Collaborative Supply Chain Planning across Multiple Domains Supported by RFID Devices and is targeted towards the needs of SMEs. OASIS SET TC will not replace existing document communication standards but will enrich their interoperability. It is an expandable model: documents which derive from UN/CEFACT's CCL specification are easy to incorporate, while other formats require a fundamental redefinition of the higher ontology. For my research focus this is just one more cornerstone towards interoperable electronic services of public administration. Recent efforts concentrate strongly on public procurement, for example the PEPPOL project or the objectives of the semic.eu platform. Norway, Iceland, Finland, Sweden and Denmark work together to define a more appropriate version of UBL for public procurement, called the Northern European Subset of UBL. Iceland has already exchanged eInvoices in this format. Once this specification sees widespread application, the need to exchange data in the HL7 XML standard will arise, and the SET TC framework will be the right tool to do so.
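The similarity flooding idea described in note #8 can be sketched as a fixpoint computation over a pairwise connectivity graph. The idea is that two concepts are similar if their neighbors are similar. This is a simplified illustration, not a faithful reimplementation of the published algorithm:

- It uses the basic update sigma(i+1) = normalize(sigma(i) + phi(sigma(i))).
- It splits each pair's similarity evenly among its neighbor pairs reachable via the same edge label.
- It omits the depth limits and threshold pruning mentioned in the note.

The graphs and initial scores in the usage example are hypothetical.

```python
from collections import defaultdict
from itertools import product


def similarity_flooding(edges_a, edges_b, init, iterations=20, eps=1e-6):
    """Simplified similarity flooding.

    edges_a, edges_b: lists of (source, label, target) triples describing two graphs.
    init: dict mapping (node_a, node_b) to an initial similarity, e.g. from name matching.
    Returns a dict mapping (node_a, node_b) to a propagated similarity, scaled so the maximum is 1.0.
    """
    # Pairwise connectivity graph: pairs are linked when both graphs have an edge with the same label.
    neighbours = defaultdict(list)  # pair -> [(neighbour pair, label), ...]
    for (a1, la, a2), (b1, lb, b2) in product(edges_a, edges_b):
        if la == lb:
            neighbours[(a1, b1)].append(((a2, b2), la))
            neighbours[(a2, b2)].append(((a1, b1), la))  # propagate in both directions

    # Propagation weights: a pair's similarity is split evenly among neighbours reached via the same label.
    weights = {}
    for pair, nbrs in neighbours.items():
        per_label = defaultdict(int)
        for _, label in nbrs:
            per_label[label] += 1
        weights[pair] = [(nbr, 1.0 / per_label[label]) for nbr, label in nbrs]

    nodes = set(init) | set(neighbours)
    if not nodes:
        return {}
    sigma = {p: init.get(p, 0.0) for p in nodes}
    for _ in range(iterations):
        nxt = dict(sigma)  # sigma_{i+1} = sigma_i + phi(sigma_i), normalized below
        for pair, wlist in weights.items():
            for nbr, w in wlist:
                nxt[nbr] += sigma[pair] * w
        top = max(nxt.values()) or 1.0
        nxt = {p: v / top for p, v in nxt.items()}
        converged = all(abs(nxt[p] - sigma[p]) < eps for p in nodes)
        sigma = nxt
        if converged:
            break
    return sigma


if __name__ == "__main__":
    # Hypothetical fragments of two purchase order schemas.
    g1 = [("Order", "hasBuyer", "Buyer"), ("Order", "hasDate", "Date")]
    g2 = [("PO", "hasBuyer", "Party"), ("PO", "hasDate", "IssueDate")]
    start = {(x, y): 0.1 for x in ("Order", "Buyer", "Date") for y in ("PO", "Party", "IssueDate")}
    result = similarity_flooding(g1, g2, start)
    for pair, score in sorted(result.items(), key=lambda kv: -kv[1])[:3]:
        print(pair, round(score, 3))
```

With uniform starting scores, the pair (Order, PO) ends up on top because it receives similarity from both of its matching neighbourhoods. Pairs without structurally matching neighbours fade towards 0 after normalization.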