Semantic Interoperability Methodologies Johann Höchtl Danube University Krems Center for E-Government Austria
Bridges: Bridging Concepts
  • Corn vs. Rice
  • Red Wine vs. Sake
  • Surströmming vs. thousand-year egg
  • Lederhosen vs. Sari
  • Bachblüten vs. Reiki
[Figure: map of the Istanbul bridges connecting Europe and Asia. Map by Openstreetmap.org]
Super- and Sub-Concepts
  • Know everything vs. domain-specific knowledge?
  • Superconcept / higher ontology vs. sub-concept / lower ontology
  • Corn vs. Rice: Food, Protein
  • Red Wine vs. Sake: Food, Alcohol
  • Surströmming vs. thousand-year egg: Food, Nat. Preservative
  • Lederhosen vs. Sari: Clothing, Natural Materials
  • Bachblüten vs. Reiki: Medicine, Alternative Medicine
  • Horizontal properties: Finance (Buy), Logistic (Store)
Who we are and What we do
  • Danube University Krems, Austria: the only state-owned postgraduate university
  • Center for E-Government: E-Inclusion and E-Participation and their impacts on electronic society
  • http://digitalgovernment.wordpress.com
  • Journal of E-Democracy and Open Government: http://www.jedem.org
  • About the presenter: doctoral thesis on E-Participative Processes and Models of Incorporation; University of Vienna and Technical University of Vienna, Business Informatics
The Problem – State of Computer Automated Semantic Understanding
“The tragi-comic failure of Netbase can teach a lot to every company in the Semantic space.
  • Lesson 1: Don’t even try to boil the ocean of the WWW with these technologies. [The] Internet is full of valuable information but crap (or opinions) is 90% [of it], the cost of getting rid of this crap and save only the good stuff is very high.
  • Lesson 2: Linguistic approaches are likely going to fail because search engines (and machines) can’t distinguish joke/seriousness, sarcasm/shame and sentiments in general. The semantic meaning is right there not in the words of a text.
  • Lesson 3: If you choose to apply such approaches to one specific topic like Medicine (good choice) then stick to that topic, that means accept as INPUT only medical terms and provide as OUTPUTS only medical terms. This last point requires human intervention and predefined taxonomies/ontologies but Netbase claims that they don’t need them both, [i.e., that] their engine is fully automatic the failure too.”
Source (via Reddit): http://marklogic.blogspot.com/2009/09/netbase-tragicomedy-perils-of-magic-and.html
Notions of Similarity
  • How can a computer system declare two data fragments similar, and to what extent?
  • Starting point: data transformation into a computable dimension
  • Canonical data structure is a matrix; the X and Y dimensions contain the identified terms and their respective similarity
  • Visualization as tree or directed graph
  • Required computational effort is very high
  • Brute-force approach: compare every identified term of document instance A with every identified term of document instance B
  • Recent approaches: genetic algorithms, with the “90:9:1 syndrome”: 90% of results are very good, 9% are acceptable and 1% degenerates
Similarities
  • Structural similarity
    • Number of edit operations to transform tree A (document artifact A) into tree B
    • Maximum common subgraph, minimum common supergraph
    • Similarity flooding: two graphs are similar if the neighborhoods of every node are similar
  • Element-based similarity
    • Element names
    • Data types
  • Similarity algorithms
    • Strings: Levenshtein distance, linguistic similarity (Soundex)
    • Logical structure: Jaccard index, Dice coefficient, cosine similarity
Ontology Similarity - Approaches
  • What is the most specific common ancestor of a pair of concepts in an ontology, i.e. what is the distance between the concepts?
  • Assign different similarity weights according to relationship (synonyms, hypernyms, antonyms, meronyms, …)
  • Measure through an ontology collection (OpenCyc, WordNet, Wikipedia, DMOZ) or a knowledge base
  • EDGE, LEACOCK, RESNIK, LIN, JIANG
  • BUT: abbreviations, associated words without delimiters (ArrivalAirportIn), suffix/prefix (hasName), misspellings, freely invented words, …
  • Human interaction (interactive mapping, enriched logic) is necessary
  • Solution: knowledge base, but high computational overhead!
Research Approach
  • Domain focus: mapping of document standards derived from the UN/CEFACT CCTS (Core Component Technical Specification)
  • Support for OAGIS 9.1, GS1 XML and UBL 2.0, with UN/CEFACT CCL 07B as common ancestor
  • Specialist approach: reuse semantic knowledge instead of re-creating already existing knowledge
  • Automatically inferable semantic knowledge is in data types, structural similarity and element names
  • Feed an inference engine to create the upper ontology
  • The “harmonized” upper ontology contains the relationships between document artifacts
Results
  • Explicit rules incorporated in the inference process
  • Heuristics to discover structurally different pairs:
    • Association Document Component and Basic Document Component pairs
    • Different Basic Document Components
    • Association Document Components
  • The recall rate of a domain-specific mapper is higher than that of one relying on automatic inference:
    • Success rate in identifying UBL ABIEs to GS1 XML ABIEs: 88.1%
    • False positive hits: ~10%
  • Repository of XSLT mappings as cache
  • GUI: Semantic Interoperability Service Utility (“ISU”)
  • Tryout at http://144.122.230.66:9090/ISU/web
  • OASIS SET TC at http://www.oasis-open.org/committees/set/
Conclusion and outlook
  • Targeted towards SMEs to overcome different communication standards in different domains: RosettaNet vs. OAGIS vs. HL7 vs. …
  • Current implementation focuses on CCL 07B derivatives, but the model is expandable!
  • Applicability beyond SMEs and supply chain / invoicing
  • Northern European Subset of UBL (NES): a cooperation on e-commerce and e-procurement whose purpose is to facilitate harmonization of different types of e-procurement documents in countries that are already using UBL
  • Consequence: data exchange between NES and OAGIS, UBL, HL7 … a use case for SET!
THANK YOU! – Questions?
Links:
  • http://www.srdc.metu.edu.tr/iSURF/OASIS-SET-TC/tools/ISU-latest.zip
  • http://144.122.230.66:9090/ISU/web
  • http://www.oasis-open.org/committees/set/
  • http://www.oasis-open.org/committees/download.php/32369/20090504SemanticRepresentationOfDocumentArtifacts.pdf
  • http://www.oasis-open.org/committees/download.php/33577/SET-TC.odp
References:
  • Gerti Kappel, Horst Kargl, Gerhard Kramler, Andrea Schauerhuber, Martina Seidl, Michael Strommer, and Manuel Wimmer, “Matching Metamodels with Semantic Systems - An Experience Report,” Mainz, 2007, pp. 38-52.
  • Fabien Duchateau and Zohra Bellahsène, “Designing a Benchmark for the Assessment of XML Schema Matching Tools,” Vienna, Austria: ACM, 2007.
  • Hong-Hai Do and Erhard Rahm, “Matching large schemas: Approaches and evaluation,” Science Direct, 2007, pp. 857-885.



Editor's Notes

  • #3: My name is Johann Höchtl. I am from Danube University Krems, Austria, and I will present some challenges of semantic interoperability and recent research to overcome them. Semantic interoperability is largely about connecting concepts, hence the term semantic “bridging”. Istanbul would not be a metropolis of the importance it has without the two big bridges connecting Europe and Asia. When thinking about Europe and Asia, certain associations arise. Both have a characteristic food culture, traditional clothing and distinct medical cultures. Terms such as Corn and Rice, Red Wine and Sake, or Bachblüten and Reiki have something in common, a relationship which can be modeled on a higher level.
  • #4: While the first three concept pairs fall into the food domain, with Corn and Rice being important protein sources, Lederhosen and Sari have in common that they are sub-concepts of Clothing and share the property Natural Material, and Bachblüten and Reiki are alternative medical treatments. To complicate things further, you can identify horizontal properties: all of them can be bought, which belongs to the Finance domain. What we can identify here are relationships and properties, that is, hierarchy attributes. In terms of knowledge engineering these are termed superconcepts and sub-concepts, or higher ontology vs. lower ontology. As a knowledge worker you may ask yourself whether you are a generalist or a specialist.
  • #5: After this short introduction to what semantic bridging is about, some more information about my workplace. I work for Danube University Krems, the only publicly owned university for continuing education in Austria. The research focus of the Center for E-Government is E-Democracy and the impact of electronic participation on society. You will find out more about what we do when you browse to and participate in our public blog. If you are interested, you may submit a paper to the E-Journal of E-Democracy and Open Government.
  • #6: So why are we, as a center for e-Government, interested in semantic, ontology-driven data exchange? Because the current state of affairs in semantic land does not permit unguided exchange on the semantic level. As long as only technical interoperability is concerned, for example when you can strictly follow an XML schema specification, things are fine. This is not the case for semantic systems without enriched domain knowledge. In the research we carried out together with the CIO section of the Austrian Chancellery, we found that the recall rate of semantic bridging systems which focus on domain knowledge is higher than that of systems which try to extract or reconstruct that domain knowledge through dictionary lookups, word frequency analysis or stemming approaches. Three months ago Netbase made a new service publicly available, a Content Intelligence platform for healthcare. Based on user input, it returns treatment advice and possible causes and cures for diseases. While some of the results may be funny, taken too seriously this advice can do more harm than good. Here are some funny assertions made by the system. Since its release the system has improved, as those funny assertions are no longer returned.
  • #7: Some fundamental properties of semantics. First and foremost, semantic bridging is largely about detecting similarity in a computerized manner. When semantic information is available, for example, in OWL-DL format, it first has to be converted into a machine-processable representation, which usually is a matrix. The two dimensions of the matrix contain the identified concepts, and each cell holds their similarity expressed between 0 and 1, with 0 meaning no similarity and 1 meaning either identical or a full semantic match. As a matrix is not the most intuitive way for humans to view semantic information, directed acyclic graphs or, for special inheritance relationships, trees are sensible graphical representations. The naïve approach to computing similarity is to enumerate all concepts completely and to compare them pairwise. The sheer amount of processing power required for a complete DNA analysis or Internet data mining called for new heuristic comparison algorithms that trade exactness for a tractable running time. A prominent early example is the ant colony (“marching ants”) approach, which finds good approximate solutions to the traveling salesman problem in reasonable time.
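The brute-force construction of the similarity matrix described in this note can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names in `doc_a` and `doc_b` are invented, and the standard library's `difflib.SequenceMatcher` ratio stands in for whatever term similarity a real matcher would use.

```python
from difflib import SequenceMatcher


def term_similarity(a: str, b: str) -> float:
    """Score in [0, 1]; 1.0 means the case-folded names are identical.

    SequenceMatcher is a stand-in measure chosen for illustration only.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_matrix(terms_a, terms_b):
    """Brute force: compare every term of A with every term of B.

    This needs len(terms_a) * len(terms_b) comparisons, which is why the
    approach becomes expensive for large schemas.
    """
    return [[term_similarity(a, b) for b in terms_b] for a in terms_a]


# Hypothetical element names from two document instances.
doc_a = ["InvoiceNumber", "BuyerName", "TotalAmount"]
doc_b = ["InvoiceID", "CustomerName", "AmountTotal", "Currency"]

matrix = similarity_matrix(doc_a, doc_b)

# For each term of A, report the best-scoring candidate in B.
for term, row in zip(doc_a, matrix):
    best = max(range(len(row)), key=row.__getitem__)
    print(f"{term:15s} -> {doc_b[best]:15s} {row[best]:.2f}")
```

Rows correspond to terms of document A and columns to terms of document B, so the matrix has one cell per compared pair, each holding a value between 0 and 1.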
  • #8: Many of those semantic similarity problems have their origins in detecting structural similarity, for example comparing the similarity between graphs. Especially in the realm of graph similarity, semantic similarity research has resulted in new approaches and algorithms. While counting the number of edit operations needed to transform a tree A into a structurally equivalent tree B is a rather old technique, similarity flooding is a fairly new methodology. The idea behind similarity flooding is the fundamental assumption that two concepts are similar if their neighbors are similar. This algorithm iteratively traverses the graph at least twice and has poor runtime complexity, but additional sensible constraints help to improve the performance, for example a maximum depth up to which the similarity of a node is propagated based on its surrounding nodes, or branch prediction, which stops comparing branches that are unlikely to match given a certain threshold. Besides the structural similarity of graphs, the element names and their assigned data types also contain semantic information. Dictionary-based algorithms calculate the relatedness of words, and similar words may be identified by the Soundex or Levenshtein algorithm. Combining multiple similarity measures into one value, e.g. the structural similarity between two nodes and their Soundex similarity, is another challenge. Once the similarity matrix has been established, the most likely matching pairs have to be determined. Based on the similarity indices in the matrix, concepts of A can be seen as feature vectors and compared to the feature vectors of concepts of B using the Euclidean distance, the well-known cosine similarity or the Jaccard coefficient. The Jaccard coefficient measures similarity between sample sets and is defined as the size of the intersection divided by the size of the union of the sample sets.
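The string and set measures named in this note are small enough to write out. The following Python sketch uses the textbook definitions; the function names are chosen here for illustration and are not taken from the SET TC tooling.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def jaccard(x: set, y: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def dice(x: set, y: set) -> float:
    """Twice the intersection size divided by the sum of both set sizes."""
    total = len(x) + len(y)
    return 2 * len(x & y) / total if total else 1.0


def cosine(u, v) -> float:
    """Cosine of the angle between two equally long feature vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    norm = math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v))
    return dot / norm if norm else 0.0


print(levenshtein("kitten", "sitting"))   # 3
print(jaccard({1, 2, 3}, {2, 3, 4}))      # 0.5
```

Levenshtein works on the element names themselves, whereas Jaccard, Dice and cosine operate on sets or vectors, for instance the rows of a similarity matrix treated as feature vectors.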
  • #9: While the previous slide presented algorithms derived from schema matching which are applicable to ontology matching, these algorithms do not sufficiently account for the semantics in an ontology. A frequent problem is to identify the most specific common ancestor of two concepts in an ontology. The EDGE and LEACOCK algorithms, for example, measure the relatedness of concepts entirely by the path length between them in the ontology represented as a directed graph. In 1995 RESNIK proposed a similarity approach which accounts for the depth of the concepts in the graph: a node carries less information the higher it is found along the inheritance line. Dekang Lin refined this concept in 1998 with a very clever, universally applicable, domain- and resource-neutral definition. He defines similarity by the amount of information the concepts share, given by their most specific common ancestor, in relation to the information carried by the concepts themselves. To give you an idea of how complex this is: in 2005 a paper was presented at the WWW Conference in Chiba, Japan. The Department of Computer Science of Indiana University, US, compared a traditional tree-based approach to a graph-based analysis of similarity between all concepts available on DMOZ.org, excluding World and Regional. In 2005 DMOZ.org had 150,000 pages. The calculation of graph-based similarity on the hierarchical component and the two non-hierarchical components, symbolic and related cross-links, required a total of 5000 CPU-hours on a massively parallel CPU cluster consisting of 416 Prestonian cores. But abbreviations or association words add a level of complexity which prevents automatic inference of concepts. In these cases either custom dictionary knowledge represented in SWRL predicate logic or simply a human-made mapping can solve the mapping problem.
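To make the information-content measures in this note concrete, the following Python sketch computes the Resnik and Lin similarities on a toy taxonomy. Everything in it is an assumption made for illustration: the taxonomy reuses the example concepts from the opening slides, and the probabilities in `PROB` are invented rather than estimated from a corpus.

```python
import math

# Hypothetical taxonomy: child -> parent. "Thing" is the root.
PARENT = {
    "Food": "Thing", "Clothing": "Thing",
    "Corn": "Food", "Rice": "Food",
    "Lederhosen": "Clothing", "Sari": "Clothing",
}

# Invented probabilities of encountering a concept or any of its descendants.
PROB = {
    "Thing": 1.0, "Food": 0.5, "Clothing": 0.3,
    "Corn": 0.2, "Rice": 0.2, "Lederhosen": 0.1, "Sari": 0.1,
}


def ancestors(concept):
    """Return the concept followed by its ancestors, most specific first."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def most_specific_common_ancestor(a, b):
    others = set(ancestors(b))
    return next(c for c in ancestors(a) if c in others)


def information_content(concept):
    """Rarer (more specific) concepts carry more information."""
    return -math.log(PROB[concept])


def resnik(a, b):
    """Information content of the most specific common ancestor."""
    return information_content(most_specific_common_ancestor(a, b))


def lin(a, b):
    """Shared information relative to the information of both concepts."""
    denom = information_content(a) + information_content(b)
    return 2 * resnik(a, b) / denom if denom else 1.0


print(most_specific_common_ancestor("Corn", "Rice"))  # Food
print(round(lin("Corn", "Rice"), 3))
print(lin("Corn", "Sari") == 0)  # True: only the root is shared
```

Because the root has probability 1, it carries no information, so two concepts that only share the root get a similarity of zero under both measures, while a concept compared with itself scores 1 under Lin's measure.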
  • #10: Therefore the work in SET TC deliberately limits itself to mapping document standards derived from UN/CEFACT's Core Component Technical Specification. This specification has some challenging properties, as document artifacts are described as Basic Business Information Entities which are derived from Core Components. Those business entities in turn may be composed into Aggregate Business Information Entities. As the direct inclusion of business elements may not be feasible, an Association type exists. This type of information carries the semantics which have to be preserved while deriving the ontology from the Excel files provided by UN/CEFACT. The schema information of OAGIS 9.1, UBL 2.0 and GS1 XML can be automatically traced back to its origin in the UN/CEFACT Core Component Library, and the RacerPro inference engine, enriched with rules defined in predicate logic, computes the higher ontology. This inferred semantic knowledge is used for the automated creation of XPath and XSLT expressions to map document artifacts between these document standards.
  • #11: To further improve match results, the semantics automatically derived from these document standards have to be enriched with explicit rules. On the one hand, these rules add the information needed to match semantically equivalent yet structurally different document artifacts. Some rules also deal with the fact that some parts of the document standards have their equivalent in UN/CEFACT's Core Component Technical Specification but are used differently in practice. Structural differences are recovered for Association Document Components and their relation to Basic Document Component pairs, and for Basic Business Document Information Entities. In tests the success rate is higher than that of semantic mapping frameworks leveraging general inference techniques such as association lists, dictionaries, stemming algorithms or word distance algorithms. OASIS SET TC is still an ongoing effort, but you may point your browser to the address in this presentation; documentation is available in the installation package you find in the link section on the last slide as well as on the OASIS SET TC homepage.
  • #12: The OASIS SET TC framework is one deliverable of the iSurf project, which will create an Interoperability Service Utility for Collaborative Supply Chain Planning across Multiple Domains Supported by RFID Devices and is targeted towards the needs of SMEs. OASIS SET TC will not replace existing document communication standards but will improve interoperability between them. It is an expandable model. Documents which derive from the UN/CEFACT CCL specification are easy to incorporate; other formats require a fundamental redefinition of the higher ontology. For my research focus this is just one more cornerstone towards interoperable electronic services of public administration. Recent efforts concentrate heavily on public procurement, with the PEPPOL project or the objectives of the semic.eu platform. Norway, Iceland, Finland, Sweden and Denmark work together to define a more appropriate version of UBL for public procurement, called the Northern European Subset of UBL. Iceland has already exchanged eInvoices in this format. Once this specification sees widespread application, the need to exchange data with standards such as HL7 XML will arise, and the SET TC framework will be the right tool to do so.