Bootstrapping the Analysis of
Large-scale Web Service Networks


Shahab Mokarizadeh, Royal Institute of Technology, Sweden
Peep Küngas, Tartu University, Estonia
Mihhail Matskin, Royal Institute of Technology, Sweden


IEEE/WIC/ACM International Conference on Web Intelligence, 22-27 Aug 2011
Background
Why web service analysis?
    • Identifying missing but valuable web services (to be implemented)
    • Discovering correlations among public, governmental and private
      sector web services
    • Discovering the most/least exploited concept(s), web
      service(s) and web service provider(s)
    • …


Initial challenge?
    • The vast majority of available services are not semantically annotated
      and do not even come with any sort of documentation!
Analysis Roadmap
             • Generate Reference Ontology
                • Initially only WSDL web services


             • Web service Annotation


             • Web Service Matching & Network generation

             • Apply Social Network Analysis Algorithms
                • Information Diffusion among Web service communities
                • Analyzing the impact of services/concepts on other services
                  or concepts
Reminder: WSDL Structure

[Figure: WSDL structure diagram. Image from: Web Services and Security, 1/17/2006, Marco Cova]


Ontology Learning from WSDL Interfaces [1]

Ontology learning input:
    - Message part names of input/output parameters
    - XML Schema leaf element names of complex types

Ontology learning pipeline:
    1. Information Elicitation
        - Term Extraction
        - Syntactic Refinement
    2. Ontology Discovery
        - Pattern-based Semantic Analysis
        - Term Disambiguation
        - Class and Relation Determination
    3. Ontology Organization
        - Adding Relations
    Output: Reference Ontology

[1] "Ontology Learning for Cost-Effective Large-scale Semantic Annotation of XML Schemas and Web Service Interfaces", in Proc. EKAW 2010, LNAI 6317, pp. 401-410, 2010.
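The term extraction step can be illustrated with a small tokenizer for element names. This is only a sketch: the slides do not describe the actual extraction or refinement rules, so the splitting conventions below (separators, camelCase, acronyms, digits) and the function name `split_identifier` are assumptions.

```python
import re


def split_identifier(name: str) -> list[str]:
    """Split a WSDL/XSD element name into lowercase word tokens (assumed rules)."""
    # Treat "_", "-", "." and whitespace as word separators.
    spaced = re.sub(r"[_\-.\s]+", " ", name)
    # Break at lower-to-upper transitions: "userPostal" -> "user Postal".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", spaced)
    # Break between an acronym and a capitalised word: "XMLSchema" -> "XML Schema".
    spaced = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", spaced)
    # Break between letters and digits: "address1" -> "address 1".
    spaced = re.sub(r"(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])", " ", spaced)
    return [token.lower() for token in spaced.split()]
```

For instance, `split_identifier("userPostalAddress")` yields `["user", "postal", "address"]`, which a later step could map to ontology concepts.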
Annotation Heuristics [2]
entity_reference ← synset{…}
    entity_reference: concept in the ontology
    synset: instances in the ontology (terms)

Example:
    Password ← {password, pwd, strPassword, authPassword, pass}
    Address ← {addr, address1, postal_address}
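The concept-to-synset mapping above can be sketched as a plain lookup table. The two synsets are copied from the example; the helper names and the case-insensitive lookup are assumptions made for illustration, not the heuristics of [2].

```python
# Synsets copied from the example above: concept -> terms that refer to it.
SYNSETS = {
    "Password": {"password", "pwd", "strPassword", "authPassword", "pass"},
    "Address": {"addr", "address1", "postal_address"},
}


def build_term_index(synsets):
    """Invert the synsets into a term -> concept index (lowercased, an assumption)."""
    index = {}
    for concept, terms in synsets.items():
        for term in terms:
            index[term.lower()] = concept
    return index


def annotate(term, index):
    """Return the concept a term refers to, or None if the term is unknown."""
    return index.get(term.lower())


TERM_INDEX = build_term_index(SYNSETS)
```

With this index, `annotate("pwd", TERM_INDEX)` returns `"Password"`, and a term outside every synset returns `None`.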




     [2] P. Küngas and M. Dumas, "Cost-Effective Semantic Annotation of XML Schemas and Web
     Service Interfaces", Proc. IEEE Conference on Services Computing, 2009, pp. 372-379.

Web Service Matching Scheme
Matching of basic elements of Web service input and output
    parameters (ontological instances)


Web service matching is simplified to instance matching.



Rule-based matching scheme:
    - A matching rule reveals the existence of some kind of semantic relation
      between two given instances.

Instance Matching Rules (1)
Rule-1: Same concept. Example: (addr, addr_line):
           {addr, addr_line} instanceOf Address.

Rule-2: Synonym concepts. Example: (loc, place):
             {loc} instanceOf Location,
             {place} instanceOf Place,
             Place isSynonymOf Location.

Rule-3: Subclass concepts. Example: (loc, city):
               {loc} instanceOf Location,
               {city} instanceOf City,
               City isSubClassOf Location.
Instance Matching Rules (2)

Rule-4: Rule 2 + Rule 3.
Example: (bidUId, id):
      {bidUId} instanceOf BidUniqueCode,
      {id} instanceOf ContractIdentifier,
      BidUniqueCode isSynonymOf ContractIdentifier.

Rule-5: Interrelated by an ontological relation (other than isSynonymOf).
Example:
   Person hasPropertyXXX FirstName.
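The five rules can be sketched as a small rule cascade over a toy ontology. This is a rough illustration under stated assumptions, not the authors' implementation: the `Ontology` class, the function `match_rule`, the terms `person` and `first_name`, and the property name `hasFirstName` are hypothetical, and Rule-4 is read here as "a synonym of one concept is a sub- or superclass of the other".

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    instance_of: dict = field(default_factory=dict)  # term -> concept
    synonyms: set = field(default_factory=set)       # frozenset({A, B})
    subclass_of: set = field(default_factory=set)    # (child, parent)
    relations: set = field(default_factory=set)      # (subject, property, object)

    def synonyms_of(self, concept):
        return {c for pair in self.synonyms if concept in pair
                for c in pair if c != concept}

    def sub_or_super(self, a, b):
        return (a, b) in self.subclass_of or (b, a) in self.subclass_of

    def related(self, a, b):
        return any({s, o} == {a, b} for s, _prop, o in self.relations)


def match_rule(onto, term_a, term_b):
    """Return the number of the first rule matching two terms, or None."""
    ca, cb = onto.instance_of.get(term_a), onto.instance_of.get(term_b)
    if ca is None or cb is None:
        return None
    if ca == cb:
        return 1  # Rule-1: same concept
    if cb in onto.synonyms_of(ca):
        return 2  # Rule-2: synonym concepts
    if onto.sub_or_super(ca, cb):
        return 3  # Rule-3: subclass concepts
    if (any(onto.sub_or_super(x, cb) for x in onto.synonyms_of(ca))
            or any(onto.sub_or_super(ca, y) for y in onto.synonyms_of(cb))):
        return 4  # Rule-4: synonym plus subclass (assumed reading)
    if onto.related(ca, cb):
        return 5  # Rule-5: any other ontological relation
    return None


ONTO = Ontology(
    instance_of={"addr": "Address", "addr_line": "Address", "loc": "Location",
                 "place": "Place", "city": "City",
                 "person": "Person", "first_name": "FirstName"},
    synonyms={frozenset({"Place", "Location"})},
    subclass_of={("City", "Location")},
    relations={("Person", "hasFirstName", "FirstName")},
)
```

Checking the rules in order means the most specific relation wins, so a pair that satisfies Rule-1 is never reported under a weaker rule.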


Evaluate Matching Scheme -1
     1- Classical Approach (Precision, Recall, F-measure)

1. A golden annotation/ontology is needed to compare with.
2. Identify:
      True Positives (TP): annotations common to the golden and the
       generated ontology.
      False Positives (FP): annotations made only by the generated ontology.
      False Negatives (FN): annotations made by the golden ontology but not
       discovered by the generated ontology.


3. Compute:
      $\mathrm{Precision} = \frac{TP}{TP + FP}$
      $\mathrm{Recall} = \frac{TP}{TP + FN}$
      $F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
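As a minimal sketch of this evaluation, the three measures can be computed from two sets of annotations using the standard definitions of precision, recall and F-measure. Representing an annotation as a `(term, concept)` pair and returning 0.0 when a denominator is zero are assumptions made here.

```python
def precision_recall_f(golden, generated):
    """Compare generated annotations against golden ones (sets of (term, concept))."""
    tp = len(golden & generated)   # annotations found in both
    fp = len(generated - golden)   # annotations made only by the generated ontology
    fn = len(golden - generated)   # golden annotations that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy annotations, for illustration only.
GOLDEN = {("pwd", "Password"), ("addr", "Address"), ("zip", "ZipCode")}
GENERATED = {("pwd", "Password"), ("addr", "Address"),
             ("zip", "PostalCode"), ("id", "Identifier")}
```

For the toy sets above, TP = 2, FP = 2 and FN = 1, so precision is 1/2, recall is 2/3 and the F-measure is 4/7.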


Evaluate Matching Scheme - 2
 2-Tracking Performance of Matching Scheme in Network Model

     •   Generate a semantic network model from the annotated Web
         service corpus.

     •   Track the performance of the applied annotation &
         matching scheme in the network properties. Web service
         (WSDL) networks (of small size) have been observed to exhibit:
          •      Small-worldness
          •      Scale-free model
          •      Correlation degree on nodes?



Web service Network Models
   2-Projecting Matching Scheme Accuracy in Network Model

[Figure: annotated Web services (WS1-WS3) are linked to their operations (OP1-OP3), the operations to their parameters (P1-P6), and the parameters to ontological concepts (C1-C5); the concepts C1-C5 then form the semantic network.]

Legend:
    WS1 - WS3: Web services
    OP1 - OP3: Web service operations
    P1 - P6: Basic elements of input/output parameters
    C1 - C5: Ontological concepts representing the parameters
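One plausible way to derive a concept-level network from annotated operations is sketched below. The edge rule (each input concept of an operation points to each of its output concepts) is an assumption, since the slide only shows the layers, and the parameter-to-concept mapping is hypothetical because the figure's links are not recoverable.

```python
from collections import defaultdict


def build_semantic_network(operations, annotation):
    """Build concept -> set of successor concepts.

    operations: operation name -> (input parameter names, output parameter names)
    annotation: parameter name -> ontological concept
    """
    edges = defaultdict(set)
    for inputs, outputs in operations.values():
        in_concepts = {annotation[p] for p in inputs if p in annotation}
        out_concepts = {annotation[p] for p in outputs if p in annotation}
        for src in in_concepts:
            for dst in out_concepts:
                if src != dst:  # skip self-loops
                    edges[src].add(dst)
    return dict(edges)


# Hypothetical operations and annotations, for illustration only.
OPERATIONS = {
    "OP1": (["P1"], ["P2"]),
    "OP2": (["P3"], ["P4"]),
    "OP3": (["P5"], ["P6"]),
}
ANNOTATION = {"P1": "C1", "P2": "C2", "P3": "C2",
              "P4": "C3", "P5": "C4", "P6": "C5"}
NETWORK = build_semantic_network(OPERATIONS, ANNOTATION)
```

Because P2 and P3 share the concept C2, the operations OP1 and OP2 become connected through it, which is how annotation quality shows up in the network structure.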
Evaluating Network Properties
Small Worldness
Small-world networks are networks with the following characteristics:
1. $L_{Random} \le L_{Actual}$                    L: shortest path length
2. $C_{Random} \ll C_{Actual}$                    C: clustering coefficient
                                              $S_{index}$: small-worldness index
In other words:
    $\gamma = C_{Actual} / C_{Random} > 1$, $\lambda = L_{Actual} / L_{Random} > 1$, $S_{index} = \gamma / \lambda > 1$




Small-worldness scales linearly with
network size.
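The index can be computed from the four measured quantities. The sketch below assumes the ratio form, with the clustering ratio C_Actual / C_Random divided by the path-length ratio L_Actual / L_Random; this form is consistent with the values reported in the small-worldness table. The function name is hypothetical.

```python
def small_world_index(l_actual, c_actual, l_random, c_random):
    """Small-worldness index: clustering ratio divided by path-length ratio."""
    gamma = c_actual / c_random  # clustering relative to a random graph
    lam = l_actual / l_random    # path length relative to a random graph
    return gamma / lam


# Values of the h25 generated network and its Random-ER counterpart.
S_H25 = small_world_index(l_actual=2.4256, c_actual=0.259,
                          l_random=2.4756, c_random=0.0348)
```

For the h25 row this gives roughly 7.6, close to the reported index of 7.5769; the small gap is presumably due to rounding of the tabulated inputs.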


Evaluating Network Properties
Scale free Networks
  Scale-free networks:
   Fitted to a power-law function: $y = c \cdot x^{-\alpha}$

[Figure: log-log plot of the number of nodes with M links against the number of links (M); many nodes have few links, and a few nodes have many links.]
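A simple way to estimate the exponent is a straight-line fit on the log-log degree histogram. The slides do not say which fitting method was used, so the least-squares approach and the function name below are assumptions; it serves only to make the power-law relation concrete.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = c * x**(-alpha) by least squares on log(y) versus log(x).

    xs: degree values (number of links), ys: number of nodes with that degree.
    Returns (c, alpha).
    """
    points = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    if len(points) < 2:
        raise ValueError("need at least two positive data points")
    mean_x = sum(px for px, _ in points) / len(points)
    mean_y = sum(py for _, py in points) / len(points)
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

On a log-log plot a power law is a straight line with slope -alpha, so the fitted slope directly gives the exponent of the kind listed in the scale-free table.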
Evaluating Network Properties
Assortativity of Node Degree (Correlation Degree on Nodes)

 Positive correlation: vertices with a high number of
  connections tend to be connected to other nodes that also
  have many links. Observed in social networks, e.g. networks of
  actors.



 Negative correlation: high-degree vertices preferentially attach to those
  having a small number of connections. Observed in technological
  and biological networks, e.g. the Internet, protein interactions.
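Degree correlation can be measured as the Pearson correlation between the degrees found at the two ends of each edge. The sketch below assumes an undirected graph given as an edge list; the networks in the experiments may be directed, in which case in- and out-degrees would have to be treated separately. The function name is hypothetical.

```python
from collections import Counter


def degree_assortativity(edges):
    """Pearson correlation of the degrees at both ends of each undirected edge."""
    if not edges:
        raise ValueError("edge list is empty")
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)  # xs and ys hold the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var if var else float("nan")
```

A star graph, where one hub connects only to leaves, gives -1 (fully disassortative), while a graph in which all nodes have the same degree has no degree variance and yields NaN.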


Experimental Datasets
     SOATrader dataset: 1,000,000 terms from the SOATrader collection
      of 15,000 WSDLs collected from different repositories on the Web
      between 2005 and 2007.
      SOATrader: http://www.soatrader.com/web-services


     ASSAM dataset [3]: 146 WSDLs collected by Heß et al. and
      annotated with the ASSAM tools. We use all unique terms (approx. 375),
      regardless of frequency, from this collection.
      ASSAM: http://www.andreas-hess.info/projects/annotator/


      [3] A. Heß, N. Kushmerick, "Machine Learning for Annotating Semantic Web Services",
      AAAI Spring Symposium on Semantic Web Services, 2004.
Golden Ontology


    SOATrader dataset: the golden annotation was handcrafted by the
     authors based on the top 2000 most recurrent terms.

    ASSAM: we use the golden annotation developed by the ASSAM
     developers and used as the reference ontology in their
     experiment with the ASSAM Web service annotation tool.




Evaluation Result - 1
 Precision, Recall, F-Measure

[Figure: bar chart of precision, recall and F-measure (vertical axis from 0 to 0.6) on the Top2000 and ASSAM datasets, with one series each for Rule-1, Rules 1-4 and Rules 1-5.]
Dataset for Network Evaluation
Ideal: use the whole dataset of WSDL/XSD elements from the SOATrader
     collection (approx. 1,000,000 terms) and the ASSAM
     collection (approx. 10,000 terms).

Problems with a large dataset:
- The larger the dataset, the bigger the ontology and the harder it is
  to verify and enhance the quality of annotation.
- Neither cost-effective (human and computation cost) nor scalable for
  analysis purposes.

Proposal: limit the SOATrader experimental dataset using the following four
  arbitrarily chosen thresholds (minimum frequency of occurrence of a
  term): 10, 15, 20 and 25 (h10, h15, h20, h25), covering the 30,000
  (unique) most recurrent terms.
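The thresholding step amounts to keeping only the terms that occur at least a given number of times. A minimal sketch, with a hypothetical function name and a tiny made-up corpus in place of the real term list:

```python
from collections import Counter


def terms_above_threshold(terms, min_frequency):
    """Return the unique terms occurring at least min_frequency times."""
    counts = Counter(terms)
    return {term for term, n in counts.items() if n >= min_frequency}


# Made-up corpus for illustration: term occurrences across WSDL/XSD elements.
CORPUS = ["id"] * 25 + ["name"] * 20 + ["zip"] * 12 + ["foo"] * 3
SUBSETS = {h: terms_above_threshold(CORPUS, h) for h in (10, 15, 20, 25)}
```

A lower threshold keeps more terms, so each subset contains the next one: h10 ⊇ h15 ⊇ h20 ⊇ h25.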
Annotation Progress


                                     h25               h20        h15       h10
 Learned ontology size               4523              5614       7378     11610
 Annotated elements                 588057            596625     621336   663618
 Total elements                     998916            998916     998916   998916
 Percentage of total                 59%               60%        62%       66%




Analysis of Small Worldness
 Dataset            Networks                 Graph        L         C         Sindex
 Entire SOATrader   Syntactic                Actual       3.283     0.2968    591.08
                                             Random-ER    3.9229    0.00062
 h25                Generated                Actual       2.4256    0.259     7.5769
                                             Random-ER    2.4756    0.0348
 h20                Generated                Actual       2.3882    0.2811    8.8148
                                             Random-ER    2.4851    0.0331
 h15                Generated                Actual       2.3724    0.2805    8.2753
                                             Random-ER    2.3396    0.0334
 h10                Generated                Actual       2.5322    0.2449    18.2709
                                             Random-ER    2.7662    0.0146
 Top2000            Golden                   Actual       2.1895    0.3761    2.8404
                                             Random-ER    1.8852    0.1146
                    Generated                Actual       2.08475   0.3209    3.3878
                                             Random-ER    2.0667    0.0939
 ASSAM              Golden                   Actual       4.5653    0.2147    3.1464
                                             Random-ER    3.546     0.05304
                    Generated (Rule 1)       Actual       3.0592    0.4803    21.4835
                                             Random-ER    3.8451    0.0281
                    Generated (Rules 1-4)    Actual       2.5732    0.4057    8.5288
                                             Random-ER    3.1267    0.0578
Analysis of Scale-free Properties & Correlation Degree
Category    Networks                  Power-law Degree Exponent   #Nodes   Degree Correlation
Entire      Syntactic                 1.3722                      67622    -0.0413
h25         Generated                 1.1945                      2086     -0.1993
            Random Annotation         0.6332                      2086      0.019
h20         Generated                 1.1977                      2394     -0.2093
h15         Generated                 1.1448                      3239     -0.2222
h10         Generated                 1.2316                      4050     -0.1895
Top2000     Golden                    1.1504                      856      -0.2238
            Generated                 1.1483                      936      -0.2137
            Syntactic                 1.1653                      828      -0.2229
ASSAM       Golden                    1.5346                      170      -0.3079
            Generated - Rule 1        1.5574                      413       0.3642
            Generated - Rules 1-4     1.4566                      217       0.041
            Random Annotation         1.0755                      170       0.1151
            Syntactic                 1.6105                      886       0.194
Plot of Degree Distribution
[Figure: out-degree distribution of the actual annotation compared with the out-degree distribution of a random annotation.]
Conclusion & Future work
    • The performance of a Web service annotation scheme can be tracked
      in the properties of Web service network models.
    • An efficient matching scheme eliminates, or at least minimizes,
      deviation from small-worldness conditions, shows a strong negative
      degree correlation and follows the scale-free model.
    • A major threat:
       • Network theories are incomplete, e.g. the emergence of power laws is too
         common to rely on!
       • The evaluated dataset may not represent the model governing the whole picture.

    • Future work:
       • Benchmarking other WS annotation & matching methods
       • Investigating other network properties
Thanks!
            Questions, criticism and suggestions
              are gratefully welcome.

                                      SHAHABM@KTH.SE



Backup Slides



What Is Going To Be Annotated?
Note: We annotate ONLY basic elements of Web service input and
 output parameters (message part names and XML Schema basic
 element names).

WSDL:
<wsdl:types>
 <complexType name="Address">
  <sequence>
      ……
      <element name="Zip" type="string"/>
       …..
      <element name="City" type="string"/>
  </sequence>
 </complexType>
(…)
</wsdl:types>

Semantic annotation (ontology):
    Address hasZipCode ZipCode
    Address hasCityName CityName
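Collecting the basic element names from a schema can be sketched with Python's standard XML parser. The sample schema below is a namespaced variant of the Address type shown above, and treating any element without an inline complexType as a leaf is a simplifying assumption; elements whose `type` attribute refers to a named complex type are not resolved here.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical sample schema mirroring the Address example.
SAMPLE_SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Address">
    <xs:sequence>
      <xs:element name="Zip" type="xs:string"/>
      <xs:element name="City" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def leaf_element_names(schema_xml):
    """Return names of xs:element nodes that contain no inline complexType."""
    root = ET.fromstring(schema_xml)
    names = []
    for element in root.iter(f"{{{XSD_NS}}}element"):
        has_nested_type = element.find(f"{{{XSD_NS}}}complexType") is not None
        name = element.get("name")
        if name and not has_nested_type:
            names.append(name)
    return names
```

For the sample schema this returns `["Zip", "City"]`, the terms that would then be annotated with the concepts ZipCode and CityName.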
Example of Generated Ontology
Input terms: "userId", "username", "Zip", "addr_line",
"userPostalAddress", "online_usr", ….

Generated ontology:
    OnlineUser isSubClassOf User
    User hasName UserName
    User hasIdentifier UserIdentifier
    User hasAddress PostalAddress
    PostalAddress isSubClassOf Address
    Address hasAddressLine AddressLine
    Address hasZipCode ZipCode
    PostalCode isSynonymOf ZipCode

Wi iat-bootstrapping the analysis of large-scale web service networks-v3

  • 1. Bootstrapping the Analysis of Large-scale Web Service Networks Shahab Mokarizadeh, Royal Institute of Technology , Sweden Peep Kungas, Tartu University, Estonia Mihhail Matskin, Royal Institute of Technology, Sweden IEEE/WIC/ACM International Conference of Web Intelligence 22-27 Aug 2011
  • 2. Background Why web service analysis? Identifying Missing but Valuable Web service (to be implemented)  Discovering correlation among public , governmental and private sector web services Discovery of the most/least exploited concept(s)s, web service(s), we service provider(s) ….. Initial challenge? Vast majority of available services are not semantically annotated or even come with any sort of documentation ! 2 22-27 Aug 2011 IEEE/WIC/ACM International Conference of Web Intelligence
  • 3. Analysis Roadmap • Generate Reference Ontology • Initially only WSDL web services • Web service Annotation • Web Service Matching & Network generation • Apply Social Network Analysis Algorithms • Information Diffusion among Web service communities • Analysis the Impact of Services /Concept on other services or concepts 3 IEEE/WIC/ACM International Conference of Web Intelligence 22-27 Aug 2011
  • 4. Remind: WSDL Structure Image from : Web Services and Security,1/17/2006 ,Marco Cova 4 IEEE/WIC/ACM International Conference of Web Intelligence 22-27 Aug 2011
  • 5. Ontology Learning from Information Elicitation WSDL Interfaces1 Term Extraction Syntactic Refinement Ontology Discovery Ontology Learning Input: Pattern-based - Message Part names of input/output Semantic Analysis parameters Term Disambiguation - XML Schema leaf element names of complex types Class and Relation Determination Ontology Organization [1] ”Ontology Learning for Cost-Effective Large-scale Semantic Adding Relations Annotation of XML Schemas and Web Service Interfaces". in Porc. EKAW 2010, LNAI 6317,pp.401-410, 2010 Reference 5 Ontology IEEE/WIC/ACM International Conference of Web Intelligence 22-27 Aug 2011
  • 6. Annotation Heuristics2 entity_reference ← synset{…} Concept in Ontology Instances in Ontology (terms) Example: Password ← {password, pwd, strPassword, authPassword, pass} Address ← {addr, address1, postal_address} [2] P.Küngas, and M. Dumas.“Cost-Effective Semantic Annotation of XML Schemas and Web Service Interfaces”. Proc. IEEE Conference on Services Computing, 2009, pp.372-379, 6 22-27 Aug 2011 IEEE/WIC/ACM International Conference of Web Intelligence
  • 7. Web service Matching Scheme Matching of basic elements of Web service input and output parameters (ontological instances) Web service matching Simplified as Instance Matching Rule based matching scheme. - A matching rule reveals existence of kind of semantic relation between the given two instances. 7 IEEE/WIC/ACM International Conference of Web Intelligence 22-27 Aug 2011
  • 8. Instance Matching Rules (1) Rule-1: Same concept . Example: (addr, addr_line) : {addr, addr_line} instanceOf Address . Rule-2: Synonyms Concepts . Example: ( loc, place) {loc} instanceOf Location , {place} instanceOf Place Place isSynonymOf Location Rule-3: Subcalss Concepts. Example: (loc, city): {loc} instanceOf Location, {city} instanceOf City, City isSubClassOf Location 8 22-27 Aug 2011 IEEE/WIC/ACM International Conference of Web Intelligence
  • 9. Instance Matching Rules (2) Rule-4: Rule 2 + Rule 3 . Example : (bidUId, id) {bidUId} instanceOf BidUniqueCode, {id} instanceOf ContractIdentifier BidUniqueCode isSynonymOf ContractIdentifier Rule-5: Interrelated by an ontological relations (other than isSynonymOf): Example : Person hasPropertyXXX FirstName. 9 22-27 Aug 2011 IEEE/WIC/ACM International Conference of Web Intelligence
  • 10. Evaluate Matching Scheme -1 1- Classical Approach (Precision, Recall, F-measure) 1. Need a Golden Annotation /Ontology to compare with . 2. Identify :  True Positives (TP) : the common annotations between golden and generated ontology  False Positives (FP) : annotations made only by generated ontology  False Negatives (FN): annotations made by golden ontology but not discovered by the generated ontology). 3. Compute: 10 IEEE/WIC/ACM International Conference of Web Intelligence 22-27 Aug 2011
  • 11. Evaluate Matching Scheme - 2 2-Tracking Performance of Matching Scheme in Network Model • Generate Semantic Network model out of Annotated Web service corpus. • Track the performance of exploited Annotation & Matching scheme in the network properties .Web service (WSDL) networks (in small size) observed to exhibit: • Small-worldness model  Scale free model  Correlation degree on nodes ? 11 22-27 Aug 2011 IEEE/WIC/ACM International Conference of Web Intelligence
  • 12. Web service Network Models 2-Projecting Matching Scheme Accuracy in Network Model Operations Parameters Concepts Semantic Network WS1 - WS3 : Web services WS1 P1 C1 C1 OP1 OP1 - OP3 : Web service P2 Operations C2 WS2 C2 C3 P3 OP2 C3 P1 - P6 : Basic Elements of Input P4 / Output Parameters C5 C4 C4 WS3 P5 C1 – C5 : Ontological Concepts OP3 C5 P6 Representing the Parameter Annotated Web service 12 22-27 Aug 2011 IEEE/WIC/ACM International Conference of Web Intelligence
  • 13. Evaluating Network Properties Small Worldness Small world networks are networks with the following characteristics: 1. LRandom ≤ LActual L: Shortest Path Length 2. CRandom << CActual C: Clustering Coefficient Sindex : Small worldness Index In other words: > 1, λ > 1, Sindex > 1 Small-worldness scales linearly with network size. 13 22-27 Aug 2011 IEEE/WIC/ACM International Conference of Web Intelligence
  • 14. Evaluating Network Properties Scale free Networks  Scale free Networks:  Fitted to power-law function y  c.x Many nodes with few links # of nodes with M links (log) A few nodes with many links # of links (M) (log) 14 IEEE/WIC/ACM International Conference of Web Intelligence 22-27 Aug 2011
  • 15. Evaluating Network Properties: Assortativity of Node Degree (Degree Correlation on Nodes). Positive correlation: vertices with many connections tend to be connected to other vertices that also have many links. Observed in social networks, e.g. networks of actors. Negative correlation: high-degree vertices tend to attach to vertices with few connections. Observed in technological and biological networks, e.g. the Internet and protein interactions.
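Degree correlation can be measured as the Pearson correlation between the degrees found at the two ends of each edge. The sketch below assumes a simple undirected graph given as a list of edges without duplicates or self-loops; the deck also reports out-degree distributions, so the authors may have used a directed variant. The function name is hypothetical.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at both ends of each undirected edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)  # xs and ys hold the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        raise ValueError("degree correlation is undefined when all degrees are equal")
    return cov / var


# A star graph: one hub linked to four leaves, the extreme disassortative case.
star = [("hub", leaf) for leaf in ("a", "b", "c", "d")]
r_star = degree_assortativity(star)
```

The star graph yields -1, the strongest possible negative correlation, because the single high-degree hub connects only to degree-one leaves.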
  • 16. Experimental Datasets. SOATrader dataset: 1,000,000 terms from the SOATrader collection of 15,000 WSDLs gathered from different Web repositories between 2005 and 2007. SOATrader: (http://www.soatrader.com/web-services). ASSAM dataset [3]: 146 WSDLs collected by Heß et al. and annotated with the ASSAM tools. We use all unique terms (approx. 375), regardless of frequency, from this collection. ASSAM: http://www.andreas-hess.info/projects/annotator/ [3] A. Heß, N. Kushmerick, "Machine Learning for Annotating Semantic Web Services", AAAI Spring Symposium on Semantic Web Services, 2004.
  • 17. Golden Ontology. SOATrader dataset: the golden annotation was handcrafted by the authors from the 2000 most recurrent terms. ASSAM: we use the golden annotation that the ASSAM developers built and used as the reference ontology in their experiments with the ASSAM Web service annotation tool.
  • 18. Evaluation Result - 1: Precision, Recall, F-Measure [Figure: bar chart of recall, precision, and F-measure for the rule sets Rule 1, Rules 1-4, and Rules 1-5 on the Top2000 and ASSAM datasets; vertical axis from 0 to 0.6.]
  • 19. Dataset for Network Evaluation. Ideal: use the full dataset of WSDL/XSD elements from the SOATrader collection (approx. 1,000,000 terms) and the ASSAM collection (approx. 10000 terms). Problems with a large dataset: - The larger the dataset, the bigger the ontology and the harder it is to verify and enhance annotation quality. - It is neither cost-effective (in human and computation cost) nor scalable for analysis purposes. Proposal: limit the SOATrader experimental dataset using four arbitrarily chosen thresholds on a term's minimum frequency of occurrence: 10, 15, 20, and 25 (h10, h15, h20, h25), covering the 30000 (unique) most recurrent terms.
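The thresholding proposal amounts to keeping only terms that occur at least a given number of times. A minimal sketch follows, assuming the corpus is available as a flat list of extracted terms; the function name and the toy corpus are hypothetical.

```python
from collections import Counter


def terms_above_threshold(terms, min_frequency):
    """Return the unique terms that occur at least min_frequency times."""
    counts = Counter(terms)
    return {term for term, count in counts.items() if count >= min_frequency}


# Toy corpus, for illustration only; the real collection has about 1,000,000 terms.
terms = ["zip"] * 25 + ["city"] * 15 + ["userid"] * 10 + ["foo"] * 3
subsets = {f"h{h}": terms_above_threshold(terms, h) for h in (10, 15, 20, 25)}
```

Lower thresholds admit more terms, which matches the pattern in the deck where h10 produces the largest learned ontology and h25 the smallest.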
  • 20. Annotation Progress
                            h25      h20      h15      h10
    Learned ontology size   4523     5614     7378     11610
    Annotated elements      588057   596625   621336   663618
    Total elements          998916   998916   998916   998916
    Percentage of total     59%      60%      62%      66%
  • 21. Analysis of Small Worldness
    Dataset           Network              Variant    L        C        Sindex
    Entire SOATrader  Syntactic            Actual     3.283    0.2968   591.08
    Entire SOATrader  Syntactic            Random-ER  3.9229   0.00062
    h25               Generated            Actual     2.4256   0.259    7.5769
    h25               Generated            Random-ER  2.4756   0.0348
    h20               Generated            Actual     2.3882   0.2811   8.8148
    h20               Generated            Random-ER  2.4851   0.0331
    h15               Generated            Actual     2.3724   0.2805   8.2753
    h15               Generated            Random-ER  2.3396   0.0334
    h10               Generated            Actual     2.5322   0.2449   18.2709
    h10               Generated            Random-ER  2.7662   0.0146
    Top2000           Golden               Actual     2.1895   0.3761   2.8404
    Top2000           Golden               Random-ER  1.8852   0.1146
    Top2000           Generated            Actual     2.08475  0.3209   3.3878
    Top2000           Generated            Random-ER  2.0667   0.0939
    ASSAM             Golden               Actual     4.5653   0.2147   3.1464
    ASSAM             Golden               Random-ER  3.546    0.05304
    ASSAM             Generated, Rule 1    Actual     3.0592   0.4803   21.4835
    ASSAM             Generated, Rule 1    Random-ER  3.8451   0.0281
    ASSAM             Generated, Rules 1-4 Actual     2.5732   0.4057   8.5288
    ASSAM             Generated, Rules 1-4 Random-ER  3.1267   0.0578
  • 22. Analysis of Scale-free Properties & Degree Correlation
    Category  Network               Power-law exponent  #Nodes  Degree correlation
    Entire    Syntactic             1.3722              67622   -0.0413
    h25       Generated             1.1945              2086    -0.1993
    h25       Random Annotation     0.6332              2086    0.019
    h20       Generated             1.1977              2394    -0.2093
    h15       Generated             1.1448              3239    -0.2222
    h10       Generated             1.2316              4050    -0.1895
    Top2000   Golden                1.1504              856     -0.2238
    Top2000   Generated             1.1483              936     -0.2137
    Top2000   Syntactic             1.1653              828     -0.2229
    ASSAM     Golden                1.5346              170     -0.3079
    ASSAM     Generated, Rule 1     1.5574              413     0.3642
    ASSAM     Generated, Rules 1-4  1.4566              217     0.041
    ASSAM     Random Annotation     1.0755              170     0.1151
    ASSAM     Syntactic             1.6105              886     0.194
  • 23. Plot of Degree Distribution [Figure, two panels: out-degree distribution of the random annotation; out-degree distribution of the actual annotation.]
  • 24. Conclusion & Future Work. The performance of a Web service annotation scheme can be tracked in the properties of Web service network models. An efficient matching scheme eliminates, or at least minimizes, deviation from small-worldness conditions, shows a strong negative degree correlation, and follows the scale-free model. Major threats: network theories are incomplete (e.g. power laws emerge too commonly to be relied on); the evaluated dataset may not represent the model governing the whole picture. Future work: benchmarking other WS annotation and matching methods; investigating other network properties.
  • 25. Thanks! Questions, criticism, and suggestions are welcome. SHAHABM@KTH.SE
  • 26. Backup Slides
  • 27. What Is Going To Be Annotated? Note: we annotate ONLY the basic elements of Web service input and output parameters (message part names and XML Schema basic element names).
    WSDL: <wsdl:types> <complexType name="Address"> <sequence> …… <element name="Zip" type="string"/> ….. <element name="City" type="string"/> </sequence> </complexType> (…) </wsdl:types>
    Semantic annotation: Zip → ZipCode; City → CityName.
    Ontology: Address hasZipCode ZipCode; Address hasCityName CityName.
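Collecting the element names to be annotated can be sketched with Python's standard XML parser. This is a minimal illustration, assuming a self-contained XML Schema fragment modeled on the Address type above; a real WSDL would also require reading message part names, which this sketch omits. The function name and the SCHEMA constant are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Simplified stand-in for the <wsdl:types> content shown on the slide.
SCHEMA = """
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <complexType name="Address">
    <sequence>
      <element name="Zip" type="string"/>
      <element name="City" type="string"/>
    </sequence>
  </complexType>
</schema>
"""


def basic_element_names(xsd_text):
    """Return the name attribute of every XML Schema element, in document order."""
    root = ET.fromstring(xsd_text.strip())
    return [el.get("name") for el in root.iter(f"{{{XSD_NS}}}element")]


names = basic_element_names(SCHEMA)
```

Here `names` holds "Zip" and "City", the two terms that the slide maps to the concepts ZipCode and CityName.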
  • 28. Example of Generated Ontology. Input terms: "userId", "username", "Zip", "addr_line", "userPostalAddress", "online_usr", …. [Figure: generated ontology with the concepts OnlineUser, User, UserName, UserIdentifier, PostalAddress, Address, AddressLine, ZipCode, and PostalCode, linked by the relations isSubClassOf, hasAddress, hasName, hasIdentifier, hasAddressLine, hasZipCode, and isSynonymOf.]
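Before terms such as "userPostalAddress" or "addr_line" can be mapped to concepts, they have to be split into words. The slides do not describe the tokenization rules, so the sketch below is only one plausible approach: it splits on camel-case boundaries, underscores, and digits. The function name and the regular expression are assumptions, and expanding abbreviations such as "usr" or "addr" is out of scope here.

```python
import re

# Acronym runs (e.g. "XML"), capitalized or lowercase words, and digit runs.
TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_term(name):
    """Split an identifier into lowercase word tokens."""
    return [token.lower() for token in TOKEN.findall(name)]


# Input terms taken from the slide.
tokens = {name: split_term(name)
          for name in ["userId", "addr_line", "userPostalAddress", "online_usr"]}
```

For example, "userPostalAddress" splits into "user", "postal", and "address", which lines up with the User and PostalAddress concepts in the generated ontology.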