seekda's Web Service Search Engine




                                                         Nathalie Steinmetz
                                                                 seekda GmbH




© Copyright 2012 SEEKDA GmbH – www.seekda.com
seekda Web Service Search Engine




Motivation

       “Web of services”
           Growing amount of public services & data on the Web
           Problem: How do I find the service I need?
              General search engine: services hard to identify, not much information
               on results page
              Specific portals: access to restricted sets of registered and editorially
               maintained services
       Use semantic technologies for better search experience
           No to heavy-weight, expressive semantic web service languages
            such as OWL-S or WSML
           Yes to simple light-weight semantic annotations in RDF
            → Scalability!
Outline

       Web Service search engine - basics
           Focused Crawling
           WSDL-based services
           Web APIs

       Seekda's search engine & experimental prototype

       Crowdsourcing Web Service annotations
           Web Service Annotation wizard
           Amazon Mechanical Turk crowdsourcing

       Service ontologies

Service Location

       Locating Web Services on the Web (Approach adopted by
        European projects Service-Finder & SOA4All)
           Crawling the Web for services
           Aggregate information
           Annotate services

       Supported services:
           WSDL descriptions
           Web APIs (a.k.a. RESTful services)




Service Crawler Architecture


       [Figure: service crawler architecture. Labels: Crawl Operator (Configuration & Monitoring), Collecting Seeds, Crawling, Data Post-Processing, RDF meta-data, ARCs, Index.]


Crawling the Web for Services

       Basic crawling process:
             Start with a set of seed URLs
             Check whether a page should be fetched or not
             Fetch the document the URL points to
             Extract links from the fetched document
             Decide whether or not to store fetched documents
             Feed crawler queues with newly extracted links
             Assign costs/priorities to single URLs and queues
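
The steps above can be sketched as a small loop. This is a minimal illustration, not seekda's crawler: the `fetch` callback, the `should_fetch` and `should_store` predicates, the `cost` function and the example.org pages are all hypothetical, and the pages are held in memory so the example runs without network access.

```python
import heapq
import re
from urllib.parse import urljoin

HREF = re.compile(r'href="([^"]+)"')  # naive link extraction, enough for a sketch

def crawl(seeds, fetch, should_fetch, should_store, cost, limit=100):
    """Run the basic crawl loop and return the stored documents keyed by URL."""
    frontier = [(cost(url), url) for url in seeds]   # priority queue of (cost, url)
    heapq.heapify(frontier)
    seen = set(seeds)
    stored = {}
    while frontier and limit > 0:
        _, url = heapq.heappop(frontier)             # lowest cost first
        if not should_fetch(url):                    # check before fetching
            continue
        document = fetch(url)                        # fetch the document
        limit -= 1
        if document is None:
            continue
        if should_store(url, document):              # decide whether to store it
            stored[url] = document
        for href in HREF.findall(document):          # extract links
            link = urljoin(url, href)
            if link not in seen:                     # feed the queue with new links
                seen.add(link)
                heapq.heappush(frontier, (cost(link), link))
    return stored

# Hypothetical in-memory "Web" used instead of real HTTP requests.
PAGES = {
    "http://example.org/": '<a href="/api.wsdl">wsdl</a> <a href="/about">about</a>',
    "http://example.org/api.wsdl": "<definitions/>",
    "http://example.org/about": "About us",
}

stored = crawl(
    seeds=["http://example.org/"],
    fetch=PAGES.get,
    should_fetch=lambda url: url.startswith("http://example.org/"),
    should_store=lambda url, doc: url.endswith(".wsdl"),
    cost=lambda url: 0 if "wsdl" in url else 1,
)
```

A single priority queue is used here for brevity; the per-host queues mentioned on the next slide would replace it in a fuller version.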




Focused Crawling Techniques

       Seed Collection
           Collecting seeds from specialized portals
           Reuse known Web Service descriptions and related documents
       URL Scheduling
           Prioritize URLs so that crawls focus on the relevant part of the Web
           Assign costs that influence the priority of a URL in a queue
           Based on:
              Term vectors of pages, used to assess similarity to the Web Service (WS) domain
              URL characteristics
       Queue Scheduling
           One queue per host
           Prioritize queues with low-cost URLs
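
One way to picture URL and queue scheduling is the sketch below. It is an assumption-laden illustration rather than the crawler's actual scoring: the `WS_PROFILE` vocabulary, the cosine measure, the 0.5 discount for URL characteristics, and the use of the linking page's text are all hypothetical choices.

```python
import math
import re
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Hypothetical reference vocabulary standing in for the "WS domain".
WS_PROFILE = Counter("wsdl soap service api endpoint operation rest xml".split())

def term_vector(text):
    """Bag-of-words term vector of a page."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def url_cost(url, source_text):
    """Lower cost means higher priority."""
    cost = 1.0 - cosine(term_vector(source_text), WS_PROFILE)
    if re.search(r"wsdl|api|service", url.lower()):   # URL characteristic (assumed rule)
        cost *= 0.5
    return cost

def schedule(candidates):
    """Build one queue per host and order the queues by their cheapest URL."""
    queues = defaultdict(list)
    for url, source_text in candidates:
        queues[urlparse(url).netloc].append((url_cost(url, source_text), url))
    for queue in queues.values():
        queue.sort()
    return sorted(queues.items(), key=lambda item: item[1][0][0])

ordered = schedule([
    ("http://news.example.com/sports", "football results and weather"),
    ("http://ws.example.org/stock?wsdl", "SOAP service endpoint described in WSDL"),
])
```

Here the queue for `ws.example.org` is served first, because its only URL has both service-like page text and a service-like URL.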

Identify WSDLs and Related Information

       WSDL identification
           Check whether a fetched page is XML and valid WSDL

       Related documents identification
           Definition of related document
              Inlink to the WSDL
              Outlink from the WSDL
              Associated by term vector similarity
           Task is split between crawl run-time and post-processing of the
            crawl data
           Task requires deeper crawling of service provider domains
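
The WSDL identification step can be approximated as follows. This sketch only checks that the payload is well-formed XML whose root element is a WSDL 1.1 `definitions` or WSDL 2.0 `description` element; it does not validate against the WSDL schema, and the function name `looks_like_wsdl` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Root elements (in Clark notation) of the two WSDL versions.
WSDL_ROOTS = {
    "{http://schemas.xmlsoap.org/wsdl/}definitions",  # WSDL 1.1
    "{http://www.w3.org/ns/wsdl}description",         # WSDL 2.0
}

def looks_like_wsdl(payload):
    """Return True if payload parses as XML and its root is a WSDL element."""
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        return False          # not well-formed XML at all
    return root.tag in WSDL_ROOTS

is_wsdl = looks_like_wsdl('<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"/>')
is_html = looks_like_wsdl("<html><body>API docs</body></html>")
```

A cheap check like this fits the crawl run-time; full validation and the related-document analysis are better suited to post-processing.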

Unique Service Objects

       Building unique service objects
           Collect all similar WSDLs → deduplication
              One service = all WSDLs with the same provider and service name
              Example:
                   Unique Service: http://seekda.com/providers/cdyne.com/IP2Geo
                   Endpoint: http://ws.cdyne.com/ip2geo/ip2geo.asmx
                   Provider: cdyne.com
                   Service: IP2Geo
                   WSDLs:
                    http://ws.cdyne.com/ip2geo/ip2geo.asmx?wsdl
                    http://miki2005.uda.ad/p1net/Web%20References/com.cdyne.ws/ip2geo.wsdl
                    ...

       Create unique service identifiers:
           http://seekda.com/providers/<providerName>/<serviceName>
       Assemble related information
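
The deduplication rule can be sketched as a grouping step. The example assumes that provider and service name have already been extracted for each WSDL (the slides do not say how), and the helper names `service_id` and `deduplicate` are hypothetical.

```python
from collections import defaultdict

BASE = "http://seekda.com/providers"

def service_id(provider, service):
    """Build the unique service identifier from provider and service name."""
    return f"{BASE}/{provider}/{service}"

def deduplicate(wsdls):
    """Group (wsdl_url, provider, service) records into unique service objects."""
    services = defaultdict(list)
    for wsdl_url, provider, service in wsdls:
        services[service_id(provider, service)].append(wsdl_url)
    return dict(services)

# The two WSDL copies from the example on this slide.
records = [
    ("http://ws.cdyne.com/ip2geo/ip2geo.asmx?wsdl", "cdyne.com", "IP2Geo"),
    ("http://miki2005.uda.ad/p1net/Web%20References/com.cdyne.ws/ip2geo.wsdl", "cdyne.com", "IP2Geo"),
]
unique = deduplicate(records)
```

Both WSDL copies end up under the single identifier for cdyne.com's IP2Geo service.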
Search Results




Service Overview




seekda Web Service Search Engine




Why crawl for Web APIs?

       Significant growth of Web APIs
           > 5,400 Web APIs on ProgrammableWeb (including SOAP and
            REST APIs) [end of 2009: ca. 1,500 Web APIs]
           > 6,500 Mashups on ProgrammableWeb (combining Web APIs
            from one or more sources)
       SOAP services are only a small part of the overall available
        public services




Web API – Example (1/3)




Web API – Example (2/3)




Web API – Example (3/3)

       Problem:
           Web APIs are described by regular HTML pages
           No standardized structure that helps with the identification




Web API Identification

       Solution: Crawl for Web APIs

           Approach 1: Manual Feature Identification
              Takes into account HTML structure (e.g., title, mark-up), syntactic
               properties of the language used (e.g., camel-cased words), and link
               properties of pages (ratio of external to internal links)


           Approach 2: Automatic Classification
              Text classification with supervised learning (Support Vector Machine
               model)
              Training set: APIs from ProgrammableWeb
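
The features named under Approach 1 can be computed with the standard library alone, as in this sketch. The concrete feature names, the camel-case pattern and the sample page are assumptions made for illustration; in Approach 2 a feature set like this (or plain text features) would be passed to a trained classifier, which is not shown here.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Words such as getLocation: lowercase start followed by capitalized parts.
CAMEL = re.compile(r"\b[a-z]+(?:[A-Z][a-z0-9]+)+\b")

class PageFeatures(HTMLParser):
    """Collects the title, the visible text and internal/external link counts."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc
        self.in_title = False
        self.title = ""
        self.text = []
        self.internal = 0
        self.external = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                host = urlparse(urljoin(self.page_url, href)).netloc
                if host == self.host:
                    self.internal += 1
                else:
                    self.external += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        self.text.append(data)

def extract_features(page_url, html):
    """Return a small, hypothetical feature dictionary for one page."""
    parser = PageFeatures(page_url)
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text)
    return {
        "title_mentions_api": "api" in parser.title.lower(),
        "camel_case_words": len(CAMEL.findall(text)),
        "external_to_internal": parser.external / max(parser.internal, 1),
    }

sample = ('<html><head><title>Geo API reference</title></head><body>'
          '<p>Call getLocation or listCities.</p>'
          '<a href="/docs">docs</a><a href="http://other.example.net/">x</a>'
          '</body></html>')
features = extract_features("http://api.example.com/", sample)
```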


Unique Service Objects – Web APIs

       Create unique identifiers:
           Again using the provider name (from the Web API homepage)
           We do not know the service name → hash value of URL instead
           http://seekda.com/providers/<providerName>/<hashValueOfURL>

       But: human confirmation is still needed to be sure
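
A minimal sketch of the identifier scheme for Web APIs follows. The slides do not name the hash function or say how the provider name is derived, so SHA-1 and the host name without a leading `www.` are assumptions, as is the function name `web_api_id`.

```python
import hashlib
from urllib.parse import urlparse

def web_api_id(homepage_url):
    """Build an identifier from the provider name and a hash of the page URL."""
    provider = urlparse(homepage_url).netloc.lower()
    if provider.startswith("www."):                  # assumed normalization
        provider = provider[4:]
    digest = hashlib.sha1(homepage_url.encode("utf-8")).hexdigest()  # assumed hash
    return f"http://seekda.com/providers/{provider}/{digest}"

identifier = web_api_id("http://www.example.com/api/docs")
```

The identifier is stable for a given URL, but two URLs that document the same API still produce different identifiers, which is one reason human confirmation remains useful.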




New Search Engine Prototype




Prototype – User Contributions

       Web API – yes/no: human confirmation needed!
       Other annotations that help improve the search for Web Services
             Categories
             Tags
             Natural Language descriptions
             Cost: Free or paid service




Problem - User Contribution

       Problem:
           Users/developers don’t contribute enough
           Hard to motivate them to provide annotations
           Community recognition or peer respect not enough
       Solution: crowdsource the annotations, i.e., pay people to
        provide them
           Use Amazon Mechanical Turk
           Bootstrap annotations quickly and cheaply




Service Annotation Wizard (1/4)




Service Annotation Wizard (2/4)




Service Annotation Wizard (3/4)




Service Annotation Wizard (4/4)




Amazon Mechanical Turk – Iteration 1

                        Number of Submissions               70
                        Reward per task                    $0.10
                        Restrictions                        none

       Annotation Wizard
             Web API Yes/No
             Assign a category
             Assign tags
             Provide a natural language description
             Determine whether page is documentation, pricing or listing
             Rate the service


Amazon Mechanical Turk – Iteration 1

       Results
             21 APIs correctly identified as APIs
             28 Web documents (non-APIs) correctly identified as non-APIs
             49/70 correctly identified (70% accuracy)
             Average task completion time: 2:20 min
       But, only:
           4 well-done, complete annotations
           8 acceptable (incomplete) annotations
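
The accuracy figure can be rechecked from the counts on this slide. The share of usable annotations is not stated in the slides; it is derived here from the two annotation counts and is only meant as a rough reading aid.

```python
true_positives = 21   # APIs identified as APIs
true_negatives = 28   # non-APIs identified as non-APIs
submissions = 70

correct = true_positives + true_negatives
accuracy = correct / submissions

# Derived figure (not on the slide): complete plus acceptable annotations.
usable = 4 + 8
usable_share = usable / submissions

print(f"{correct}/{submissions} = {accuracy:.0%}; usable annotations: {usable_share:.0%}")
# prints "49/70 = 70%; usable annotations: 17%"
```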




Amazon Mechanical Turk – Iterations 2 & 3

                                                Iteration 2   Iteration 3
           Number of Submissions                   100           150
           Reward per task                        $0.20         $0.20
           Restrictions                            yes           yes


       Annotation Wizard
           Removed page type identification & service rating
           For a task to be accepted:
              At least one category must be assigned
              At least 2 tags must be provided
              A meaningful description must be provided
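
The acceptance rules could be enforced with a check like the one below. What counts as a "meaningful" description is not defined in the slides; the minimum word count used here (`min_words`) is a hypothetical stand-in, as is the function name.

```python
def accept_submission(categories, tags, description, min_words=5):
    """Return True if a submission meets the three acceptance rules."""
    distinct_tags = {t.strip().lower() for t in tags if t.strip()}
    has_category = len(categories) >= 1
    has_tags = len(distinct_tags) >= 2
    # Assumption: "meaningful" is approximated by a minimum number of words.
    has_description = len(description.split()) >= min_words
    return has_category and has_tags and has_description

accepted = accept_submission(
    ["Geography"], ["geocoding", "ip lookup"],
    "Looks up the geographic location of an IP address.",
)
rejected = accept_submission(["Geography"], ["geo", "Geo "], "good api")
```

The second submission fails twice: its two tags collapse into one after normalization, and the description is too short.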


Amazon Mechanical Turk – Iterations 2 & 3

       Results, Iterations 2 & 3:
           Ca. 80% of documents correctly identified
           Very satisfactory annotations
           Average completion time: 2:36 min




Amazon Mechanical Turk – Survey

       48 survey submissions
           Female 18, Male 30
           Most popular origins: India (27) and USA (9)
           Popular age groups:
              15-22 (12)
              23-30 (18)
              31-50 (16)
           Most of them worked in some IT profession
              These workers provided the best-quality annotations




Amazon Mechanical Turk

       Recommendations for further improvement:
             Improve task description, especially ‘what is a Web API’
             Better examples (e.g., hinting at what makes a false page false)
             Allow assignment of multiple categories
             Restrict to workers in IT professions?

       Conclusion:
           Very positive results → good way to get quality annotations
           Results will help provide better search experience to users
           Results can be used as positive set for automatic classification


Service Ontologies (1/2)




Service Ontologies (2/2)




       http://www.service-finder.eu/ontologies/ServiceCategories

Questions?





