Open Semantic Annotation: an experiment with BioMoby Web Services. Benjamin Good, Paul Lu, Edward Kawas, Mark Wilkinson. University of British Columbia, Heart + Lung Research Institute, St. Paul’s Hospital.
The Web contains lots of things
But the Web doesn’t know what they ARE: text/html, video/mpeg, image/jpg, audio/aiff
The Semantic Web It’s A Duck
Semantic Web Reasoning: Add properties to the things we are describing (Walks Like a Duck, Quacks Like a Duck, Looks Like a Duck). Logically… It’s A Duck. Defining the world by its properties helps me find the KINDS of things I am looking for.
Asserted vs. Reasoned Semantic Web. [Figure: ontology spectrum, running from Catalog/ID, Terms/glossary, Thesauri “narrower term” relation, Informal is-a, Formal is-a, Formal instance, Frames (Properties), Value Restrs., Selected Logical Constraints (disjointness, inverse, …) to General Logical constraints.] Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
Who assigns these properties? Works ~well … but doesn’t scale
When we say “Web”  we mean “Scale”
Natural Language Processing: Scales Well… Works!! … Sometimes… Sort of…
Natural Language Processing. Problem #1: Requires text to get the process started. Problem #2: Low accuracy means it can only support, not replace, manual annotation.
Web 2.0 Approach: OPEN to all Web users (Scale!). Parallel, Distributed, “Human Computation”.
Human Computation: Getting lots of people to solve problems that are difficult for computers (term introduced by Luis von Ahn, Carnegie Mellon University).
Example: Image Annotation
ESP Game results: >4 million images labeled; >23,000 players. Given 5,000 players online simultaneously, could label all of the images accessible to Google in a month. See the “Google image labeling game”… Luis von Ahn and Laura Dabbish (2004), “Labeling images with a computer game”, ACM Conference on Human Factors in Computing Systems (CHI).
Social Tagging: Accepted. Widely applied. Passive volunteer annotation. Del.icio.us surpassed 1 million users in 2006. Connotea, CiteULike, etc. See also our ED2Connotea extension. “This is a picture of Japanese traditional wagashi sweets called ‘seioubo’, which is modeled after a peach.”
BUSTED! I just pulled a bunch of Semantics out of my Seioubo!
BUSTED! “This is a picture of Japanese traditional wagashi sweets called ‘seioubo’, which is modeled after a peach.” “This is a totally sweet picture of peaches grown in the city of Seioubo, in the Wagashi region of Japan.”
So tagging isn’t enough… We need properties, but the properties need to be semantically grounded in order to enable reasoning (and this ain’t gonna happen through NLP because there is even less context in tags!)
Social Semantic Tagging. Q1: Can we design interfaces that assist “the masses” to derive their tags from controlled vocabularies (ontologies)? Q2: How well do “the masses” do when faced with such an interface? Can this data be used “rigorously” for e.g. logical reasoning? Q3: “The masses” seem to be good at tagging things like pictures… no brainer! How do they do at tagging more complex things like bioinformatics Web Services?
Context: BioMoby Web Services. BioMoby is a Semantic Web Services framework in which the data-objects consumed/produced by BioMoby service providers are explicitly grounded (semantically and syntactically) in an ontology. A second ontology describes the analytical functions that a Web Service can perform.
Context: BioMoby Web Services. BioMoby ontologies suffer from being semantically VERY shallow… thus it is VERY difficult to discover the Web Service that you REALLY want at any given moment… Can we improve discovery by improving the semantic annotation of the services?
Experiment. Implemented the BioMoby Annotator: Web interface for annotation; myGrid ontology + Freebase as the grounding. Recruited volunteers: volunteers annotated BioMoby Web Services. Measured: inter-annotator agreement; agreement with manually constructed standard; individuals, aggregates.
BioMoby Annotator: information extracted from Moby Central Web Service Registry; tagging areas.
Tagging: Type-ahead tag suggestions drawn from myGrid Web Service Ontology & from Freebase.
Tagging: New simple tags can also be created, as per normal tagging.
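The two tagging slides above can be pictured with a small sketch: prefix-based type-ahead against a controlled vocabulary, with a fallback to a plain free-text tag. This is only an illustration, not the BioMoby Annotator's code; the function names and the sample vocabulary terms are hypothetical and are not taken from the myGrid ontology or Freebase.

```python
def suggest_tags(prefix, vocabulary, limit=5):
    """Return up to `limit` vocabulary terms starting with `prefix` (case-insensitive)."""
    p = prefix.strip().lower()
    if not p:
        return []
    matches = [term for term in vocabulary if term.lower().startswith(p)]
    return sorted(matches)[:limit]


def resolve_tag(text, vocabulary):
    """Return (tag, is_grounded): grounded if the text matches a vocabulary term."""
    lookup = {term.lower(): term for term in vocabulary}
    key = text.strip().lower()
    if key in lookup:
        return lookup[key], True   # semantically grounded tag
    return text.strip(), False     # new simple (free-text) tag


# Hypothetical vocabulary, for illustration only.
VOCAB = ["sequence alignment", "sequence retrieval", "identifier mapping"]

print(suggest_tags("seq", VOCAB))
print(resolve_tag("Identifier Mapping", VOCAB))
print(resolve_tag("my favourite service", VOCAB))
```

Keeping the grounded/free distinction on each tag is what later allows agreement to be reported separately for semantic and free-text tags.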
“Gold-Standard” Dataset: 27 BioMoby services were hand-annotated by us. Typical bioinformatics functions: retrieve database record, perform sequence alignment, identifier-to-identifier mapping.
Volunteers: Recruited friends and posted on mailing lists. Offered small reward for completing the experiment ($20 Amazon). 19 participants: mix of BioMoby developers, bioinformaticians, statisticians, students. Majority had some experience with Web Services. 13 completed annotating all of the selected services.
Measurements. Inter-annotator agreement: standard approach for estimating annotation quality; usually measured for small groups of professional annotators (typically 2-4). Agreement with the “gold standard”: measured in the same way, but one “annotator” is considered the standard.
Inter-annotator Agreement Metric: Positive Specific Agreement (PSA). Amount of overlap between all annotations elicited for a particular item, comparing annotators pairwise.
$$\mathrm{PSA}(A, B) = \frac{2I}{2I + a + b}$$
where I = intersection of sets A and B, a = A without I, b = B without I (each taken as a count of tags).
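As a minimal sketch of the PSA formula on this slide, the function below treats each annotator's tags for one item as a Python set. The function name and sample tags are illustrative assumptions; returning 0.0 when both sets are empty is also an assumption, since the slide does not say how that case was handled.

```python
def psa(tags_a, tags_b):
    """Positive specific agreement between two tag sets for one item."""
    set_a, set_b = set(tags_a), set(tags_b)
    i = len(set_a & set_b)   # I: tags used by both annotators
    a = len(set_a - set_b)   # a: tags only in A
    b = len(set_b - set_a)   # b: tags only in B
    denom = 2 * i + a + b
    # Assumption: agreement is reported as 0.0 when neither annotator gave a tag.
    return 2 * i / denom if denom else 0.0


# Illustrative tags: I = 2, a = 1, b = 1, so PSA = 4 / 6.
annotator_a = {"retrieval", "alignment", "DNA sequence"}
annotator_b = {"retrieval", "alignment", "protein"}
print(round(psa(annotator_a, annotator_b), 2))  # 0.67
```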
Gold-standard Agreement Metrics: Precision, Recall, F measure.
$$\mathrm{Precision}(T) = \frac{\text{True tags by } T}{\text{All tags by } T} \qquad \mathrm{Recall}(T) = \frac{\text{True tags by } T}{\text{All true tags}}$$
$$F = \frac{2PR}{P + R}$$
F = harmonic mean of P and R (F = PSA if one set is considered “true”).
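The definitions above can be sketched as follows, again treating tags as sets. The function name and the sample tags are hypothetical, and returning 0.0 for empty inputs is an assumption of this sketch. The example also illustrates the slide's remark that F equals PSA when one set is taken as "true".

```python
def precision_recall_f(candidate_tags, gold_tags):
    """Precision, recall and F measure of one annotator's tags against a gold standard."""
    cand, gold = set(candidate_tags), set(gold_tags)
    true_tags = len(cand & gold)                        # true tags by T
    precision = true_tags / len(cand) if cand else 0.0  # over all tags by T
    recall = true_tags / len(gold) if gold else 0.0     # over all true tags
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


# Illustrative tags: 2 true tags, 3 tags given, 4 gold tags.
volunteer = {"retrieval", "alignment", "blast"}
gold = {"retrieval", "alignment", "DNA sequence", "protein"}
p, r, f = precision_recall_f(volunteer, gold)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 0.5 0.57

# PSA on the same sets: I = 2, a = 1, b = 2, so 4 / 7, the same value as F.
```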
Metrics: Average pairwise agreements reported across all pairs of annotators; by Service Operation (e.g. retrieval) and Objects (e.g. DNA sequence); by semantically-grounded tags; by free-text tags.
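One way to picture "average pairwise agreement" is to compute PSA for every pair of annotators on every item and then summarize the scores. The sketch below assumes a nested dictionary of the form `{item: {annotator: set_of_tags}}`; that layout, the names, and the use of the sample standard deviation are assumptions, not details given in the slides.

```python
from itertools import combinations
from statistics import mean, median, stdev


def psa(tags_a, tags_b):
    """Positive specific agreement: 2I / (2I + a + b)."""
    i = len(tags_a & tags_b)
    denom = 2 * i + len(tags_a - tags_b) + len(tags_b - tags_a)
    return 2 * i / denom if denom else 0.0


def pairwise_scores(annotations):
    """PSA for every pair of annotators on every item."""
    scores = []
    for by_annotator in annotations.values():
        for tags_a, tags_b in combinations(by_annotator.values(), 2):
            scores.append(psa(tags_a, tags_b))
    return scores


def summarize(scores):
    """Summary columns as in the agreement tables (needs at least two scores)."""
    m, sd = mean(scores), stdev(scores)  # assumption: sample standard deviation
    return {
        "n_pairs": len(scores), "mean": m, "median": median(scores),
        "min": min(scores), "max": max(scores), "sd": sd,
        "cv": sd / m if m else float("nan"),  # coefficient of variation = sd / mean
    }


# Hypothetical annotations for two services.
annotations = {
    "service_1": {"ann1": {"retrieval"}, "ann2": {"retrieval"}, "ann3": {"alignment"}},
    "service_2": {"ann1": {"DNA sequence", "protein"}, "ann2": {"DNA sequence"}},
}
summary = summarize(pairwise_scores(annotations))
print(summary["n_pairs"], round(summary["mean"], 2))  # 4 0.42
```

Running the same summary separately over operation tags and object tags, and over grounded and free-text tags, would give the four breakdowns listed on the slide.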
Inter-Annotator Agreement

| Type | N pairs | mean | median | min | max | stand. dev. | coefficient of variation |
|---|---|---|---|---|---|---|---|
| Free, Object | 1658 | 0.09 | 0.00 | 0.00 | 1.00 | 0.25 | 2.79 |
| Semantic, Object | 3482 | 0.44 | 0.40 | 0.00 | 1.00 | 0.43 | 0.98 |
| Free, Operation | 210 | 0.13 | 0.00 | 0.00 | 1.00 | 0.33 | 2.49 |
| Semantic, Operation | 2599 | 0.54 | 0.67 | 0.00 | 1.00 | 0.32 | 0.58 |
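Agreement to “Gold” Standard

| Subject Type | measure | mean | median | min | max | stand. dev. | coefficient of variation |
|---|---|---|---|---|---|---|---|
| Data-types (input & output) | PSA | 0.52 | 0.51 | 0.32 | 0.71 | 0.11 | 0.22 |
| Data-types (input & output) | Precision | 0.54 | 0.53 | 0.33 | 0.74 | 0.13 | 0.24 |
| Data-types (input & output) | Recall | 0.54 | 0.54 | 0.30 | 0.71 | 0.12 | 0.21 |
| Web Service Operations | PSA | 0.59 | 0.60 | 0.36 | 0.75 | 0.10 | 0.18 |
| Web Service Operations | Precision | 0.81 | 0.79 | 0.52 | 1.0 | 0.13 | 0.16 |
| Web Service Operations | Recall | 0.53 | 0.50 | 0.26 | 0.77 | 0.15 | 0.28 |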
Consensus & Correctness:  Datatypes
Consensus and Correctness:  Operations
Open Annotations are Different
Trust must be earned. Can be decided at runtime: by consensus agreement (as described here), by annotator reputation, by recency, by your favorite algorithm, by you!
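Deciding trust at runtime by consensus could look like the sketch below: a tag is kept only if at least a chosen fraction of annotators applied it, and the consumer of the annotations picks the threshold. The function name, the `min_fraction` parameter and the sample tags are hypothetical; the slides do not specify the thresholds used.

```python
from collections import Counter


def consensus_tags(tags_by_annotator, min_fraction=0.5):
    """Keep tags applied by at least `min_fraction` of the annotators."""
    n = len(tags_by_annotator)
    if n == 0:
        return set()
    # Count each tag once per annotator.
    votes = Counter(tag for tags in tags_by_annotator.values() for tag in set(tags))
    return {tag for tag, count in votes.items() if count / n >= min_fraction}


# Hypothetical annotations of one service by three volunteers.
service_tags = {
    "ann1": {"retrieval", "alignment"},
    "ann2": {"retrieval", "alignment", "blast"},
    "ann3": {"retrieval"},
}
print(sorted(consensus_tags(service_tags, min_fraction=0.5)))  # ['alignment', 'retrieval']
print(sorted(consensus_tags(service_tags, min_fraction=1.0)))  # ['retrieval']
```

Raising the threshold trades recall for precision, so each consumer can choose the point that suits their use; reputation or recency weighting would replace the simple vote count.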
IT’S ALL ABOUT CONTEXT!! We can get REALLY good semantic annotations IF we provide context!!
Open Semantic Annotation Works IF we provide CONTEXT and IF enough volunteers contribute, BUT we do not understand why people do or do not contribute without $$$ incentive, SO further research is needed to understand Social Psychology on the Web.
Watch for: Forthcoming issue of the International Journal of Knowledge Engineering and Data Mining on “Incentives for Semantic Content Creation”.
Ack’s: Benjamin Good, Edward Kawas, Paul Lu. MSFHR/CIHR Bioinformatics Training Programme @ UBC. iCAPTURE Centre @ St. Paul’s Hospital. NSERC. Genome Canada/Genome Alberta.


The BioMoby Semantic Annotation Experiment
