Multilingual Semantic Annotation
Engine for Agricultural
Documents
Benjamin Chu Min Xian
Arun Anand Sadanandan
Fadzly Zahari
Dickson Lukose
                                                 04.09.2012
                    International Symposium on Agricultural
                                Ontology Service (AOS2012)
Outline
   Introduction
   Related Work
   System Description: Text Annotation Engine
   Challenges
   Conclusion




Introduction




Related Work
• Semantic Annotation techniques are
  typically categorized into pattern-based
  and machine learning-based
• Most of the annotation tools can only deal
  with a single language
• Most are not easily customized to work for different
  domains



Text Annotation Engine (T-ANNE) [1]
• Semantic tagging system
    – Semantic web of tags
• Knowledge base approach
• Scalable system
    – Handles large sets of documents
    – Web services
• Distributed approach
    – Document Splitter
• Multilingual tagging
    – Language identifier
 1. Chu, M.X., Bahls, D., Lukose, D.: A System and Method for Concept and Named Entity Recognition
 (2012). (Patent Pending)
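
The splitter and distributed tagging listed above can be illustrated with a minimal sketch. It is not the T-ANNE implementation: the function names (`split_document`, `annotate_chunk`, `annotate_document`), the chunk size, the two placeholder labels, and the use of a thread pool are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_document(text, max_chars=200):
    """Split text on blank lines into chunks of roughly max_chars (hypothetical splitter)."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def annotate_chunk(chunk, labels=("rice", "maize")):
    """Placeholder tagger: naive substring match against a few assumed labels."""
    lowered = chunk.lower()
    return {label for label in labels if label in lowered}


def annotate_document(text, workers=4):
    """Tag each chunk independently, then merge the per-chunk tag sets."""
    chunks = split_document(text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(annotate_chunk, chunks))
    return set().union(*results)
```

Because each chunk is tagged independently, the same structure could in principle be spread across several machines or web service calls rather than threads.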
Text Annotation Engine (T-ANNE)
Multilingual Semantic Annotation System Overview
Text Annotation Engine (T-ANNE)

[Figure: system diagram. Labels: Documents; Semantic Annotation Engine (T-ANNE); AGROVOC Knowledge Base; Semantic Annotations; Tags; Knowledge Base]
Text Annotation Engine (T-ANNE)
Example (Japanese)

[Figure: the same diagram for a Japanese example. Labels: Semantic Annotation Engine (T-ANNE); AGROVOC Knowledge Base; Tags; Knowledge Base]
Text Annotation Engine (T-ANNE)
• Knowledge-based approach
  • The number of languages and domains it can
    handle is only limited by the knowledge base
    it uses
  • Easily customized
  • Utilizes AGROVOC as the knowledge base
    for recognition and annotation of agriculture
    related documents
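
A knowledge-based tagger can be sketched as a dictionary lookup, which shows why swapping the knowledge base changes the domain the tagger covers. This is only an illustration under stated assumptions: the `KNOWLEDGE_BASE` entries and their identifiers are invented (they are not AGROVOC codes), and greedy longest-match is one common lookup strategy, not necessarily the one T-ANNE uses.

```python
import re

# Hypothetical miniature knowledge base: label -> concept identifier.
# The identifiers are invented for illustration and are not AGROVOC codes.
KNOWLEDGE_BASE = {
    "maize": "concept:maize",
    "waxy maize": "concept:waxy_maize",
    "cereals": "concept:cereals",
    "rice": "concept:rice",
}


def annotate(text, kb=KNOWLEDGE_BASE, max_words=3):
    """Greedy longest-match lookup of word n-grams against the labels in kb."""
    words = re.findall(r"\w+", text.lower())
    tags = []
    i = 0
    while i < len(words):
        # Try the longest phrase first so "waxy maize" wins over "maize".
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in kb:
                tags.append((candidate, kb[candidate]))
                i += n
                break
        else:
            i += 1  # No label starts at this word.
    return tags
```

Passing a different `kb` argument, for example a medical vocabulary, would retarget the same function to another domain without code changes.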



Text Annotation Engine (T-ANNE)
• Multilingual capability
  • Automatically determines the language of the text
  • AGROVOC: a multilingual thesaurus with more than
    40,000 concepts in up to 22 languages
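
The two steps above, language identification followed by lookup in that language's labels, can be sketched as follows. The stopword-overlap heuristic, the tiny `STOPWORDS` lists, the `LABELS` table, and the concept identifier are all assumptions for illustration; a production language identifier would use a trained model or much larger profiles.

```python
import re

# Tiny illustrative stopword lists (assumed, far from complete).
STOPWORDS = {
    "en": {"the", "is", "and", "of", "in"},
    "fr": {"le", "la", "est", "et", "dans"},
    "es": {"el", "es", "y", "en", "los"},
}

# Hypothetical per-language labels pointing at one shared concept identifier.
LABELS = {
    "en": {"rice": "concept:rice"},
    "fr": {"riz": "concept:rice"},
    "es": {"arroz": "concept:rice"},
}


def identify_language(text):
    """Pick the language whose stopwords occur most often in the text."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    return max(scores, key=scores.get)


def tag(text):
    """Identify the language, then look words up in that language's labels."""
    lang = identify_language(text)
    words = re.findall(r"\w+", text.lower())
    return lang, [LABELS[lang][w] for w in words if w in LABELS[lang]]
```

The point of the design is that labels in different languages resolve to the same concept, so documents in different languages receive comparable tags.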




Challenges
1. Ambiguity
2. Morphological Variations
3. Detail / Granularity Level




Challenges
1. Ambiguity

 “They performed Kashmir, written by Page and Plant. Page played unusual chords on
 his Gibson.”

    Kashmir: a song or the Himalayan region?
    Page: guitarist “Jimmy Page” or the Google founder “Larry Page”?
    Gibson: guitar brand or actor “Mel Gibson”?
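
One common family of approaches to this kind of ambiguity scores each candidate sense by how many of its typical context words appear in the sentence. The sketch below illustrates that idea only; the slides list disambiguation as future work, so this is not a description of T-ANNE. The `CANDIDATES` table and its cue words are invented for the example.

```python
import re

# Hypothetical candidate senses, each with a few assumed context cue words.
CANDIDATES = {
    "kashmir": {
        "Kashmir (song)": {"performed", "written", "song", "band", "chords"},
        "Kashmir (region)": {"region", "himalayan", "valley", "border"},
    },
    "gibson": {
        "Gibson (guitar brand)": {"guitar", "chords", "played", "strings"},
        "Mel Gibson (actor)": {"actor", "film", "movie", "director"},
    },
}


def disambiguate(mention, sentence):
    """Return the sense whose cue words overlap most with the sentence."""
    context = set(re.findall(r"\w+", sentence.lower()))
    senses = CANDIDATES[mention.lower()]
    # On a tie, max() keeps the first sense listed.
    return max(senses, key=lambda sense: len(senses[sense] & context))
```

Applied to the example sentence, the words "performed", "written", "played", and "chords" pull both mentions toward the music-related senses.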




Challenges
2. Morphological Variations

Variation of entities representing the same concept using:
    Plurals
    Acronyms / Abbreviations
    Different Spellings
    Compound Words
    Language
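
Several of these variations can be reduced to a preferred label with simple normalization rules. The sketch below is an assumption-laden illustration: the `VARIANTS` and `PREFERRED` tables are invented, the plural rule only strips a trailing "s", and language variation is left out because it would be handled by per-language labels instead.

```python
import re

# Hypothetical table mapping spelling variants and abbreviations to a preferred label.
VARIANTS = {
    "fertiliser": "fertilizer",
    "gmo": "genetically modified organism",
}

# Hypothetical set of preferred labels in the knowledge base.
PREFERRED = {"fertilizer", "genetically modified organism", "soybean", "cereal"}


def normalize(term):
    """Map a surface form to a preferred label, or return None if unknown."""
    t = term.lower().strip()
    t = VARIANTS.get(t, t)
    if t in PREFERRED:
        return t
    # Compound words: "soy bean" and "soy-bean" become "soybean".
    joined = re.sub(r"[\s\-]+", "", t)
    if joined in PREFERRED:
        return joined
    # Naive plural handling: strip one trailing "s" and retry.
    for candidate in (t, joined):
        if candidate.endswith("s"):
            singular = VARIANTS.get(candidate[:-1], candidate[:-1])
            if singular in PREFERRED:
                return singular
    return None
```

A real system would likely replace the trailing-"s" rule with a stemmer or lemmatizer, since that rule fails on irregular plurals.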




Challenges
3. Detail / Granularity Level

 Some annotation systems issue more generic tags, while
  others issue more specific ones.


 For example, the general tag ‘Cereals’ in contrast to the specific
  tag ‘Waxy maize’.

 Whether the system should return coarse-grained or fine-grained
  annotation tags depends on how the results will be used, so it is
  important to choose the right granularity (detail) level.
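
If the knowledge base records broader-concept relations, as thesauri typically do, the granularity level can be made a parameter by walking up the hierarchy. In this sketch the `BROADER` table, including the intermediate "maize" step, is an assumption for illustration and is not copied from AGROVOC.

```python
# Hypothetical broader-concept links: narrower label -> broader label.
BROADER = {
    "waxy maize": "maize",
    "maize": "cereals",
}


def generalize(tag, levels):
    """Move a tag up the hierarchy by at most `levels` steps."""
    for _ in range(levels):
        if tag not in BROADER:
            break  # Already at the top of the known hierarchy.
        tag = BROADER[tag]
    return tag
```

With `levels=0` the tagger keeps the fine-grained tag, while larger values trade detail for coarser tags.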


Conclusions
 The annotation engine uses a knowledge-based approach
  to perform concept and entity recognition

 The application domains and the number of languages it can
  handle depend on the knowledge base used for
  recognition.

 Future work: address the challenges (Entity Resolution,
  Disambiguation)





