Enabling Multilingual Search
through Controlled Vocabularies:
the AGRIS Approach
Fabrizio Celli, Johannes Keizer
MTSR 2016
AGRIS
• Bibliographic database of 8 million
multilingual publications in the food and
agricultural domain
• 350,000 visits/month from
more than 200 countries and
territories (Google Analytics)
• Need to support cross-language
information retrieval
Cross-language information retrieval
• When a user queries AGRIS, results are tied to the
language of the query and of the AGRIS metadata
– the user query 稻米 returns all bibliographic
references containing the word 稻米 in title, abstract,
or as a keyword
• But the user may be interested in results in all
languages or in a subset of them!
• A multilingual controlled vocabulary is a suitable
tool for dealing with this scenario
Query filters are essential to reduce the number
of results after multilingual query expansion
Multilingual query expansion module
• Given a user query, the system:
– Uses AGROVOC to translate keywords
– Expands the query, boosting keywords provided by
the user
– Returns results in all available languages
• The process relies on an intermediate Solr index
– It contains AGROVOC RDF
– For each concept identified by a URI, the index stores
preferred and alternative labels in all languages
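The label index described above can be pictured as a mapping from concept URI to per-language labels. The following is a minimal sketch under stated assumptions, not the AGRIS implementation: an in-memory dictionary stands in for the Solr index, the concept URI and the alternative label are illustrative placeholders, and the names LABEL_INDEX, find_concept, and translate are hypothetical. Following the speaker notes, matching uses both preferred and alternative labels, while only preferred labels are returned as translations.

```python
# Toy stand-in for the AGROVOC label index (assumption: the real one is a Solr index).
# The URI is a placeholder and "Paddy" is an illustrative alternative label,
# not a value taken from AGROVOC.
LABEL_INDEX = {
    "http://example.org/concept/rice": {
        "en": {"pref": "Rice", "alt": ["Paddy"]},
        "zh": {"pref": "稻米", "alt": []},
        "de": {"pref": "Reis", "alt": []},
        "fr": {"pref": "Riz", "alt": []},
    },
}


def find_concept(keyword):
    """Return the URI of the first concept whose preferred or alternative
    label matches the keyword (case-insensitive), or None."""
    needle = keyword.casefold()
    for uri, by_lang in LABEL_INDEX.items():
        for labels in by_lang.values():
            candidates = [labels["pref"], *labels["alt"]]
            if any(needle == c.casefold() for c in candidates):
                return uri
    return None


def translate(keyword):
    """Return the preferred labels of the matched concept in all languages,
    excluding a label identical to the keyword itself."""
    uri = find_concept(keyword)
    if uri is None:
        return []
    needle = keyword.casefold()
    return [
        labels["pref"]
        for labels in LABEL_INDEX[uri].values()
        if labels["pref"].casefold() != needle
    ]


print(translate("稻米"))
print(translate("paddy"))
```

A keyword that matches no label yields an empty list, in which case the query would simply run unexpanded.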
Multilingual Query Expansion Module
[Diagram] Components: AGRIS website, query pattern analyzer, query expander, AGROVOC label index, AGRIS core index
1. Query AGRIS
2. Expand the query (steps 2.1 and 2.2)
3. Return the expanded query Q1
4. Use Q1 to query the AGRIS index
Example source query: 稻米
Expanded query Q1:
"稻米"^50 OR ("Rice" OR "चावल" OR "Reis" OR "рис" OR "ເຂົ້າ" OR "벼" OR "Arroz" OR "Riso" OR "Riz" OR "rizs" OR "rýže" OR "أرز" OR "ข้าว" OR "米" OR "ryža" OR "برنج" OR "pirinç")
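The shape of the expanded query Q1 can be reproduced with a few lines of string handling. This is only a sketch: the function name build_expanded_query is hypothetical, the translations are passed in as a plain list rather than fetched from the label index, and it assumes keywords and labels contain no double quotes that would need escaping. The boost of 50 is the value given in the speaker notes.

```python
def build_expanded_query(keyword, translations, boost=50):
    """Build a Lucene-style query in which the source keyword is boosted
    so that results of the original query rank first."""
    boosted = f'"{keyword}"^{boost}'
    if not translations:
        # Nothing to expand with: fall back to the boosted source keyword.
        return boosted
    alternatives = " OR ".join(f'"{t}"' for t in translations)
    return f"{boosted} OR ({alternatives})"


q1 = build_expanded_query("稻米", ["Rice", "Reis", "Riz"])
print(q1)  # "稻米"^50 OR ("Rice" OR "Reis" OR "Riz")
```

Keeping the translations in a separate parenthesized group leaves the boost attached to the source keyword only.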
Analysis of results
• Correctness of results depends on the
correctness of the AGROVOC thesaurus and
AGRIS metadata
Source query | English translation | Number of results | Number of results of multilingual search
稻米 | rice | 14 | 166,639
फसलें | crops | 0 | 474,854
latte | milk | 8,019 | 189,475
Klimaänderung | climate change | 23 | 31,028
"su muhafazası" | water conservation | 22 | 15,285
إنتظام حراري للتربة | soil thermal regimes | 21 | 368
"forest mensuration" | forest mensuration | 3,679 | 3,930
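To compare rows in the results table, one can compute how many times larger the result set becomes after expansion. The sketch below uses the figures from the table; the name growth_factor is hypothetical. Result counts are only a rough proxy for recall, since a larger result set says nothing about how many of the added records are relevant.

```python
# (source query, results without expansion, results with multilingual expansion)
ROWS = [
    ("稻米", 14, 166_639),
    ("फसलें", 0, 474_854),
    ("latte", 8_019, 189_475),
    ("Klimaänderung", 23, 31_028),
    ("su muhafazası", 22, 15_285),
    ("إنتظام حراري للتربة", 21, 368),
    ("forest mensuration", 3_679, 3_930),
]


def growth_factor(before, after):
    """Ratio of expanded to original result counts; None when the original
    query returned nothing, because the ratio is undefined."""
    if before == 0:
        return None
    return after / before


for query, before, after in ROWS:
    factor = growth_factor(before, after)
    shown = "n/a" if factor is None else f"{factor:,.1f}x"
    print(f"{query}: {before:,} -> {after:,} ({shown})")
```

The English query gains little because most AGRIS metadata matching "forest mensuration" is presumably already in English, while queries in less represented languages gain the most.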
Performance and Usage
• The execution of multilingual search requires
68.75 milliseconds more than the default
search
• 2% of AGRIS active users enable the
multilingual search
– 350,000 users/month
– 80% come from Google.com and Google Scholar
– 20% represent “active” users
– 1,400 users/month use multilingual search
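The usage figures above follow from two successive percentages. A short check, using integer arithmetic and only the numbers stated on the slide (variable names are illustrative):

```python
monthly_users = 350_000

# 20% of monthly users are counted as "active" users.
active_users = monthly_users * 20 // 100

# 2% of active users enable the multilingual search.
multilingual_users = active_users * 2 // 100

print(active_users)        # 70000
print(multilingual_users)  # 1400
```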
Synonyms Query Expansion Module
• For a given language, the union of a concept's
preferred and alternative labels composes the set
of synonyms available in AGROVOC for that language
– Groundnuts: 2,824 results
– Peanuts: 6,750 results
• If the user searches for “Peanuts” and enables
the synonyms expansion module:
– 9,222 results (352 records contain both “Peanuts”
and “Groundnuts” )
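The combined count is consistent with inclusion-exclusion: records matching either term equal the sum of both counts minus the records that contain both. The sketch below checks this and shows one way a synonym query could be assembled. The names are hypothetical, and which of the two labels is the preferred one is an assumption made for illustration; the slide only states that both belong to the same synonym set.

```python
# Assumption: "Groundnuts" as preferred and "Peanuts" as alternative label
# is illustrative; the slides do not say which is which.
LABELS_EN = {"pref": "Groundnuts", "alt": ["Peanuts"]}


def synonym_set(labels):
    """Union of the preferred and alternative labels for one language."""
    return [labels["pref"], *labels["alt"]]


def synonym_query(labels):
    """OR together every synonym as a quoted phrase."""
    return " OR ".join(f'"{term}"' for term in synonym_set(labels))


def union_count(count_a, count_b, count_both):
    """Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|."""
    return count_a + count_b - count_both


print(synonym_query(LABELS_EN))        # "Groundnuts" OR "Peanuts"
print(union_count(2_824, 6_750, 352))  # 9222
```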
Conclusions
• AGRIS relies on a controlled vocabulary to
implement multilingual search and synonyms
expansion
• Experimental results demonstrate significant
improvements of recall in both cases
• Future work:
– Generalizing or restricting the topic of a query by
navigating the hierarchy of AGROVOC concepts
– Automatically performing different query expansions
and combinations of them, presenting alternative
subsets of results to end users
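The first future-work item, generalizing or restricting a query through the concept hierarchy, could look roughly like the sketch below. It is speculative: the slides describe this only as planned work, the toy hierarchy and its concept names are invented for illustration, and BROADER, generalize, and restrict are hypothetical names.

```python
# Toy hierarchy: each concept maps to its broader (parent) concept.
# The concept names are illustrative and not taken from AGROVOC.
BROADER = {
    "rice": "cereals",
    "wheat": "cereals",
    "cereals": "plant products",
}


def generalize(concept):
    """Move one step up the hierarchy; None at the top or if unknown."""
    return BROADER.get(concept)


def restrict(concept):
    """Move one step down the hierarchy: all direct narrower concepts."""
    return sorted(child for child, parent in BROADER.items() if parent == concept)


print(generalize("rice"))   # cereals
print(restrict("cereals"))  # ['rice', 'wheat']
```

A query on "cereals" could then be restricted by OR-ing its narrower concepts, or a query on "rice" generalized by adding "cereals".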


Editor's Notes

  • #7: The multilingual query expansion module queries the AGROVOC label index to obtain translations of source keywords. The module matches source keywords against both preferred and alternative labels to identify the AGROVOC concept, but it outputs only preferred labels as translations; alternative labels can instead drive query expansion with synonyms. The module then expands the source query by building a union of the source keywords and their translations. The system boosts source keywords by a factor of 50, because we think it is important to show users the results of their original query first, followed by the results of the multilingual query.
  • #8: Correctness of results depends on the correctness of the AGROVOC thesaurus and AGRIS metadata. A community of domain experts from different countries contributes to the quality and correctness of the labels available in AGROVOC. Thus, the multilingual translation based on AGROVOC is highly reliable as far as the agricultural domain is concerned.
  • #9: This number is quite satisfying, since multilingual search is an advanced functionality and we expect a small percentage of usage. In addition, multilingual search is a new AGRIS functionality and needs time to reach the public. The percentage will very likely increase over time, once we promote multilingual search in webinars and events.