Porting the QALL-ME framework to Romanian

                    Constantin Orăsan

           Research Group in Computational Linguistics
    Research Institute in Information and Language Processing
                   University of Wolverhampton
                http://www.wlv.ac.uk/~in6093/


                      29th March 2010
Structure of the presentation



1 Introduction


2 The QALL-ME project


3 Multilingual information access in QALL-ME


4 Conclusions
Need to access information




• as a result of the development of the Internet, more and more
  information is becoming available
• this information is in many languages
• fields of computational linguistics such as automatic
  summarisation, question answering and text mining can help
  people deal with this information
Question answering (QA)



• Question answering aims to identify the answer to a
  question in a large collection of documents
• the information returned by QA is more focused than that
  returned by information retrieval
• the output can be the exact answer or a text snippet which
  contains the answer
• the field took off after the introduction of the QA track at
  TREC, and cross-lingual QA after its introduction at CLEF
Types of QA systems

• open-domain QA systems: can answer any question from any
  collection
  + can potentially answer any question
  - very low accuracy (especially in cross-lingual settings)
• canned QA systems: rely on a very large repository of
  questions for which the answer is known
  + very little processing necessary
  - limited to the answers in the database
• closed-domain QA systems: are built for very specific domains
  and exploit expert knowledge about them
  + very high accuracy
  - can require extensive language processing and are limited to
  one domain
Purpose of the presentation




• briefly present the QALL-ME project
• show how it was adapted to answer questions in Romanian
  about movies
The QALL-ME project


• QALL-ME = Question Answering Learning technologies in a
  multiLingual and Multimodal Environment
• EU-funded project, part of FP6
• 7 partners:
    • FBK-irst, Italy
    • University of Wolverhampton, UK
    • University of Alicante, Spain
    • DFKI, Germany
    • Comdata, Italy
    • UbiEST, Italy
    • WayCom, Italy
• Web page: http://qallme.fbk.eu
The QALL-ME project



• aimed at establishing a shared infrastructure for multilingual
  and multimodal QA in the domain of tourism
• In the QALL-ME system
     • users ask natural language questions in several languages (in
       both text and speech) using a variety of input devices
       (e.g. mobile phones), and
     • the system returns a list of specific answers formatted in the
       most appropriate modality, ranging from short texts to maps,
       videos and pictures.
[Figure: QALL-ME architecture. Components shown: QALL-ME central QA
planner; Service Provider; English, German, Spanish and Italian Answer
Extractors; Local Information Sources; Semantic representation;
Question Type ontology; Answer Type ontology; Speech Recognizers;
Dialog Models.]
Main outputs of the project




  • an ontology for the domain of tourism
  • an entailment-based QA framework
  • the QALL-ME benchmark
  • an entailment framework

(all accessible from the project’s web page:
http://qallme.fbk.eu)
The ontology



• A domain-specific ontology for the tourism domain was
  developed and shared among all the partners.
• The ontology was used to serve as:
    • bridge between different languages
    • communication language between different components of the
      system
• The ontology was linked to domain-independent ontologies
  such as MultiWordNet and SUMO
• For more information see (Ou et al., 2008)
Design of the ontology



• Analysis of data from content providers
• Analysis of users' requirements
• Inspired by similar ontologies:
     • Harmonise and eTourism: focus on static information (e.g.
       accommodation and events/activities)
     • Similar to eTourism, as it is written in OWL rather than RDFS
     • but with wider coverage
• Introspection
The ontology



• Main classes: Country, Destination, Site (i.e.
  Accommodation, Attraction, Gastro, and Infrastructure),
  Transportation, EventContent and Event
• Element classes: Facility, Room, PersonOrganization,
  Language, and Currency
• Attribute classes: Contact, Location, Period and Price.

• Element and attribute classes cannot exist independently and
  have to be attached to other main or element classes
[Figure: fragment of the ontology covering the cinema domain.
Classes shown: Site, Cinema, CinemaRoom, SiteFacility, RoomFacility,
Contact, GPSCoordinate, PostalAddress, DirectionLocation, Event,
MovieShow, EventContent, Movie, Director, Producer, Star, Writer,
Price, TicketPrice, Currency, Period, TimePeriod, DatePeriod,
DateTimePeriod.
Relations and attributes shown: subClassOf, isInSite, hasPrice,
hasCurrency, hasPeriod, hasTimePeriod, hasDatePeriod, hasEventContent,
hasContact, hasGPSCoordinate, hasPostalAddress, hasSiteFacility,
hasRoom, hasRoomFacility, hasDirector, hasProducer, hasStar, hasWriter,
name, description, certificate, synposis, genre, priceType, priceValue,
startTime, endTime, startDate, endDate.]
The ontology


• Encoded in OWL DL, which has more expressive power than
  OWL Lite and more efficient reasoning support than
  OWL Full
• Protégé-OWL was used as the editor and RacerPro7 as the
  reasoner
• The ontology contains
    • 122 classes (concepts)
    • 55 datatype properties
    • 52 object properties, which indicate the relationships among
      the 122 classes
    • 15 top-level classes
• The class hierarchy has a maximum depth of 4.
The QALL-ME framework



• is an architecture skeleton for multilingual QA systems for
  closed domains
• designed in such a way that it allows fast development of
  closed domain QA systems
• freely available from http://qallme.sourceforge.net/
• is based on a Service Oriented Architecture (SOA) which is
  realised using web services
• relies on textual entailment recognisers
Web services
1   Context providers: anchor questions in space and time
2   Annotators: three types of annotators are currently
    available:
      • named entity annotators, which identify names of cinemas,
        movies, persons, etc.
      • term annotators, which identify hotel facilities, movie genres
        and other domain-specific terminology
      • temporal annotators, which recognise and normalise
        temporal expressions in user questions
3   Entailment engine: determines whether a user question
    entails a stored pattern and therefore its retrieval procedure
4   Query generator: relies on the entailment engine to
    generate a query that extracts the answer
5   Answer pool: retrieves the answers from a database
Context providers



• are used to anchor a question in space and time
• return the current position and time
• are used by the presentation module when maps are displayed
• are used during temporal processing to normalise temporal
  expressions
• determine which services are used in a cross-lingual scenario
• can be static or determined from a mobile phone
Named entity and term annotators

• named entity recogniser = identifies names of hotels, movies,
  persons, etc.
• term annotator = identifies domain-specific terms such as
  hotel facilities, movie genres, etc.
• the entities and terms are known in advance, so the task is
  reduced to a database look-up
• gazetteers are the main source for determining the entities
• the annotation module needs to determine the canonical form
  of an entity
• matching uses character-based similarity, a modified TF*IDF
  and a greedy algorithm
• overlapping annotations are not allowed, and there are few
  ambiguities
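
The gazetteer look-up described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the gazetteer entries, the `annotate` function and the 0.9 threshold are hypothetical, and the character-based similarity comes from Python's `difflib` rather than from the measure used in QALL-ME. The modified TF*IDF component is not modelled.

```python
from difflib import SequenceMatcher

# Hypothetical gazetteer: canonical entity name -> entity type.
GAZETTEER = {
    "Becoming Jane": "MOVIE",
    "Cineworld Wolverhampton": "CINEMA",
    "Wolverhampton": "DESTINATION",
}


def similarity(a: str, b: str) -> float:
    """Character-based similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def annotate(question: str, threshold: float = 0.9):
    """Greedily annotate non-overlapping spans, trying longer spans first.

    Returns (surface form, canonical form, entity type) triples.
    """
    tokens = question.rstrip("?!. ").split()
    used = [False] * len(tokens)
    annotations = []
    max_len = max(len(name.split()) for name in GAZETTEER)
    for n in range(max_len, 0, -1):  # longer spans are claimed first
        for start in range(len(tokens) - n + 1):
            if any(used[start:start + n]):
                continue  # overlapping annotations are not allowed
            span = " ".join(tokens[start:start + n])
            best = max(GAZETTEER, key=lambda name: similarity(span, name))
            if similarity(span, best) >= threshold:
                annotations.append((span, best, GAZETTEER[best]))
                used[start:start + n] = [True] * n
    return annotations
```

Calling `annotate("Where can I see becaming Jane in Wolverhampton?")` maps the misspelt title to its canonical form and tags the city separately. The threshold is a tuning choice: too low and neighbouring words get absorbed into an entity span, too high and misspellings are missed.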
Named entity and term annotators


• Annotates both standard and non-standard entities: cinema,
  movie, location, genre, certificate
• Needs to deal with noisy input:
    • misspelt words/input from ASR engines/SMS input e.g.
       becaming Jane, becoming Jade
    • free word order (Will Smith / Smith, Will)
    • equivalent strings (saw III / three / 3; Smith, Will / Smith,
       W.)
• Needs to deal with questions in mixed languages
• Needs to deal with ambiguous entities
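
A minimal sketch of how free word order and equivalent strings might be reduced to a single form before look-up. The `normalise` function and the small `NUMBER_FORMS` table are hypothetical and cover only the examples on this slide; they are not taken from the QALL-ME code.

```python
import re

# Hypothetical table of equivalent number forms; a real system would need more.
NUMBER_FORMS = {"ii": "2", "two": "2", "iii": "3", "three": "3"}


def normalise(entity: str) -> str:
    """Reduce an entity string to a lower-cased comparison key."""
    text = entity.strip().lower()
    # "Smith, Will" -> "will smith": move the part after the comma to the front.
    if "," in text:
        last, first = (part.strip() for part in text.split(",", 1))
        text = f"{first} {last}"
    tokens = re.findall(r"\w+", text)
    # Map Roman numerals and number words to digits.
    return " ".join(NUMBER_FORMS.get(token, token) for token in tokens)
```

With this key, "Will Smith" and "Smith, Will" compare equal, as do "saw III", "saw three" and "saw 3". Initials such as "Smith, W." are not resolved by this sketch and would need separate handling.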
Temporal annotator


• questions from the domain of tourism contain a large number
  of temporal expressions
• we use a simplified version of the tagger implemented by
  Pușcașu (2004)
• the simplification was done to reduce the processing time
  (Varga, Pușcașu, and Orăsan, 2009)
• identifies both self-contained temporal expressions (TEs) and
  indexical/under-specified TEs
• uses the TIMEX2 standard
• the output is used by the TIMEX2SPARQL service to restrict the
  extracted answers
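
To illustrate what normalising an indexical expression involves, here is a small sketch that resolves a few expressions against the time returned by a context provider. The `resolve` function and the choice of 18:00 to 23:59 for "tonight" are assumptions made for the example; the sketch does not produce TIMEX2 annotations and is not the tagger cited above.

```python
from datetime import datetime, time, timedelta


def resolve(expression: str, now: datetime):
    """Return a (start, end) interval for a few indexical expressions.

    `now` stands for the current time supplied by a context provider.
    """
    expr = expression.strip().lower()
    day = now.date()
    if expr == "today":
        return datetime.combine(day, time.min), datetime.combine(day, time(23, 59))
    if expr == "tomorrow":
        day = day + timedelta(days=1)
        return datetime.combine(day, time.min), datetime.combine(day, time(23, 59))
    if expr == "tonight":
        # Assumption: "tonight" covers 18:00 to 23:59 on the current day.
        return datetime.combine(day, time(18, 0)), datetime.combine(day, time(23, 59))
    raise ValueError(f"unsupported temporal expression: {expression!r}")
```

The resulting interval is the kind of restriction that can then be applied to show times when the answers are extracted.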
Entailment engine

• closed-domain QA systems often transform a question into a
  Prolog fact or an SQL query
• this solution often works only partially because of language
  variability
• in QALL-ME this problem is addressed using textual entailment
• the entailment engine determines whether two questions have
  the same meaning and can therefore share a retrieval procedure:
    • T is the input question
    • H is a textual pattern stored in a repository
    • each textual pattern has an associated SPARQL retrieval
      procedure
• we calculate the similarity between the two sentences to
  determine whether there is an entailment relation between them
Query generation service



• produces a SPARQL query that can be used to answer the
  question
• has a list of question templates with their associated SPARQL
  queries
• relies on the entailment engine to determine which of the
  question patterns has the same meaning as the user
  question
• fills in the slots of the selected question pattern
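
The slot-filling step can be sketched as a template look-up. Everything specific here is an assumption made for illustration: the `ex:` namespace, the shape of the query and the property `ex:isInDestination` are hypothetical and are not taken from the QALL-ME ontology or its retrieval procedures.

```python
from string import Template

# Hypothetical mapping from a question pattern to a SPARQL template.
# The namespace and the graph shape are illustrative only.
QUERY_TEMPLATES = {
    "What movies are on in [DESTINATION] [TIMEX]?": Template(
        "PREFIX ex: <http://example.org/tourism#>\n"
        "SELECT ?title WHERE {\n"
        "  ?show ex:hasEventContent ?movie ;\n"
        "        ex:isInSite ?cinema ;\n"
        "        ex:startTime ?start .\n"
        "  ?movie ex:name ?title .\n"
        '  ?cinema ex:isInDestination "$DESTINATION" .\n'
        '  FILTER (?start >= "$TIMEX_START" && ?start <= "$TIMEX_END")\n'
        "}"
    ),
}


def build_query(pattern: str, slot_values: dict) -> str:
    """Fill the slots of the template associated with `pattern`."""
    # substitute() raises KeyError if a slot value is missing.
    return QUERY_TEMPLATES[pattern].substitute(slot_values)
```

A call such as `build_query(pattern, {"DESTINATION": "Wolverhampton", "TIMEX_START": "2010-03-29T18:00:00", "TIMEX_END": "2010-03-29T23:59:00"})` returns the query text. A production system would also need to escape slot values before inserting them into the query.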
Example
User question (T): What movie can I see tonight in
Wolverhampton? → What movie can I see [TIMEX] in
[DESTINATION]?


List of patterns (H):
  • Who is the director of [MOVIE]?
  • Where can I see [MOVIE] [TIMEX]?
  • What movies are on in [DESTINATION] [TIMEX]?
  • What is the address of [CINEMA]?
  • ...



Select the retrieval procedure associated with the matching pattern,
i.e. the one for: What movies are on in Wolverhampton tonight?
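
The selection in this example can be approximated with a simple similarity measure. The sketch below uses Jaccard overlap between token sets, which stands in for the similarity measure of the actual entailment engine; the `select_pattern` function, the 0.3 threshold and the requirement that the slots match exactly are all assumptions.

```python
import re

PATTERNS = [
    "Who is the director of [MOVIE]?",
    "Where can I see [MOVIE] [TIMEX]?",
    "What movies are on in [DESTINATION] [TIMEX]?",
    "What is the address of [CINEMA]?",
]


def tokens(text: str) -> set:
    """Lower-cased word and slot tokens, e.g. 'what', '[timex]'."""
    return set(re.findall(r"\[\w+\]|\w+", text.lower()))


def slots(text: str) -> set:
    """The slot tokens of a question or pattern."""
    return {token for token in tokens(text) if token.startswith("[")}


def similarity(t: str, h: str) -> float:
    """Jaccard overlap between the token sets of T and H."""
    a, b = tokens(t), tokens(h)
    return len(a & b) / len(a | b)


def select_pattern(question: str, threshold: float = 0.3):
    """Return the most similar pattern with the same slots, or None."""
    # Assumption: only patterns with exactly the question's slots can be entailed.
    candidates = [p for p in PATTERNS if slots(p) == slots(question)]
    if not candidates:
        return None
    best = max(candidates, key=lambda p: similarity(question, p))
    return best if similarity(question, best) >= threshold else None
```

For the annotated question "What movie can I see [TIMEX] in [DESTINATION]?" this returns the third pattern. The slot check matters in this toy setting: on word overlap alone, "Where can I see [MOVIE] [TIMEX]?" would score higher, because it shares "can I see" with the question.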
Answer Pool service




• takes the SPARQL query generated by the query generator
  and extracts the answer
• SPARQL is a query language for RDF graphs developed by the
  W3C RDF Data Access Working Group
• SPARQL provides interoperability between languages
Cross-lingual QA




• the QALL-ME tourism prototype is designed to allow both
  monolingual and cross-lingual QA
• relevant web services are activated depending on the source
  and target language
• user scenario: a Romanian tourist in the UK who wants to find out
  more about the movies showing in Wolverhampton
Prototype for Romanian


• we wanted to find out how long it takes to develop a demo for
  Romanian
• components had to be adapted:
    • named entity and term annotators had to be trained on a
      different list of entities
    • a simple temporal annotator was implemented on the basis of
      the English one
    • the language-independent similarity-based entailment engine
      was used
    • the question patterns were translated into Romanian
    • the answer pool did not require any change
• the whole process took under one week
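
One way to picture the porting effort is as a table of components marked as language-specific or shared. The sketch below only restates the list above in code; the `Component` class, the `services_for` function and the bracketed labels are hypothetical and do not correspond to identifiers in the QALL-ME framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    language_specific: bool


# Which components needed Romanian versions, following the list above.
COMPONENTS = [
    Component("named entity and term annotators", True),
    Component("temporal annotator", True),
    Component("question patterns", True),
    Component("entailment engine", False),
    Component("answer pool", False),
]


def services_for(language: str) -> list[str]:
    """Labels of the services to activate for questions in `language`."""
    return [
        f"{c.name} [{language}]" if c.language_specific else f"{c.name} [shared]"
        for c in COMPONENTS
    ]
```

Calling `services_for("ro")` lists three Romanian-specific services and two shared ones, which is consistent with the short development time reported here.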
Romanian demo




http://qallme.wlv.ac.uk:8080/QALL-ME-web-demo/index.jsp
Conclusions




• multilinguality is a very important issue for the QALL-ME
  project
• the ontology constitutes the bridge between languages
• the QALL-ME framework can be used to quickly develop
  prototypes for other languages
Thank you!
References
Ou, Shiyan, Viktor Pekar, Constantin Orăsan, Christian Spurk, and Matteo Negri.
2008. Development and alignment of a domain-specific ontology for question
answering. In European Language Resources Association (ELRA), editor, Proceedings
of the Sixth International Language Resources and Evaluation (LREC’08), Marrakech,
Morocco, May 28 – 30.
Pușcașu, Georgiana. 2004. A framework for temporal resolution. In Proceedings of
the 4th Conference on Language Resources and Evaluation (LREC 2004), Lisbon,
Portugal, May 26 – 28.
Varga, Andrea, Georgiana Pușcașu, and Constantin Orăsan. 2009. Identification of
temporal expressions in the domain of tourism. In Knowledge Engineering: Principles
and Techniques, volume 1, pages 29 – 32, Cluj-Napoca, Romania, July 2 – 4.


Porting the QALL-ME framework to Romanian

  • 1. Porting the QALL-ME framework to Romanian Constantin Or˘san a Research Group in Computational Linguistics Research Institute in Information and Language Processing University of Wolverhampton http://www.wlv.ac.uk/~in6093/ 29th March 2010
  • 2. 1 Introduction 2 The QALL-ME project 3 Multilingual information access in QALL-ME 4 Conclusions
  • 3. Structure of the presentation 1 Introduction 2 The QALL-ME project 3 Multilingual information access in QALL-ME 4 Conclusions
  • 4. Need to access information • as a result of the Internet development more and more information becomes available • this information is in many languages • fields from computational linguistics such as automatic summarisation, question answering, text mining, etc. can help people deal with information
  • 5. Need to access information • as a result of the Internet development more and more information becomes available • this information is in many languages • fields from computational linguistics such as automatic summarisation, question answering, text mining, etc. can help people deal with information
  • 6. Question answering (QA) • Question answering aims at identifying the answer to a question in a large collection of documents • the information provided by QA is more focused than information retrieval • the output can be the exact answer or a text snippet which contains the answer • the domain took off as a result of the introduction of QA track in TREC, whilst cross-lingual QA as a result of CLEF
  • 7. Types of QA systems • open-domain QA systems: can answer any question from any collection + can potentially answer any question - very low accuracy (especially in cross-lingual settings)
  • 8. Types of QA systems • open-domain QA systems: can answer any question from any collection + can potentially answer any question - very low accuracy (especially in cross-lingual settings) • canned QA systems: rely on a very large repository of questions for which the answer is known + very little processing necessary - limited to the answers in the database
  • 9. Types of QA systems • open-domain QA systems: can answer any question from any collection + can potentially answer any question - very low accuracy (especially in cross-lingual settings) • canned QA systems: rely on a very large repository of questions for which the answer is known + very little processing necessary - limited to the answers in the database • closed-domain QA systems: are built for very specific domains and exploit expert knowledge in them + very high accuracy - can require extensive language processing and limited to one domain
  • 10. Purpose of the presentation • briefly present the QALL-ME project
  • 11. Purpose of the presentation • briefly present the QALL-ME project • show how it was adapted to answer questions in Romanian about movies
  • 12. Structure of the presentation 1 Introduction 2 The QALL-ME project 3 Multilingual information access in QALL-ME 4 Conclusions
  • 13. The QALL-ME project • QALL-ME = Question Answering Learning technologies in a multiLingual and Multimodal Environment • EU-funded project part of FP6 • 7 partners: • FBK-irst, Italy • University of Wolverhampton, UK • University of Alicante, Spain • DFKI, Germany • Comdata, Italy • UbiEST, Italy • WayCom, Italy • Web page: http://qallme.fbk.eu
  • 14. The QALL-ME project • aimed at establishing a shared infrastructure for multilingual and multimodal QA in the domain of tourism • In the QALL-ME system • users ask natural language questions in several languages (both in textual and speech modality) using a variety of input devices (e.g. mobile phones), and • returns a list of specific answers formatted in the most appropriate modality, ranging from small texts, maps, videos, and pictures.
  • 15. Local Information  Semantic  Sources representation Service Provider English Answer  German Answer  Extractor Extractor QALL­ME central  QA planner Spanish Answer  Italian Answer  Extractor Extractor Question Type  Answer Type  Speech  Dialog Models ontology ontology Recognizers
  • 16. Main outputs of the project • an ontology for the domain of tourism • entailment based QA framework • the QALL-ME benchmark • an entailment framework (all accessible from the project’s web page: http://qallme.fbk.eu)
  • 17. The ontology • A domain-specific ontology for the tourism domain was developed and shared among all the partners. • The ontology was used to serve as: • bridge between different languages • communication language between different components of the system • The ontology was linked to domain independent ontologies such as MultiWordNet and Sumo • For more information see (Ou et al., 2008)
  • 18. Design of the ontology • Analysis of data from content providers • Analysis of users requirements • Inspired by similar ontologies: • Harmonise and eTourism: focus on static information (e.g. accommodation and events/activities) • Similar to eTourism as is written in OWL rather RDFs • but wider coverage • Introspection
  • 19. The ontology • Main classes: Country, Destination, Site (i.e. Accommodation, Attraction, Gastro, and Infrastructure), Transportation, EventContent and Event • Element classes: Facility, Room, PersonOrganization, Language, and Currency • Attribute classes: Contact, Location, Period and Price. • Element and attribute classes cannot exist independently and have to be attached to other main or element classes
  • 20. Price Site GPSCoordinate priceType hasGPSCoordinate subClassOf subClassOf PostalAddress priceValue Event hasPostalAddress TicketPrice Cinema DirectionLocation hasCurrency subClassOf DirectionLocation Currency isInSite hasPrice hasContact name description Contact hasSiteFacility MovieShow hasRoom CinemaRoom SiteFacility Period EventContent hasRoomFacility endTime startTime hasPeriod hasEventContent RoomFacility subClassOf subClassOf TimePeriod Director hasTimePeriod hasDirector DateTimePeriod Movie hasProducer Producer hasDatePeriod hasStar DatePeriod hasWriter Star name certificate endDate startDate synposis genre Writer
  • 21. The ontology • Encoded using OWL DL, since it has more expressive power than OWL Lite and has more efficient reasoning support than OWL Full • Used Protege-OWL as the editor and RacerPro7 as the reasoner • The ontology contains • 122 classes (concepts), • 55 datatype properties and • 52 object properties which indicate the relationships among the 122 classes. • 15 top-level classes. • The class hierarchy has a maximum depth of 4.
  • 22. The QALL-ME framework • is an architecture skeleton for multilingual QA systems for closed domains • designed in such a way that it allows fast development of closed domain QA systems • freely available from http://qallme.sourceforge.net/ • is based on a Service Oriented Architecture (SOA) which is realised using web services • relies on textual entailment recognisers
  • 23. Web services 1 Context providers: are used to anchor questions in space and time 2 Annotators: Currently three types of annotators are available: • named entity annotators which identify names of cinemas, movies, persons, etc. • term annotators which identify hotel facilities, movie genres and other domain-specific terminology • temporal annotators that are used to recognise and normalise temporal expressions in user questions 3 Entailment engine: determines whether a user question entails a retrieval procedure 4 Query generator: which relies on an entailment engine to generate a query to extract the answer. 5 Answer pool: retrieves the answers from a database.
  • 24. Context providers • are used to anchor a question in space and time • return the current position and time • used by the presentation module when maps are displayed • used by the temporal annotator to normalise temporal expressions • determine which services are used in a cross-lingual scenario • the context can be static or obtained from a mobile phone
  • 25. Named entity and term annotators • named entity recogniser = identifies names of hotels, movies, persons, etc. • term annotator = identifies domain-specific terms such as hotel facilities, movie genres, etc. • the entities and terms are known in advance, so the task is reduced to a database look-up • gazetteers are the main source for determining the entities • the annotation module needs to determine the canonical form of an entity • matching is done by a greedy algorithm that combines character-based similarity with a modified TF*IDF • overlapping annotations are not allowed, and there are few ambiguities
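The gazetteer look-up described on this slide can be illustrated with a minimal sketch. It only shows the greedy, non-overlapping part: at each position the longest gazetteer entry wins and the scan continues after it. The similarity and TF*IDF scoring mentioned on the slide are left out, and the gazetteer contents, the `annotate` function and the `max_len` parameter are assumptions made for the example, not part of the framework.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer: lower-cased surface form -> (entity type, canonical form)
GAZETTEER: Dict[str, Tuple[str, str]] = {
    "will smith": ("PERSON", "Will Smith"),
    "wolverhampton": ("DESTINATION", "Wolverhampton"),
    "becoming jane": ("MOVIE", "Becoming Jane"),
}


def annotate(question: str, max_len: int = 4) -> List[Tuple[int, int, str, str]]:
    """Greedy left-to-right, longest-match-first gazetteer look-up.

    Returns (start_token, end_token, type, canonical_form) tuples.
    Spans never overlap because the scan jumps past every match.
    """
    tokens = question.lower().replace("?", " ").split()
    annotations = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in GAZETTEER:
                entity_type, canonical = GAZETTEER[candidate]
                annotations.append((i, i + n, entity_type, canonical))
                i += n  # skip the matched span, so no overlap is possible
                break
        else:
            i += 1  # no entry starts at this token
    return annotations
```

Calling `annotate("Where can I see Becoming Jane in Wolverhampton?")` would mark tokens 4-6 as a MOVIE and token 7 as a DESTINATION, each with its canonical form.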
  • 26. Named entity and term annotators • Annotates both standard and non-standard entities: cinema, movie, location, genre, certificate • Needs to deal with noisy input: • misspelt words / input from ASR engines / SMS input, e.g. becaming Jane, becoming Jade • free word order (Will Smith / Smith, Will) • equivalent strings (saw III / three / 3; Smith, Will / Smith, W.) • Needs to deal with questions in mixed languages • Needs to deal with ambiguous entities
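One way to cope with misspellings and free word order is sketched below. It is not the annotator's actual method: it swaps in Python's standard `difflib` character-based matcher for the similarity measure, and the `CANONICAL` list, the `key` and `canonicalise` functions and the 0.8 cutoff are all assumptions chosen for illustration. Equivalent strings such as "III / three / 3" and mixed-language input are not handled.

```python
import difflib
import re
from typing import Optional

# Hypothetical list of canonical entity names taken from a gazetteer.
CANONICAL = ["Becoming Jane", "Will Smith"]


def key(name: str) -> str:
    """Case- and order-insensitive key: 'Smith, Will' -> 'smith will'."""
    return " ".join(sorted(re.findall(r"\w+", name.lower())))


KEYS = {key(name): name for name in CANONICAL}


def canonicalise(mention: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest canonical name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(key(mention), list(KEYS), n=1, cutoff=cutoff)
    return KEYS[matches[0]] if matches else None
```

With these assumptions, both `canonicalise("becaming Jane")` and `canonicalise("becoming Jade")` resolve to "Becoming Jane", and `canonicalise("Smith, Will")` resolves to "Will Smith". The second misspelling also shows the ambiguity problem: a purely character-based measure cannot tell a typo from a different title.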
  • 27. Temporal annotator • questions from the domain of tourism contain a large number of temporal expressions • we use a simplified version of the tagger implemented by Pușcașu (2004) • the simplification was done to reduce the processing time (Varga, Pușcașu, and Orăsan, 2009) • identifies both self-contained temporal expressions (TEs) and indexical/under-specified TEs • uses the TIMEX2 standard • the output is used by the TIMEX2SPARQL service to restrict the extracted answers
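To make the idea of normalising an indexical expression concrete, here is a small sketch that resolves a few such expressions against the time returned by a context provider. The lexicon, the `normalise` function and the exact value strings are assumptions for illustration only; the `TNI` suffix follows the TIMEX2 convention for "night" as I understand it, and the real tagger covers far more expression types.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical mini-lexicon: expression -> (day offset, TIMEX2 part-of-day suffix)
INDEXICALS = {
    "today": (0, ""),
    "tonight": (0, "TNI"),
    "tomorrow": (1, ""),
}


def normalise(expression: str, context_time: datetime) -> Optional[str]:
    """Resolve an indexical expression to a TIMEX2-style value string.

    context_time plays the role of the time supplied by a context provider.
    Returns None for expressions outside the mini-lexicon.
    """
    entry = INDEXICALS.get(expression.lower())
    if entry is None:
        return None
    offset, suffix = entry
    day = context_time.date() + timedelta(days=offset)
    return day.isoformat() + suffix
```

For a question asked on the afternoon of 29 March 2010, `normalise("tonight", datetime(2010, 3, 29, 15, 0))` yields `2010-03-29TNI`, which a later step could turn into a constraint on show times.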
  • 28. Entailment engine • closed-domain QA systems often transform a question into a Prolog fact or an SQL query • this solution often works only partially because of language variability • in QALL-ME this problem is solved using textual entailment • the entailment engine determines whether two questions entail the same meaning, in which case they share the same retrieval procedure: • T is the input question • H is a textual pattern stored in a repository • textual patterns have SPARQL retrieval procedures attached • we calculate the similarity between the two sentences to determine whether there is an entailment relation between them
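A similarity-based entailment decision can be sketched as follows. The slides do not say which similarity measure the engine uses, so this example assumes a plain Jaccard overlap of tokens, plus the extra assumption that T and H must require the same slots. The function names and the 0.3 threshold are hypothetical.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Lower-cased word tokens; slot placeholders such as [TIMEX] stay whole."""
    return set(re.findall(r"\[\w+\]|\w+", text.lower()))


def slots(text: str) -> Set[str]:
    """The set of slot placeholders, e.g. {'[TIMEX]', '[DESTINATION]'}."""
    return set(re.findall(r"\[\w+\]", text))


def similarity(t: str, h: str) -> float:
    """Jaccard overlap of the two token sets (an assumed stand-in measure)."""
    a, b = tokens(t), tokens(h)
    return len(a & b) / len(a | b) if a | b else 0.0


def entails(t: str, h: str, threshold: float = 0.3) -> bool:
    """Assume T entails H when both need the same slots and are similar enough."""
    return slots(t) == slots(h) and similarity(t, h) >= threshold
```

The slot check matters in this sketch: on word overlap alone, "What movie can I see [TIMEX] in [DESTINATION]?" is closer to "Where can I see [MOVIE] [TIMEX]?" than to "What movies are on in [DESTINATION] [TIMEX]?", but only the latter asks for the same slots.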
  • 29. Query generation service • produces a SPARQL query that can be used to answer the question • has a list of question templates with their associated SPARQL queries • relies on the entailment engine to determine which of the question patterns entail the same meaning as the user question • fills in the slots of the question patterns
  • 31. Example User question (T): What movie can I see tonight in Wolverhampton? → What movie can I see [TIMEX] in [DESTINATION]? List of patterns (H): • Who is the director of [MOVIE]? • Where can I see [MOVIE] [TIMEX]? • What movies are on in [DESTINATION] [TIMEX]? • What is the address of [CINEMA]? • ... Select the retrieval procedure associated with the matching pattern, which corresponds to the question "What movies are on in Wolverhampton tonight?"
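Once a pattern has been selected, the query generator fills its slots. The sketch below assumes the pattern has already been chosen and shows only the slot filling. The SPARQL template, the `ex:` prefix and the property names `ex:town` and `ex:onDate` are invented for the example (only `hasEventContent`, `isInSite` and `name` echo names from the ontology diagram), and the temporal value is substituted as a plain date string rather than going through the TIMEX2SPARQL service.

```python
from typing import Dict

# Hypothetical repository: question pattern -> SPARQL template.
# The prefix IRI and the property names are illustrative only.
RETRIEVAL_PROCEDURES: Dict[str, str] = {
    "What movies are on in [DESTINATION] [TIMEX]?": """\
PREFIX ex: <http://example.org/ontology#>
SELECT DISTINCT ?title WHERE {
  ?show ex:hasEventContent ?movie ;
        ex:isInSite ?cinema ;
        ex:onDate "[TIMEX]" .
  ?movie ex:name ?title .
  ?cinema ex:town "[DESTINATION]" .
}""",
}


def generate_query(pattern: str, slot_values: Dict[str, str]) -> str:
    """Fill the slots of the template attached to the selected pattern."""
    query = RETRIEVAL_PROCEDURES[pattern]
    for slot, value in slot_values.items():
        # Escape double quotes so a value cannot break out of the literal.
        query = query.replace(f"[{slot}]", value.replace('"', '\\"'))
    return query
```

For the example question, `generate_query("What movies are on in [DESTINATION] [TIMEX]?", {"DESTINATION": "Wolverhampton", "TIMEX": "2010-03-29"})` returns a query with both placeholders replaced, ready to be passed to the answer pool.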
  • 32. Answer Pool service • takes the SPARQL query produced by the query generator and extracts the answer • SPARQL is a query language for accessing RDF graphs, developed by the W3C RDF Data Access Working Group • SPARQL provides interoperability between languages
  • 34. Cross-lingual QA • the QALL-ME tourism prototype is designed to allow both monolingual and cross-lingual QA • the relevant web services are activated depending on the source and target language • user scenario: a Romanian tourist in the UK who wants to find out more about the movies in Wolverhampton
  • 36. Prototype for Romanian • we wanted to find out how long it takes to develop a demo for Romanian • components had to be adapted: • named entity and term annotators had to be trained on a different list of entities • a simple temporal annotator was implemented on the basis of the English one • the language-independent, similarity-based entailment engine was used • the question patterns were translated into Romanian • the answer pool did not require any change • the whole process took under one week
  • 39. Conclusions • multilinguality is a very important issue for the QALL-ME project • the ontology constitutes the bridge between languages • the QALL-ME framework can be used to quickly develop prototypes for other languages
  • 42. Ou, Shiyan, Viktor Pekar, Constantin Orăsan, Christian Spurk, and Matteo Negri. 2008. Development and alignment of a domain-specific ontology for question answering. In European Language Resources Association (ELRA), editor, Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco, May 28-30. • Pușcașu, Georgiana. 2004. A framework for temporal resolution. In Proceedings of the 4th Conference on Language Resources and Evaluation (LREC 2004), Lisbon, Portugal, May 26-28. • Varga, Andrea, Georgiana Pușcașu, and Constantin Orăsan. 2009. Identification of temporal expressions in the domain of tourism. In Knowledge Engineering: Principles and Techniques, volume 1, pages 29-32, Cluj-Napoca, Romania, July 2-4.