Towards the implementation of a refined data model for a Zulu machine-readable lexicon
Ronell van der Merwe, Laurette Pretorius, Sonja Bosch

Goal
  • ... to develop ... a complete data repository (database)
  • Bantu languages
  • Machine Readable (MR) lexicon
  • applications: finite state morphological analyser, ...

What we will be discussing
  • Brief overview
  • Lexicon update framework
  • Data model
  • Implementation approaches
  • Conclusion
  • Future work

1. Introduction
  • Comprehensive MR lexicon
  • Update framework
  • Zulu corpora
  • Paper Dict

2. Lexicon Update Framework
[Diagram. Components: Morphological Analyser (ZulMorph), Corpus, Guesser, MR lexicon. Arrow labels: apply, successes, failures, new stems/roots, update, rebuild.]
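
One possible reading of the update framework diagram can be sketched in Python. This is only an illustration under stated assumptions: `update_lexicon`, `toy_analyse` and `toy_guess` are hypothetical names, the toy analyser and guesser are string tricks rather than Zulu morphology, and the rebuild step is reduced to the analyser reading the updated lexicon directly.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


def update_lexicon(
    corpus: Iterable[str],
    lexicon: set[str],
    analyse: Callable[[str, set[str]], bool],
    guess: Callable[[str], Optional[str]],
) -> tuple[list[str], list[str]]:
    """Apply the analyser to a corpus, pass failures to the guesser,
    and add any new candidate roots to the lexicon (hypothetical helper)."""
    successes: list[str] = []
    failures: list[str] = []
    for word in corpus:
        (successes if analyse(word, lexicon) else failures).append(word)

    new_roots: list[str] = []
    for word in failures:
        candidate = guess(word)
        if candidate is not None and candidate not in lexicon:
            lexicon.add(candidate)  # "update" arrow in the diagram
            new_roots.append(candidate)
    return successes, new_roots


def toy_analyse(word: str, lexicon: set[str]) -> bool:
    # Stand-in for ZulMorph: succeeds if a known root occurs in the word.
    return any(root in word for root in lexicon)


def toy_guess(word: str) -> Optional[str]:
    # Stand-in for the guesser: strips an assumed "uku" prefix and final "a".
    if word.startswith("uku") and word.endswith("a") and len(word) > 4:
        return word[3:-1]
    return None


if __name__ == "__main__":
    lexicon = {"bon"}
    done, added = update_lexicon(["ukubona", "ukuhamba"], lexicon,
                                 toy_analyse, toy_guess)
    print(done, added, sorted(lexicon))
```

In a real setting the guesser's candidates would presumably be reviewed before inclusion, and the analyser would be rebuilt from the updated lexicon before the next pass.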
Purpose of MR Lexicon
[Diagram. Components: Morphological Analyser (ZulMorph), MR lexicon, Apps. Arrow labels: used, re-used.]
  • MR lexicon: repository of all lexical information
  • Morphosyntactic information:
      - Zulu morphotactics
      - Morphophonological alternation rules
      - Embedded stem/root lexicon

Guesser variant
  • Phonologically possible stems/roots
  • Guesser: identify new candidate(s)
  • MR lexicon: update / include

3. Data Model - development
Legend for the data model diagrams:
  *   0 or more
  +   1 or more
  |   optional
      leaf node

3.1 Verbal extensions
Suffixing extensions to verb root:
  • Applied
  • Causative
  • Intensive
  • Neuter
  • Passive
  • Reciprocal
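
The "0 or more" extensions on a verb root could be represented along the following lines. This is a minimal sketch, not the authors' data model: the class names `Extension` and `VerbRoot` and the `describe` method are assumptions, and the code does not check which extension sequences are actually permitted in Zulu.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Extension(Enum):
    """The six verbal extensions named on the slide."""
    APPLIED = "Applied"
    CAUSATIVE = "Causative"
    INTENSIVE = "Intensive"
    NEUTER = "Neuter"
    PASSIVE = "Passive"
    RECIPROCAL = "Reciprocal"


@dataclass
class VerbRoot:
    """A verb root followed by zero or more extensions, in suffixing order."""
    root: str
    extensions: list[Extension] = field(default_factory=list)

    def describe(self) -> str:
        # Join the root and the extension labels with "+".
        return "+".join([self.root] + [e.value for e in self.extensions])


if __name__ == "__main__":
    plain = VerbRoot("bon")
    extended = VerbRoot("bon", [Extension.CAUSATIVE, Extension.PASSIVE])
    print(plain.describe())
    print(extended.describe())
```

A list keeps the order of suffixation, which a set would lose; the combination shown in the usage lines is chosen only to exercise the structure.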
3.2 Deverbatives
  • verb / extended verb root
  • noun class prefix
  • deverbative suffix (-o- | -i- | -a- | -e- | -u-)
  • optional: nominal suffix (Aug | Dim | Loc | Fem)
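
The four components listed above map naturally onto a small record type. The sketch below is an assumption-laden illustration: the class name `Deverbative`, its field names, and the restriction to at most one nominal suffix are not taken from the slides, and the example values are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Values taken from the slide; the constant names are hypothetical.
DEVERBATIVE_SUFFIXES = {"o", "i", "a", "e", "u"}
NOMINAL_SUFFIXES = {"Aug", "Dim", "Loc", "Fem"}


@dataclass(frozen=True)
class Deverbative:
    """A noun derived from a verb root or extended verb root."""
    noun_class_prefix: str
    verb_root: str                 # plain or extended root, kept as text here
    deverbative_suffix: str
    nominal_suffix: Optional[str] = None  # assumed: at most one

    def __post_init__(self) -> None:
        # Reject suffix values that are not listed on the slide.
        if self.deverbative_suffix not in DEVERBATIVE_SUFFIXES:
            raise ValueError(f"unknown deverbative suffix: {self.deverbative_suffix!r}")
        if (self.nominal_suffix is not None
                and self.nominal_suffix not in NOMINAL_SUFFIXES):
            raise ValueError(f"unknown nominal suffix: {self.nominal_suffix!r}")


if __name__ == "__main__":
    noun = Deverbative("um", "fund", "i")
    diminutive = Deverbative("um", "fund", "i", nominal_suffix="Dim")
    print(noun)
    print(diminutive)
```

Validating in `__post_init__` keeps malformed entries out of the lexicon at capture time, in line with the requirement that data capture be rigorous and systematic.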
4. Towards implementation
[Diagram. Components: Morphological Analyser (ZulMorph), MR lexicon.]
The capturing of the data must be:
  • rigorous
  • systematic
  • appropriately structured
To ensure that:
  • data exchange is consistent

Considerations in the choice of database
  • type of data
  • different views of data
  • different types of access to data

Consideration: Type of Data
  • semi-structured
  • allows for recursion
  • n-depth structures

Consideration: views and access
Rebuilding the Morphological Analyser:
  • view: morphological structure of all word roots/stems
  • access: sequential
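
The sequential view needed for rebuilding the analyser amounts to visiting every entry once, in order. A rough sketch with Python's standard `xml.etree.ElementTree` follows; the `<lexicon>` and `<entry>` wrapper elements and the second root are assumptions added for illustration, and `<verbfeatures>` is left empty because its contents are elided in the slides.

```python
import xml.etree.ElementTree as ET
from typing import Iterator

# Hypothetical lexicon document; only <root> and <verbfeatures> come from the slides.
LEXICON_XML = """
<lexicon>
  <entry><root>bon</root><verbfeatures/></entry>
  <entry><root>hamb</root><verbfeatures/></entry>
</lexicon>
"""


def roots_in_order(xml_text: str) -> Iterator[str]:
    """Yield the text of every <root>, in document order."""
    lexicon = ET.fromstring(xml_text)
    for entry in lexicon.iter("entry"):
        root_text = entry.findtext("root")
        if root_text is not None:
            yield root_text


if __name__ == "__main__":
    for root in roots_in_order(LEXICON_XML):
        print(root)
```

A generator suits this access pattern because the rebuild step only needs one pass over the entries and never random access.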
XML-enabled database
  • Object-Oriented Database
  • Relational Database
  • Example XML fragment: <root>bon</root><verbfeatures>.......</verbfeatures>

Native XML database (NXD)
  • Native XML DB
  • XQuery
  • XUpdate
  • Example XML fragment: <root>bon</root><verbfeatures>.......</verbfeatures>
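
To give a feel for the query and update operations that XQuery and XUpdate would provide, the sketch below uses Python's `xml.etree.ElementTree` as an in-memory stand-in. It is not XQuery or XUpdate and does not talk to any native XML database; the helper names `find_entry` and `add_entry` and the `<lexicon>`/`<entry>` wrappers are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def find_entry(lexicon: ET.Element, root_text: str) -> Optional[ET.Element]:
    """Look up the entry whose <root> child has the given text.

    Uses ElementTree's limited XPath subset as a stand-in for a query.
    Assumes root_text contains no quote characters.
    """
    return lexicon.find(f"entry[root='{root_text}']")


def add_entry(lexicon: ET.Element, root_text: str) -> ET.Element:
    """Insert a new entry unless the root is already present (update stand-in)."""
    existing = find_entry(lexicon, root_text)
    if existing is not None:
        return existing
    entry = ET.SubElement(lexicon, "entry")
    ET.SubElement(entry, "root").text = root_text
    ET.SubElement(entry, "verbfeatures")  # contents elided in the slides
    return entry


if __name__ == "__main__":
    lexicon = ET.fromstring(
        "<lexicon><entry><root>bon</root><verbfeatures/></entry></lexicon>"
    )
    add_entry(lexicon, "hamb")
    print(ET.tostring(lexicon, encoding="unicode"))
```

In a native XML database the same lookup and insertion would be expressed in the database's own query and update languages and executed against stored documents rather than an in-memory tree.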
XML-enabled database: Object-Oriented

NXD and OO Database satisfy the implementation considerations
Both:
  • are suitable for our type of data: semi-structured, recursive
  • satisfy the type of access and view: sequential access to morphological information of all word roots/stems

Conclusion
  • Refinement of a data model for the Zulu MR lexicon: verbal extensions and deverbatives
  • MR Lexicon embedded in an update framework
  • Native XML and OO Databases: possible approaches to implementation

Future work
  • Development and evaluation of prototypes for the MR Lexicon (Zulu)
  • Software for semi-automating the lexicon update framework
  • Bootstrapping of MR Lexicon for other Bantu languages
