A Combined Method for E-Learning 
Ontology Population based on NLP and 
User Activity Analysis 
Dmitry Mouromtsev, Fedor Kozlov, Liubov Kovriguina and Olga Parkhimovich 
Laboratory ISST @ ITMO University, St. Petersburg, Russia 
Linked Learning meets LinkedUp: Learning and Education with the Web of Data @ ISWC 2014, Italy
Introduction 
● Use semantics to make education materials 
reusable and flexible, 
● Different datasets in one e-learning system, 
● We need to provide tools for tutors to 
improve their courses.
Use students to improve your course!
Goal 
● To develop the ontologies, 
● To convert tests from XML to RDF, 
● To map tests with subject terms via NLP 
algorithms, 
● To gather user statistics and implement 
an analysis module.
ECOLE: Front-end 
● Ontology-based e-learning 
system, 
● User-friendly interface, 
● Based on Django framework.
ECOLE: Back-end 
● Collection of educational materials 
from different open resources 
(DBpedia, BNB), 
● Analytics tools, 
● Based on the Information Workbench 
platform.
The ontology of education resources 
● Extends AIISO 
ontology for education 
process and structure, 
● Uses BIBO for 
bibliographic 
resources, 
● Uses MA-ONT for 
media resources.
The ontology of tests 
● Developed to describe the content of 
tests, 
● Extends the top-level ontology of the 
system, 
● 12 classes, 
● 10 object properties, 
● 6 datatype properties, 
● http://purl.org/ailab/testontology.
The ontology of student activity 
● Developed to store information 
about the student's learning 
process and results, 
● Uses FOAF, 
● Uses the ontology of tests, 
● 10 classes, 
● 15 object properties, 
● 5 datatype properties, 
● http://purl.org/ailab/learningresults.
Lemmatization 
● Extracts lemmas from subject term labels, 
● Uses NLP procedures and dictionaries to 
generate lemmas, 
● Stores lemmas in triples.
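
As an illustration of the last bullet, here is a minimal sketch of how lemmas for a term label might be written out as triples. The namespace, the `hasLemma` property, and the toy lemma dictionary are assumptions made for this example; the slides do not name the vocabulary ECOLE actually uses.

```python
# Hypothetical namespace and property; the slides do not name the real ones.
BASE = "http://example.org/ecole/"
HAS_LEMMA = BASE + "hasLemma"

# Toy lemma dictionary standing in for the NLP procedures and dictionaries.
LEMMAS = {"waves": "wave", "coherent": "coherent", "sources": "source"}


def lemmatize_label(label):
    """Return the lemma of each word in a term label (lowercased lookup)."""
    return [LEMMAS.get(word, word) for word in label.lower().split()]


def lemma_triples(term_id, label):
    """Build N-Triples lines linking a term URI to each of its lemmas."""
    subject = f"<{BASE}term/{term_id}>"
    return [
        f'{subject} <{HAS_LEMMA}> "{lemma}" .'
        for lemma in lemmatize_label(label)
    ]


triples = lemma_triples("coherent_sources", "Coherent sources")
for line in triples:
    print(line)
```

Storing one triple per lemma keeps the later matching step simple: two terms can be compared by the lemmas attached to them rather than by their surface labels.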
Terms extraction
New System Terms 
● If the extracted word 
sequence doesn't match 
any of the system terms, 
it is included as a new 
system term, 
● New system terms are 
validated via SPARQL 
queries to DBpedia.
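
The validation step could look roughly like the sketch below, which only builds the query and request URL and does not contact the network. The slides do not show the actual queries, so the exact-label `ASK` query is an assumption, as is the `format` request parameter; the helper names are hypothetical.

```python
from urllib.parse import urlencode

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # public DBpedia endpoint


def build_ask_query(canonical_form, lang="en"):
    """Build a SPARQL ASK query: does any resource carry this exact label?"""
    # Escape backslashes and quotes so the label is a valid SPARQL literal.
    escaped = canonical_form.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f'ASK {{ ?resource rdfs:label "{escaped}"@{lang} }}'
    )


def build_request_url(canonical_form, lang="en"):
    """Build the GET URL for the query (request parameters are assumptions)."""
    params = {
        "query": build_ask_query(canonical_form, lang),
        "format": "application/sparql-results+json",
    }
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


query = build_ask_query("Fresnel biprism")
url = build_request_url("Fresnel biprism")
print(query)
```

An exact label match is the strictest possible check; a real validation step would probably also need to handle redirects and case differences.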
NLP Algorithm 
Resources: Dictionary, Patterns Set 
Pipeline: Tokenization → Morphological and Semantic Tagging → Extracting Candidate Terms → Producing the Canonical Form of the Term + Lemma(s) 
● Uses Part-of-Speech patterns 
● Grammatical agreement is specified in 
the patterns 
● Words+Lemmas+Morphology+Some 
semantic tags 
● Superlemmas (a common lemma is 
generated for lexical variants) 
● Derivation paradigms (a common 
lemma is generated for words with the 
same root)
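
The extraction step can be pictured with a small sketch that matches part-of-speech patterns against already tagged tokens. The tag names, the three patterns, and the choice to form the canonical form by joining lemmas are simplifying assumptions for this example; the real pattern set and the agreement constraints for Russian are not reproduced here.

```python
# Each token: (word form, lemma, part-of-speech tag). Tags are simplified.
TAGGED = [
    ("the", "the", "DET"),
    ("coherent", "coherent", "A"),
    ("sources", "source", "N"),
    ("produce", "produce", "V"),
    ("interference", "interference", "N"),
    ("fringes", "fringe", "N"),
]

# Hypothetical pattern set: adjective + noun, noun + noun, single noun.
PATTERNS = [("A", "N"), ("N", "N"), ("N",)]


def extract_candidates(tokens, patterns):
    """Return (canonical form, lemmas) for every span matching a pattern."""
    candidates = []
    for start in range(len(tokens)):
        for pattern in patterns:
            span = tokens[start:start + len(pattern)]
            if tuple(tag for _, _, tag in span) == pattern:
                lemmas = [lemma for _, lemma, _ in span]
                # Assumption: the canonical form is the joined lemmas.
                candidates.append((" ".join(lemmas), lemmas))
    return candidates


candidates = extract_candidates(TAGGED, PATTERNS)
for canonical, lemmas in candidates:
    print(canonical, lemmas)
```

Note that even this toy input yields overlapping candidates ("interference fringe" as well as "interference" and "fringe"), which is why a later validation step is useful.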
Implementation: Test parsers 
● Converts tests of the course 
from XML to RDF, 
● Uses Information Workbench 
XMLProvider to automatically 
update, 
● Describes mapping using 
XPath functions.
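
To make the XML-to-RDF conversion concrete, here is a sketch that uses Python's standard `xml.etree.ElementTree` with simple XPath expressions instead of the Information Workbench XMLProvider, whose mapping syntax is not shown in the slides. The XML layout, the namespace, and the property names are all invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical test XML; the real schema is not shown in the slides.
TEST_XML = """
<test id="optics-1">
  <task id="t1">
    <question>What is coherence?</question>
    <answer correct="true">A fixed phase relation</answer>
    <answer correct="false">A kind of lens</answer>
  </task>
</test>
"""

BASE = "http://example.org/ecole/"  # placeholder namespace


def convert_test(xml_text):
    """Map a test document to N-Triples lines using XPath lookups."""
    root = ET.fromstring(xml_text)
    test_uri = f"<{BASE}test/{root.get('id')}>"
    triples = []
    for task in root.findall("./task"):
        task_uri = f"<{BASE}task/{task.get('id')}>"
        triples.append(f"{test_uri} <{BASE}hasTask> {task_uri} .")
        question = task.findtext("./question")
        triples.append(f'{task_uri} <{BASE}questionText> "{question}" .')
        # XPath predicate selects only the answers marked as correct.
        for answer in task.findall("./answer[@correct='true']"):
            triples.append(
                f'{task_uri} <{BASE}correctAnswer> "{answer.text}" .'
            )
    return triples


triples = convert_test(TEST_XML)
for line in triples:
    print(line)
```

The sketch does not escape quotes inside literals, so it is only suitable for illustrating the mapping idea.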
Implementation: NLP module 
● Uses dictionaries in NooJ format: 
<LEMMA>+<PART OF SPEECH TAG>+<INFLECTIONAL PARADIGM>+<OTHER ANNOTATIONS> 
air,N+FLX=TABLE 
Michelson,N+ProperName 
● English NooJ resources are reused, Russian lexical 
resources are original, 
● A separate procedure implemented in Python is 
launched for lemmatization and extraction of the terms.
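
The two dictionary entries on this slide can be read with a few lines of Python. This is a sketch that handles only entries of the simple shape shown above (`lemma,POS+FEATURE...`), not the full NooJ dictionary syntax; the function name and the returned dictionary layout are hypothetical.

```python
def parse_entry(line):
    """Split 'lemma,POS+FEATURE+KEY=VALUE' into its parts."""
    lemma, _, annotation = line.strip().partition(",")
    pos, *features = annotation.split("+")
    entry = {"lemma": lemma, "pos": pos, "paradigm": None, "tags": []}
    for feature in features:
        key, sep, value = feature.partition("=")
        if sep and key == "FLX":
            # FLX=<name> names the inflectional paradigm of the lemma.
            entry["paradigm"] = value
        else:
            # Anything else is kept as an opaque annotation tag.
            entry["tags"].append(feature)
    return entry


air = parse_entry("air,N+FLX=TABLE")
michelson = parse_entry("Michelson,N+ProperName")
print(air)
print(michelson)
```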
Evaluation and Results
Hidden subject terms 
A ladder is 5m long. How far from the base of a 
wall should it be placed if it is to reach 4m up 
the wall? 
Nothing to extract!!! 
Hidden subject terms: Pythagorean Theorem, Hypotenuse, Cathetus…
User statistics collection 
● Collects user's activity 
on front-end, 
● Creates triples, 
● Analyzes user's 
answers on tests, 
● Builds a rating of the 
terms that caused 
difficulties for students.
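
One way such a rating could be computed is sketched below. The slides do not give the formula (the future work even mentions replacing it with a proper ranking formula), so the share of correct answers per term is an assumption, and the answer log and task-to-term links are invented sample data.

```python
from collections import defaultdict

# Hypothetical answer log: (student, task, answered correctly?).
ANSWERS = [
    ("s1", "t1", True),
    ("s2", "t1", False),
    ("s1", "t2", False),
    ("s2", "t2", False),
    ("s1", "t3", True),
]

# Hypothetical links between tasks and the system terms they test.
TASK_TERMS = {
    "t1": ["coherence"],
    "t2": ["interference", "coherence"],
    "t3": ["wavelength"],
}


def term_ratings(answers, task_terms):
    """Rate each term by the share of correct answers on its tasks."""
    counts = defaultdict(lambda: [0, 0])  # term -> [correct, total]
    for _, task, correct in answers:
        for term in task_terms.get(task, ()):
            counts[term][0] += int(correct)
            counts[term][1] += 1
    return {term: c / t for term, (c, t) in counts.items()}


def hardest_terms(ratings):
    """Order terms from lowest to highest rating (ties by name)."""
    return sorted(ratings, key=lambda term: (ratings[term], term))


ratings = term_ratings(ANSWERS, TASK_TERMS)
print(hardest_terms(ratings))
```

A plain ratio ignores how many answers stand behind each term, which is one reason a more careful ranking formula may be preferable.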
Conclusion 
● The ontologies of tests and student activity 
have been developed, 
● The tasks of the test have been linked with 
system terms, 
● The statistics gathering module has been 
developed, 
● The system provides a rating of the terms 
that caused difficulties for students.
Future Work 
● Improve the term extraction procedure by adding parallel 
texts of tasks, 
● Process units of measure in tasks to predict “hidden 
terms”, 
● Use relations between subject terms to improve the 
quality of the term extraction procedure, 
● Refine the term knowledge rating by replacing it with a 
proper ranking formula.
Thank you!!! 
The front-end of the e-learning system: http://ecole.ifmo.ru 
Example of subject terms analytics for module "Interference and Coherence": 
http://openedu.ifmo.ru:8888/resource/Phisics:m_InterferenceAndCoherence?analytic=1 
The source code: https://github.com/ailabitmo/linked-learning-solution



Editor's Notes

  • #5: Goals: to develop the ontology of tests; to develop the ontology of student activity; to convert tests from XML to RDF; to extract terms from tasks using morpho-syntactic patterns; to link tasks to the lecture terms via extracted candidate terms; to validate candidate terms via DBpedia; to develop the statistics gathering module; to create pages that show statistics of students' answers to the tests.
  • #8: ??? example of indirect links ???
  • #9: The main purpose of the developed ontology is to represent the structural units of a test and to provide automatic task matching by defining semantic relations between tasks and terms.
  • #10: Two top-level ontologies have been used for its development: the ontology of tests, as described above, and the FOAF ontology, which describes people and the relationships between them. The e-learning system uses the ontology of tests and the answers given by the user to build a list of terms that the user knows.
  • #12: The module collects the tasks of the course using SPARQL queries; forms the plain-text content of each task from its questions and answers; launches the NLP procedures on that content (the resulting data contain a canonical form and lemma(s) for each candidate term); and searches the system terms to link them with candidate terms. System terms and candidate terms are linked if they have the same lemma(s).
  • #14: The NLP algorithm takes the plain text of the task as input. This plain text is tokenized, and the tokens are tagged based on the lexical resources stored in the dictionary. Candidate terms are extracted with part-of-speech patterns; about a dozen patterns are used for term extraction. Russian patterns have more constraints, because grammatical agreement must be specified in the pattern. For example, the pattern <N+nom+sg><N+gen+sg+Prop> extracts Russian terms like "бипризма Френеля" ("Fresnel biprism"). For each candidate term, a canonical form is produced to validate the term in DBpedia, and lemma(s) are produced to map the term to system terms. The canonical form corresponds to the form of the headword of this term in dictionaries. Superlemmas make it possible to link all variants via a canonical form and store them in one equivalence class (e.g., “rectangular Cartesian coordinate system” and “RCCS” are matched via the same query). Derivational paradigms are described, which makes it possible to produce a common lemma for words belonging, e.g., to different parts of speech (that is, "coplanar" and "coplanarity" will both be lemmatized as "coplanar").
  • #16: We use lexical resources built or generated in the NooJ linguistic environment. Dictionaries are written in NooJ format (*.dic): <LEMMA>+<PART OF SPEECH TAG>+<INFLECTIONAL PARADIGM>+<OTHER ANNOTATIONS>. Examples: air,N+FLX=TABLE and Michelson,N+ProperName. On the slide, red and orange mark the elements necessary for the dictionary to be valid, and green and yellow mark optional ones. Once the inflectional paradigm is specified, all word forms can be generated with the corresponding lemmas and morphological and semantic tags, for example aperture,aperture,N+FLX=TABLE+s and apertures,aperture,N+FLX=TABLE+p. For the English demo version, standard NooJ lexical resources were reused; the Russian lexical resources are original. After tagging, a Python script runs to lemmatize the input tokens and extract terms.
  • #17: Percent of linked tasks: a good result. It shows that 95% of tasks are suitable for testing the student's knowledge, so the test set is generally valid. Number of different candidate terms: precision needs improvement. There are 7.5 candidate terms per test on average, which is certainly too many, since the content of the physics test module has 1-2 terms per task (NB: they may repeat within the task). This indicates that the number of false candidates equals the number of true candidates and shows the necessity of the validation procedure. Number of manually extracted different terms: a “gold standard”. Thus, we have 1-2 terms circulating in a task. It seems that the number of terms in the tasks is domain-dependent. Percent of system terms matched by candidate terms: needs improvement. Improvements can be made by adding periphrases of POS patterns. Tasks testing some system terms have only “hidden terms”.
  • #19: The data on the number of correct and incorrect user answers make it possible to compute the knowledge rating for any of the system terms. Using this rating, teachers can determine which terms of the course students know least well.
  • #20: The developed modules for the e-learning system provide teachers with a tool to maintain the relevance and high quality of existing knowledge assessment modules. The system provides a rating of the terms that caused difficulties for students. Based on this rating, teachers can change the theoretical material of the course by improving the description of certain terms and adding proper tasks.
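
Note #12 says that system terms and candidate terms are linked if they have the same lemma(s). A minimal sketch of that rule follows; it assumes "same lemma(s)" means an identical set of lemmas regardless of word order, which the notes do not spell out, and all names and sample data are hypothetical.

```python
def link_terms(system_terms, candidate_terms):
    """Link candidates to system terms with an identical set of lemmas."""
    # Index system terms by their lemma set for constant-time lookup.
    index = {}
    for term_id, lemmas in system_terms.items():
        index.setdefault(frozenset(lemmas), []).append(term_id)
    links = []
    for canonical, lemmas in candidate_terms:
        for term_id in index.get(frozenset(lemmas), []):
            links.append((canonical, term_id))
    return links


# Hypothetical system terms (id -> lemmas) and extracted candidates.
SYSTEM_TERMS = {
    "term:coherent_source": ["coherent", "source"],
    "term:interference": ["interference"],
}
CANDIDATES = [
    ("coherent source", ["coherent", "source"]),
    ("interference fringe", ["interference", "fringe"]),
    ("interference", ["interference"]),
]

links = link_terms(SYSTEM_TERMS, CANDIDATES)
print(links)
```

Candidates left without a link, such as "interference fringe" here, are the ones that would go on to be considered as new system terms.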