IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 6, Ver. IV (Nov – Dec. 2015), PP 95-101
www.iosrjournals.org
DOI: 10.9790/0661-176495101
A Study on Learning Factor Analysis – An Educational Data
Mining Technique for Student Knowledge Modeling
S. Lakshmi Prabha 1, Dr. A. R. Mohamed Shanavas 2
1 Ph.D. Research Scholar, Bharathidasan University & Associate Professor, Department of Computer Science, Seethalakshmi Ramaswami College, Tiruchirappalli, Tamilnadu, India
2 Associate Professor, Department of Computer Science, Jamal Mohamed College, Tiruchirappalli, Tamilnadu, India
Abstract: The increasing dissemination of interactive e-learning environments has allowed the collection of large repositories of data. The emerging field of Educational Data Mining (EDM) is concerned with developing methods to discover knowledge from data collected from e-learning and educational environments. EDM can be applied to modeling user knowledge, user behavior, and user experience in e-learning platforms. This paper explains how Learning Factor Analysis (LFA), a data mining method, is used for evaluating cognitive models and analyzing student-tutor log data for knowledge modeling. It also illustrates how learning curves can be used to visualize the performance of students.
Keywords: e-learning, Educational Data Mining (EDM), Learning Factor Analysis (LFA)
I. Introduction
Educational Data Mining is an inter-disciplinary field that utilizes methods from machine learning, cognitive science, data mining, statistics, and psychometrics. The main aim of EDM is to construct computational models and tools to discover knowledge by mining data taken from educational settings. The growth of e-learning resources such as interactive learning environments, learning management systems (LMS), intelligent tutoring systems (ITS), and hypermedia systems, as well as the establishment of school databases of student test scores, has created large repositories of data that can be explored by EDM researchers to understand how students learn and to build models that improve their performance.
Baker [1] has classified the methods in EDM as: prediction, clustering, relationship mining, distillation
of data for human judgment and discovery with models. These methods are used by the researchers [1][2] to
find solutions for the following goals:
1. Predicting students' future learning behavior by creating student models that incorporate detailed information about students' knowledge, meta-cognition, motivation, and attitudes.
2. Discovering or improving domain models that characterize the content to be learned and optimal instructional
sequences.
3. Studying the effects of different kinds of pedagogical support that can be provided by learning software, and
4. Advancing scientific knowledge about learning and learners through building computational models that incorporate models of the student, the software's pedagogy, and the domain.
The application areas [3] of EDM are: 1) user modeling, 2) user grouping or profiling, 3) domain modeling, and 4) trend analysis. These application areas utilize EDM methods to find solutions. User modeling [3] encompasses what a learner knows, what the user experience is like, what a learner's behavior and motivation are, and how satisfied users are with online learning. User models are used to customize and adapt the system's behavior to users' specific needs so that the system 'says' the 'right' thing at the 'right' time in the 'right' way [4]. This paper concerns applying the EDM method Learning Factor Analysis (LFA) to user knowledge modeling. The paper is organized as follows: section 2 lists the related works done in this research area; section 3 explains the LFA method used in this research; section 4 describes the methodology used; section 5 discusses the results; and section 6 concludes the work.
II. Literature Review
A number of studies have been conducted in EDM to find the effect of using the discovered methods
on student modeling. This section provides an overview of related works done by other EDM researchers.
Newell and Rosenbloom [5] found a power relationship between the error rate of performance and the amount of practice. Corbett and Anderson [6] introduced knowledge tracing, a popular method for estimating students' knowledge; it uses a Bayesian-network-based model to estimate the probability that a student knows a skill based on observations of him or her attempting to perform the skill. Baker et al. [7] proposed a new way to contextually estimate the probability that a student obtained a correct answer by guessing, or an incorrect answer by slipping, within Bayesian Knowledge Tracing. Koedinger
et al. [8] demonstrated that a tutor unit, redesigned based on data-driven cognitive model improvements, helped students reach mastery more efficiently; it produced better learning on the problem-decomposition planning skills that were the focus of the cognitive model improvements. Stamper and Koedinger [9] presented a data-driven method for researchers to use data from educational technologies to identify and validate improvements in a cognitive model, which used knowledge or skill components, equivalent to latent variables, in a logistic regression model called the Additive Factors Model (AFM). Martin et al. [10] used learning curves to analyze a large volume of user data to explore the feasibility of using them as a reliable method for fine-tuning adaptive educational systems. Feng et al. [11] addressed the assessment challenge in the ASSISTment system, a web-based tutoring system that serves as an e-learning and e-assessment environment. They showed that the online assessment system did a better job of predicting student knowledge by considering how much tutoring assistance was needed, how fast a student solved a problem, and how many attempts were needed to finish a problem. Saranya et al. [12] proposed a system that assesses a student's holistic performance by mining student data and institutional data; a Naive Bayes classification algorithm is used to classify students into three classes: Elite, Average, and Poor. Koedinger [13], of the Human-Computer Interaction Institute, Carnegie Mellon University, Pittsburgh, has contributed substantially to EDM research; he developed cognitive models and analyzed student interaction logs taken from the Cognitive Tutors to improve the student learning process. Better assessment models result in better quality education.
Our literature review did not identify work that assesses students' ability and performance with EDM methods in an e-learning environment for school-level mathematics education in India. Our method is a novel approach to providing quality math education, with assessments indicating the knowledge level of a student in each lesson.
III. Learning Factor Analysis
User modeling or student modeling identifies what a learner knows, what the learner experience is like, what a learner's behavior and motivation are, and how satisfied users are with e-learning. Item Response Theory and the Rasch model [20] are psychometric methods for measuring students' ability, but they fall short of providing results that are easy for users to interpret. This paper deals with identifying learners' knowledge level (knowledge modeling) using LFA in an e-learning environment.
LFA is an EDM method for evaluating cognitive models and analysing student-tutor log data. LFA uses three components:
1) Statistical model - a multiple logistic regression model is used to quantify the skills;
2) Human expertise - difficulty factors (concepts or KCs) defined by the subject experts (teachers): a set of factors that make a problem-solving step more difficult for a student; and
3) A* search - a combinatorial search for model selection.
A good cognitive model for a tutor uses a set of production rules or skills which specify how students solve problems. The tutor should estimate the skills learnt by each student as they practice with the tutor. The power law [5] defines the relationship between the error rate of performance and the amount of practice, depicted by equation (1). It shows that the error rate decreases according to a power function as the amount of practice increases.
Y = aX^b ..... (1)
where
Y = the error rate,
X = the number of opportunities to practice a skill,
a = the error rate on the first trial, reflecting the intrinsic difficulty of a skill, and
b = the learning rate, reflecting how easy a skill is to learn.
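As an illustration (not part of the original study), a minimal sketch of evaluating the power-law curve of equation (1); the parameter values are hypothetical, and a negative exponent is used so that the error rate falls as practice increases.
```python
import numpy as np

def power_law_error_rate(opportunities, a, b):
    # Equation (1): predicted error rate after a given number of practice opportunities.
    return a * np.power(opportunities, b)

# Hypothetical values: intrinsic difficulty a = 0.5, learning rate b = -0.4
# (the negative exponent makes the error rate decrease with practice).
opportunities = np.arange(1, 11)
print(np.round(power_law_error_rate(opportunities, a=0.5, b=-0.4), 3))
```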
While the power law model applies to individual skills, it does not include student effects. In order to accommodate student effects for a cognitive model that has multiple rules and multiple students, the power law model is extended to a multiple logistic regression model (equation 2) [24].
ln[P_ijt / (1 - P_ijt)] = Σ α_i X_i + Σ β_j Y_j + Σ γ_j Y_j T_jt ....... (2)
where P_ijt is the probability of the ith student getting a step in a tutoring question right on the t-th opportunity to practice the jth KC; X = the covariates for students; Y = the covariates for skills (knowledge components); T = the number of practice opportunities student i has had on knowledge component j; α = the coefficient for each student, that is, the student intercept; β = the coefficient for each knowledge component, that is, the knowledge component intercept; and γ = the coefficient for the interaction between a knowledge component and its opportunities, that is, the learning curve slope. The model says that the log odds of P_ijt is proportional to the overall "smarts" of that student (α_i) plus the "easiness" of that KC (β_j) plus the amount gained (γ_j) for each practice opportunity. This model can show the learning growth of students at any current or past moment.
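A minimal sketch, assuming illustrative parameter values rather than the fitted values from this study, of how the log-odds in equation (2) translate into a predicted probability of success for one student, one KC, and a growing number of practice opportunities:
```python
import math

def afm_probability(alpha_i, beta_j, gamma_j, t_jt):
    """Equation (2): P(correct) for student i on KC j at its t-th practice opportunity.
    alpha_i: student intercept ("smarts"); beta_j: KC intercept ("easiness");
    gamma_j: KC learning-curve slope; t_jt: prior practice opportunities on KC j."""
    log_odds = alpha_i + beta_j + gamma_j * t_jt
    return 1.0 / (1.0 + math.exp(-log_odds))

# Illustrative values only: an average student on a moderately easy KC.
for t in range(5):
    print(t, round(afm_probability(alpha_i=0.0, beta_j=0.5, gamma_j=0.3, t_jt=t), 3))
```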
A difficulty factor refers specifically to a property of the problem that causes student difficulties. The tutor considered for this research has Metric Measures as lesson 1, which requires 5 skills (conversion, division, multiplication, addition, and result). These are the factors (KCs) in this tutor (Table 1) to be learnt by the students in solving the steps. Each step has a KC assigned to it for this study.
Table 1. Factors for the Metric Measures lesson and their values
Factor Name      Factor Values
Conversion       Correct formula, Incorrect
Addition         Correct, Wrong
Multiplication   Correct, Wrong
Division         Correct, Wrong
Result           Correct, Wrong
The combinatorial search selects a model within the logistic regression model space. Difficulty factors are incorporated into an existing cognitive model through a model operator called Binary Split, which splits a skill into two: a skill with a factor value and a skill without the factor value. For example, splitting the production Measurement by the factor conversion leads to two productions: Measurement with the factor value Correct formula, and Measurement with the factor value Incorrect. A* search is the combinatorial search algorithm [25] in LFA. It starts from an initial node, iteratively creates new adjoining nodes, and explores them to reach a goal node. To limit the search space, it employs a heuristic to rank each node and visits the nodes in order of this heuristic estimate. In this study, the initial node is the existing cognitive model, and its adjoining nodes are the new models created by splitting the model on the difficulty factors. We do not specify a model to be the goal state because the structure of the best model is unknown; for this paper, 25 node expansions per search is defined as the stopping criterion. AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are two estimators used as heuristics in the search.
AIC = -2*log-likelihood + 2*(number of parameters) .... (3)
BIC = -2*log-likelihood + (number of parameters)*ln(number of observations) ..... (4)
where the log-likelihood measures the fit, and the number of parameters, which is the number of covariates in equation 2, measures the complexity. Lower AIC and BIC scores mean a better balance between model fit and complexity.
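A small sketch of how the two heuristics in equations (3) and (4) could be computed for a candidate model during the search; the log-likelihood and parameter count below are placeholders in the same ballpark as Table 3, not exact values from the paper.
```python
import math

def aic(log_likelihood, n_parameters):
    # Equation (3): penalises complexity by 2 per parameter.
    return -2.0 * log_likelihood + 2.0 * n_parameters

def bic(log_likelihood, n_parameters, n_observations):
    # Equation (4): penalises complexity by ln(number of observations) per parameter.
    return -2.0 * log_likelihood + n_parameters * math.log(n_observations)

# Placeholder inputs: roughly 60 parameters and 1,920 observations, as in this study.
print(aic(-530.72, 60), bic(-530.72, 60, 1920))   # lower scores indicate a better balance
```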
IV. Methodology
In this paper, the LFA methodology is illustrated using data obtained from the Metric Measures lesson of the Mensuration tutor MathsTutor [18]. Our dataset consists of 2,247 transactions involving 60 students, 32 unique steps, and 5 skills (KCs) in the students' exercise log. All the students solved 9 problems: 5 in the mental-problem category, 3 simple, and one big, involving 32 steps in total. While solving an exercise problem, a student can ask for a hint on a step. Each data point is a correct or incorrect student action corresponding to a single skill execution. Student actions are coded as correct or incorrect and categorized in terms of the "knowledge components" (KCs) needed to perform that action. Each step the student performs is related to a KC and is recorded as an "opportunity" for the student to show mastery of that KC. This lesson has 5 skills (conversion, division, multiplication, addition, and result) corresponding to the skill needed in a step, and each step has a KC assigned to it for this study. Table 2 shows sample data with the columns: Student - name of the student; Step - e.g., problem 1 step 1 (P1s1); Success - whether the student did that step correctly on the first attempt (1 = success, 0 = failure); Skill - the knowledge component used in that step; Opportunities - the number of times the skill has been used by the same student, computed from the first and fourth columns.
Table 2. Sample data
Student  Step  Success  Skill       Opportunities
X        P1s1  1        conversion  1
X        P1s2  1        result      1
X        P2s1  0        conversion  2
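The Opportunities column is simply a per-student running count of attempts on each skill. A minimal sketch, assuming the exercise log is available as (student, step, success, skill) tuples in the order the steps were attempted:
```python
from collections import defaultdict

# Hypothetical transaction log, ordered as the steps were attempted.
log = [
    ("X", "P1s1", 1, "conversion"),
    ("X", "P1s2", 1, "result"),
    ("X", "P2s1", 0, "conversion"),
]

opportunities = defaultdict(int)   # (student, skill) -> opportunities seen so far
for student, step, success, skill in log:
    opportunities[(student, skill)] += 1   # each attempt is one more opportunity for that KC
    print(student, step, success, skill, opportunities[(student, skill)])
# Output reproduces the Opportunities column of Table 2.
```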
To find the fitness of the model, logistic regression values are calculated with the Additive Factor Model (AFM) [26]. The values are presented in Table 3. The number of parameters and the number of observations in equations 3 and 4 are 60 (students) and 1,920 (32 unique steps x 60 students) respectively. Lower values of AIC, BIC, and Root Mean Squared Error (RMSE) indicate a better fit between the model's predictions and the observed data. Two types of cross-validation are run for each KC model in the dataset; both are 3-fold cross-validations of the Additive Factor Model's (AFM) [25] error rate predictions. In student-stratified cross-validation, data points are grouped by student: the full set of students is divided into 3 groups, and 3-fold cross-validation is performed across these 3 groups. In item-stratified cross-validation, data points are grouped by step: the full set of steps is divided into 3 groups, and 3-fold cross-validation is performed across these 3 groups. The slope parameter represents how quickly students learn the knowledge component: the larger the KC slope, the faster students learn it. The conversion KC has a slope of 0, indicating that no learning took place and that it should be attended to by the teacher. The addition KC has a higher slope, indicating that students learn it more easily. The table shows that this model fitted the current tutor dataset well, with low AIC, BIC, and RMSE values for the KC models used.
Table 3. Logistic regression model values
KC Model        AIC       BIC       Log likelihood  RMSE (student stratified)  RMSE (item stratified)  Slope
Addition        1,189.43  1,545.18  -530.72         0.302511                   0.288114                0.732
Conversion      1,155.22  1,511.02  -513.61         0.298859                   0.284691                0.000
Division        1,190.19  1,546.03  -513.09         0.301930                   0.289071                0.623
Multiplication  1,193.94  1,549.76  -532.97         0.301943                   0.287855                0.112
Result          1,197.65  1,553.49  -534.82         0.301916                   0.287417                0.075
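A minimal sketch of the two stratification schemes, using scikit-learn's GroupKFold to hold out whole students (student-stratified) or whole steps (item-stratified); the plain logistic regression here is a stand-in for AFM, and the feature matrix, outcome vector, and grouping keys are assumed inputs, not the exact DataShop procedure.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold

def cv_rmse(X, y, groups, n_splits=3):
    """3-fold cross-validated RMSE of predicted error rates.
    X, y are numpy arrays of covariates and first-attempt correctness (0/1);
    groups holds the student id (student-stratified) or step id (item-stratified)."""
    errors = []
    for train, test in GroupKFold(n_splits=n_splits).split(X, y, groups):
        model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        p = model.predict_proba(X[test])[:, 1]
        errors.append(np.sqrt(np.mean((y[test] - p) ** 2)))
    return float(np.mean(errors))

# Hypothetical usage, with X, y, student_ids, and step_ids built from the exercise log:
# rmse_student = cv_rmse(X, y, groups=student_ids)   # student stratified
# rmse_item    = cv_rmse(X, y, groups=step_ids)      # item stratified
```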
Learning curves [10] have become a standard tool for measuring students' learning in intelligent tutoring systems. In our study, we used learning curves to visualize student performance over opportunities. The slope and fit of learning curves show the rate at which a student learns over time, and reveal how well the system model fits what the student is learning. We used learning curves to measure the performance of the tutoring system's domain and student models. Measures of student performance are described below in Table 4. Regardless of metric, each point on the graph is an average across all selected knowledge components and students.
Table 4. Measures of student performance
Measure                Description
Assistance Score       The number of incorrect attempts plus hint requests for a given opportunity.
Error Rate             The percentage of students that asked for a hint or were incorrect on their first attempt. For example, an error rate of 45% means that 45% of students asked for a hint or performed an incorrect action on their first attempt. Error rate differs from assistance score in that it provides data based only on the first attempt. As such, an error rate provides no distinction between a student that made multiple incorrect attempts and a student that made only one.
Number of Incorrects   The number of incorrect attempts for each opportunity.
Number of Hints        The number of hints requested for each opportunity.
Step Duration          The elapsed time of a step in seconds, calculated by adding all of the durations for transactions attributed to the step.
Correct Step Duration  The step duration if the first attempt for the step was correct: the time for which students are "silent", with respect to their interaction with the tutor, before they complete the step correctly. This is often called "reaction time" (on correct trials) in the psychology literature. If the first attempt is an error (incorrect attempt or hint request), the observation is dropped.
Error Step Duration    The step duration if the first attempt for the step was an error (hint request or incorrect attempt). If the first attempt is a correct attempt, the observation is dropped.
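As a sketch of how two of these measures give the points of an empirical learning curve, the following computes the error rate and average assistance score per KC and opportunity from a simplified first-attempt log (the record layout is an assumption, not DataShop's internal format):
```python
from collections import defaultdict

# Hypothetical records: (kc, opportunity, first_attempt_correct, n_incorrect, n_hints)
records = [
    ("conversion", 1, True,  0, 0),
    ("conversion", 1, False, 2, 1),
    ("conversion", 2, False, 1, 0),
]

by_opportunity = defaultdict(list)
for kc, opp, correct, n_incorrect, n_hints in records:
    by_opportunity[(kc, opp)].append((correct, n_incorrect + n_hints))

for (kc, opp), obs in sorted(by_opportunity.items()):
    error_rate = 100.0 * sum(1 for c, _ in obs if not c) / len(obs)  # % wrong or hint on first attempt
    assistance = sum(a for _, a in obs) / len(obs)                   # mean incorrects plus hints
    print(kc, opp, round(error_rate, 1), round(assistance, 2))
```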
Learning curves are categorised as follows:
- Low and flat: the low error rate shows that students mastered the KCs but continued to receive tasks for them.
- No learning: the slope of the predicted learning curve shows no apparent learning for these KCs.
- Still high: students continued to have difficulty with these KCs; consider increasing opportunities for practice.
- Too little data: students didn't practice these KCs enough for the data to be interpretable.
- Good: these KCs did not fall into any of the above "bad" or "at risk" categories. Thus, these are "good" learning curves in the sense that they appear to indicate substantial student learning.
The above categorisation assists the teacher in knowing the students' knowledge level in the specific concepts that are to be mastered.
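A sketch of how these categories might be assigned automatically from a KC's predicted learning curve; the thresholds and rules below are illustrative assumptions, not the criteria used by DataShop.
```python
def categorise_curve(predicted_error, slope, min_points=4, low_error=0.20, high_error=0.40):
    """Classify a KC's learning curve from its predicted error rates (one per opportunity)
    and its AFM slope. Thresholds are illustrative only."""
    if len(predicted_error) < min_points:
        return "too little data"
    last_error = predicted_error[-1]
    if slope <= 0.0 and last_error > low_error:
        return "no learning"
    if last_error > high_error:
        return "still high"
    if predicted_error[0] < low_error:
        return "low and flat"
    return "good"

# Example: predicted error falls from 35% to 10% over 8 opportunities -> "good".
print(categorise_curve([0.35, 0.28, 0.22, 0.17, 0.14, 0.12, 0.11, 0.10], slope=0.6))
```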
V. Results And Discussions
To analyse the performance of the students, we used the DataShop [13] analysis and visualization tool to generate learning curves by uploading our dataset. Fig. 1 shows the problem steps involved in the first problem and the number of correct/incorrect attempts made by the 60 students.
Fig. 1. Problem steps and attempts made in problem 1
The following chart (Fig. 2) shows that the conversion KC had the maximum error rate compared with the other KCs. This indicates that students struggled with the conversion step (converting from one unit to another in the Metric Measures lesson).
Fig. 2. Error rate vs. KCs
Fig. 3. Average number of hints vs. KCs
Fig. 3 shows that the average number of hints requested by the students for the conversion KC is greater than for the other KCs; the difficulty level of the conversion KC is higher than that of the other KCs. This indicates that the conversion KC has to be explained by the teacher in class, or that more practice has to be given to the students. Fig. 4 shows the assistance score of the students on all 9 problems they solved. Although the fourth problem is in the mental-problem category and requires only 2 or 3 steps to solve, the students made the maximum number of incorrect attempts and requested the most hints, indicating that the problem is tough for the learners and that they did not understand the concept. Students took more time on steps involving the conversion KC than on other KCs (Fig. 5), which again indicates the difficulty level of that skill.
Fig. 4. Assistance score vs. problems
Fig. 5. Step duration vs. KCs
While the empirical learning curve gives a visual clue as to how well a student may do over a set of learning opportunities, the predicted curve allows a more precise prediction of the success rate at any learning opportunity. The predicted learning curve is much smoother. It is computed using the Additive Factor Model (AFM) [25], which uses a set of customized item-response models to predict how a student will perform for each skill on each learning opportunity. The predicted learning curves are the average predicted error of a skill over each of the learning opportunities. The blue line in the learning curves shows the predicted value, and the curve's category is defined using the predicted value. The empirical learning curve has some blips depending on the error rate, but the predicted line is very smooth.
Fig. 6. Learning curve for the Conversion KC
Fig. 7. Learning curve for the Multiplication KC
Fig. 8. Learning curve for the Division KC
Fig. 9. Learning curve for the Result KC
Fig. 10. Learning curve for the Addition KC
Fig. 11. Learning curve for the Single-KC model
From the predicted learning curve for the conversion KC (Fig. 6), we can infer that 'no learning' took place while practicing. There were 11 opportunities for conversion, and the 4th opportunity had the maximum error rate of 33.3%; no conversion opportunity was at a 0% error rate. The teacher can better guide the students in that area, and can make changes in the domain model by adding new problems to the examples and providing more exercises. The learning curves shown in Figs. 7 and 9 are in the 'low and flat' category, indicating that students likely received too much practice for these KCs: the students had mastered these skills and do not require any more practice. Figs. 8 and 11 are in the 'good' category, indicating that the students got sufficient learning there. The Single-KC model in Fig. 11 shows that the overall performance of the students across all 32 unique steps is good. Only 2 of the 32 steps used addition, so Fig. 10 shows 'too little data'; we can add problems for this KC, or it can be merged with other KCs.
VI. Conclusion
Student knowledge models can be improved by mining students' interaction data. This paper analyzed the use of LFA for student knowledge modeling in maths education, using learning curves obtained by mining the students' log data. This method assists the teacher in: 1) measuring the difficulty and learning rates of knowledge components (KCs); 2) predicting student performance in practicing each KC; and 3) identifying over-practiced or under-practiced KCs. Learners can understand what they do and do not know, and students with poor performance can be given more problems for practice. This method provides insight into the performance of skills at every step for each student. The next step of this research is to provide a personalized tutoring environment for the students by incorporating the results into the tutor and providing automated suggestions to improve their performance. Clustering algorithms can be used to help the teacher group the students according to their performance.
References
[1] Baker, R. S. J. d., ( 2011), “Data Mining for Education.” In International Encyclopedia of Education, 3rd
ed., Edited by B. McGaw,
P. Peterson, and E. Baker. Oxford, UK: Elsevier.
[2] Baker, R. S. J. D., and K. Yacef, ( 2009), “The State of Educational Data Mining in 2009: A Review and Future Visions.” Journal
of Educational Data Mining 1 (1): 3–17.
[3] S. Lakshmi Prabha, Dr. A. R. Mohamed Shanavas, (2014), Educational Data Mining Applications, Operations Research and Applications: An International Journal (ORAJ), Vol. 1, No. 1, August 2014, 23-29.
[4] Feng, M., N. T. Heffernan, and K. R. Koedinger, (2009), “User Modeling and User-Adapted Interaction: Addressing the
Assessment Challenge in an Online System That Tutors as It Assesses.” The Journal of Personalization Research (UMUAI journal)
19 (3): 243–266.
[5] Newell, A., Rosenbloom, P.,(1981), Mechanisms of Skill Acquisition and the Law of Practice. In Anderson J. (ed.): Cognitive
Skills and Their Acquisition, Erlbaum Hillsdale NJ (1981)
[6] Corbett, A. T., and J. R. Anderson, (1994), "Knowledge Tracing: Modeling the Acquisition of Procedural Knowledge." User Modeling and User-Adapted Interaction 4 (4): 253-278. doi: 10.1007/BF01099821
[7] Baker, R.S.J.d., Corbett, A.T., Aleven, V., (2008), More Accurate Student Modeling Through Contextual Estimation of Slip and
Guess Probabilities in Bayesian Knowledge Tracing. Proceedings of the 9th International Conference on Intelligent Tutoring
Systems, 406-415.
[8] Koedinger, K.R., Stamper, J.C., McLaughlin, E.A., & Nixon, T., (2013), Using data-driven discovery of better student models to
improve student learning. In Yacef, K., Lane, H., Mostow, J., & Pavlik, P. (Eds.) In Proceedings of the 16th International
Conference on Artificial Intelligence in Education, pp. 421-430.
[9] Stamper, J.C., Koedinger, K.R.,(2011), Human-machine student model discovery and improvement using DataShop. In: Biswas, G.,
Bull, S., Kay, J., Mitrovic, A. (eds.) AIED 2011. LNCS, vol. 6738, pp. 353–360. Springer, Heidelberg (2011).
[10] Brent Martin , Antonija Mitrovic , Kenneth R Koedinger , Santosh Mathan, (2011), Evaluating and Improving Adaptive
Educational Systems with Learning Curves, User Modeling and User-Adapted Interaction , 2011; 21(3):249-283.
DOI: 10.1007/s11257-010-9084-2.
[11] Feng, M., Heffernan, N.T., & Koedinger, K.R., (2009), Addressing the assessment challenge in an Online System that tutors as it
assesses. User Modeling and User-Adapted Interaction: The Journal of Personalization Research (UMUAI journal). 19(3), 243-266,
August, 2009.
[12] S.Saranya, R.Ayyappan , N.Kumar, (2014), Student Progress Analysis and Educational Institutional Growth Prognosis Using Data
Mining, International Journal Of Engineering Sciences & Research Technology, 3(4): April, 2014, 1982-1987.
[13] Koedinger, K.R., Baker, R.S.J.d., Cunningham, K., Skogsholm, A., Leber, B., Stamper, J., (2010), A Data Repository for the EDM
community: The PSLC DataShop. In Romero, C., Ventura, S., Pechenizkiy, M., Baker, R.S.J.d. (Eds.) Handbook of Educational
Data Mining. Boca Raton, FL: CRC Press.
[14] Surjeet Kumar Yadav, Saurabh pal, (2012), Data Mining Application in Enrollment Management: A Case Study, International
Journal of Computer Applications (0975 – 8887) Volume 41– No.5, March 2012, pg:1-6.
[15] Wilson, M., de Boeck, P.,(2004), Descriptive and explanatory item response models. In: de Boeck, P., Wilson, M. (eds.)
Explanatory Item Response Models, pp. 43–74. Springer (2004)
[16] Pooja Gulati, Dr. Archana Sharma, (2012), Educational Data Mining for Improving Educational Quality, IRACST - International
Journal of Computer Science and Information Technology & Security (IJCSITS), ISSN: 2249-9555 Vol. 2, No.3, June 2012,
pg.648-650.
[17] Pooja Thakar, Anil Mehta, Manisha, (2015), Performance Analysis and Prediction in Educational Data Mining: A Research
Travelogue, International Journal of Computer Applications (0975 – 8887) Volume 110 – No. 15, January 2015, pg:60-68.
[18] Prabha, S.Lakshmi; Shanavas, A.R.Mohamed, (2014), "Implementation of E-Learning Package for Mensuration-A Branch of
Mathematics," Computing and Communication Technologies (WCCCT), 2014 World Congress on , vol., no., pp.219,221, Feb. 27
2014-March 1 2014,doi:10.1109/WCCCT.2014.37
[19] Brett Van De Sande, (2013), Properties of the Bayesian Knowledge Tracing Model, Journal of Educational Data Mining, Volume 5,
No 2, August, 2013,1-10.
[20] Wu, M. & Adams, R., (2007), Applying the Rasch model to psycho-social measurement: A practical approach. Educational
Measurement Solutions, Melbourne.
[21] Romero, C.,&Ventura,S.,(2010), Educational data mining: A review of the state of the art,IEEE Transactions on systems man and
Cybernetics Part C.Applications and review, 40(6),601-618.
[22] Wasserman L.,(2004), All of Statistics, 1st edition, Springer-Verlag New York, LLC
[23] Cen, H., Koedinger, K. & Junker, B., (2005), Automating Cognitive Model Improvement by A* Search and Logistic Regression. In
Proceedings of AAAI 2005 Educational Data Mining Workshop.
[24] Russell S., Norvig P.,(2003), Artificial Intelligence, 2nd edn. Prentice Hall (2003).
[25] Cen, H., Koedinger, K., Junker, B., (2007), Is Over Practice Necessary? Improving Learning Efficiency with the Cognitive Tutor
through Education. The 13th International Conference on Artificial Intelligence in Education (AIED 2007). 2007.
[26] S. Lakshmi Prabha et al., (2015), Performance of Classification Algorithms on Students' Data - A Comparative Study, International Journal of Computer Science and Mobile Applications, Vol. 3, Issue 9, pg. 1-8.
[27] S. Lakshmi Prabha, A.R. Mohamed Shanavas,(2015), Analysing Students Performance Using Educational Data Mining Methods,
International Journal of Applied Engineering Research, ISSN 0973-4562 Vol. 10 No.82, pg. 667-671.

More Related Content

PDF
[IJET-V2I1P2] Authors: S. Lakshmi Prabha1, A.R.Mohamed Shanavas
PDF
Clustering Students of Computer in Terms of Level of Programming
PDF
A Survey on Research work in Educational Data Mining
PDF
Correlation based feature selection (cfs) technique to predict student perfro...
PDF
Data Mining Application in Advertisement Management of Higher Educational Ins...
PDF
PREDICTING ACADEMIC MAJOR OF STUDENTS USING BAYESIAN NETWORKS TO THE CASE OF ...
PDF
Recognition of Slow Learners Using Classification Data Mining Techniques
[IJET-V2I1P2] Authors: S. Lakshmi Prabha1, A.R.Mohamed Shanavas
Clustering Students of Computer in Terms of Level of Programming
A Survey on Research work in Educational Data Mining
Correlation based feature selection (cfs) technique to predict student perfro...
Data Mining Application in Advertisement Management of Higher Educational Ins...
PREDICTING ACADEMIC MAJOR OF STUDENTS USING BAYESIAN NETWORKS TO THE CASE OF ...
Recognition of Slow Learners Using Classification Data Mining Techniques

What's hot (14)

PDF
Analyzing undergraduate students’ performance in various perspectives using d...
PDF
Student Performance Evaluation in Education Sector Using Prediction and Clust...
PDF
Association rule discovery for student performance prediction using metaheuri...
PDF
A Model for Predicting Students’ Academic Performance using a Hybrid of K-mea...
PDF
IRJET- Academic Performance Analysis System
PDF
Predicting students' performance using id3 and c4.5 classification algorithms
PDF
Literature Survey on Educational Dropout Prediction
PDF
03 20250 classifiers ensemble
PDF
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...
PDF
Predicting students performance using classification techniques in data mining
PDF
F03403031040
PDF
Evaluation of Data Mining Techniques for Predicting Student’s Performance
PDF
L016136369
PDF
IRJET- Using Data Mining to Predict Students Performance
Analyzing undergraduate students’ performance in various perspectives using d...
Student Performance Evaluation in Education Sector Using Prediction and Clust...
Association rule discovery for student performance prediction using metaheuri...
A Model for Predicting Students’ Academic Performance using a Hybrid of K-mea...
IRJET- Academic Performance Analysis System
Predicting students' performance using id3 and c4.5 classification algorithms
Literature Survey on Educational Dropout Prediction
03 20250 classifiers ensemble
A New Active Learning Technique Using Furthest Nearest Neighbour Criterion fo...
Predicting students performance using classification techniques in data mining
F03403031040
Evaluation of Data Mining Techniques for Predicting Student’s Performance
L016136369
IRJET- Using Data Mining to Predict Students Performance
Ad

Viewers also liked (13)

PDF
Key Educational Data Mining and Learning Analytics Methods.
PPTX
Grand challenges for the Educational Data Mining and Learning Sciences Commun...
PPTX
Some Thoughts on Learning Analytics and Educational Data Mining
PPTX
Educational Data Mining in Program Evaluation: Lessons Learned
PPTX
Data mining to predict academic performance.
PDF
Advances in Learning Analytics and Educational Data Mining
PPTX
Educational Data Mining/Learning Analytics issue brief overview
PDF
Educational Data Mining in relation to education statistics of Nepal
PPTX
Data mining PPT
PPT
Security in mobile ad hoc networks
PPTX
Learning Analytics in Education: Using Student’s Big Data to Improve Teaching
PPTX
Application of data mining
PPT
Security in wireless sensor networks
Key Educational Data Mining and Learning Analytics Methods.
Grand challenges for the Educational Data Mining and Learning Sciences Commun...
Some Thoughts on Learning Analytics and Educational Data Mining
Educational Data Mining in Program Evaluation: Lessons Learned
Data mining to predict academic performance.
Advances in Learning Analytics and Educational Data Mining
Educational Data Mining/Learning Analytics issue brief overview
Educational Data Mining in relation to education statistics of Nepal
Data mining PPT
Security in mobile ad hoc networks
Learning Analytics in Education: Using Student’s Big Data to Improve Teaching
Application of data mining
Security in wireless sensor networks
Ad

Similar to A Study on Learning Factor Analysis – An Educational Data Mining Technique for Student Knowledge Modeling (20)

PDF
G017224349
DOC
Text
PDF
entropy-22-00012.pdf
PDF
2009 educational data mining 8 43-2-pb
PDF
Student Performance Prediction via Data Mining & Machine Learning
PDF
Data mining approach to predict academic performance of students
PDF
Educational Data Mining & Students Performance Prediction using SVM Techniques
PDF
Discerning Learner's Erudition Using Data Mining Techniques
PDF
A LEARNING ANALYTICS APPROACH FOR STUDENT PERFORMANCE ASSESSMENT
PDF
A LEARNING ANALYTICS APPROACH FOR STUDENT PERFORMANCE ASSESSMENT
PDF
A Survey on the Classification Techniques In Educational Data Mining
PDF
Data Mining for Education. Ryan S.J.d. Baker, Carnegie Mellon University
PDF
Artificial intelligence to support human instruction Michael C. Mozera,b,c,, ...
PDF
Educational Data Mining to Analyze Students Performance – Concept Plan
PDF
Smartphone, PLC Control, Bluetooth, Android, Arduino.
PPT
micro testing teaching learning analytics
PDF
Multiple educational data mining approaches to discover patterns in universit...
PDF
Identifying the Key Factors of Training Technical School and College Teachers...
PDF
A study model on the impact of various indicators in the performance of stude...
PPTX
Learning to Teach: Improving Instruction with Machine Learning Techniques
G017224349
Text
entropy-22-00012.pdf
2009 educational data mining 8 43-2-pb
Student Performance Prediction via Data Mining & Machine Learning
Data mining approach to predict academic performance of students
Educational Data Mining & Students Performance Prediction using SVM Techniques
Discerning Learner's Erudition Using Data Mining Techniques
A LEARNING ANALYTICS APPROACH FOR STUDENT PERFORMANCE ASSESSMENT
A LEARNING ANALYTICS APPROACH FOR STUDENT PERFORMANCE ASSESSMENT
A Survey on the Classification Techniques In Educational Data Mining
Data Mining for Education. Ryan S.J.d. Baker, Carnegie Mellon University
Artificial intelligence to support human instruction Michael C. Mozera,b,c,, ...
Educational Data Mining to Analyze Students Performance – Concept Plan
Smartphone, PLC Control, Bluetooth, Android, Arduino.
micro testing teaching learning analytics
Multiple educational data mining approaches to discover patterns in universit...
Identifying the Key Factors of Training Technical School and College Teachers...
A study model on the impact of various indicators in the performance of stude...
Learning to Teach: Improving Instruction with Machine Learning Techniques

More from iosrjce (20)

PDF
An Examination of Effectuation Dimension as Financing Practice of Small and M...
PDF
Does Goods and Services Tax (GST) Leads to Indian Economic Development?
PDF
Childhood Factors that influence success in later life
PDF
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
PDF
Customer’s Acceptance of Internet Banking in Dubai
PDF
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
PDF
Consumer Perspectives on Brand Preference: A Choice Based Model Approach
PDF
Student`S Approach towards Social Network Sites
PDF
Broadcast Management in Nigeria: The systems approach as an imperative
PDF
A Study on Retailer’s Perception on Soya Products with Special Reference to T...
PDF
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
PDF
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
PDF
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
PDF
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
PDF
Media Innovations and its Impact on Brand awareness & Consideration
PDF
Customer experience in supermarkets and hypermarkets – A comparative study
PDF
Social Media and Small Businesses: A Combinational Strategic Approach under t...
PDF
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
PDF
Implementation of Quality Management principles at Zimbabwe Open University (...
PDF
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...
An Examination of Effectuation Dimension as Financing Practice of Small and M...
Does Goods and Services Tax (GST) Leads to Indian Economic Development?
Childhood Factors that influence success in later life
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe...
Customer’s Acceptance of Internet Banking in Dubai
A Study of Employee Satisfaction relating to Job Security & Working Hours amo...
Consumer Perspectives on Brand Preference: A Choice Based Model Approach
Student`S Approach towards Social Network Sites
Broadcast Management in Nigeria: The systems approach as an imperative
A Study on Retailer’s Perception on Soya Products with Special Reference to T...
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ...
Consumers’ Behaviour on Sony Xperia: A Case Study on Bangladesh
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P...
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu...
Media Innovations and its Impact on Brand awareness & Consideration
Customer experience in supermarkets and hypermarkets – A comparative study
Social Media and Small Businesses: A Combinational Strategic Approach under t...
Secretarial Performance and the Gender Question (A Study of Selected Tertiary...
Implementation of Quality Management principles at Zimbabwe Open University (...
Organizational Conflicts Management In Selected Organizaions In Lagos State, ...

Recently uploaded (20)

PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
Lecture Notes Electrical Wiring System Components
DOCX
573137875-Attendance-Management-System-original
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PPTX
OOP with Java - Java Introduction (Basics)
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PPT
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
PPTX
UNIT 4 Total Quality Management .pptx
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPTX
web development for engineering and engineering
PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PPTX
Welding lecture in detail for understanding
UNIT-1 - COAL BASED THERMAL POWER PLANTS
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
CYBER-CRIMES AND SECURITY A guide to understanding
Embodied AI: Ushering in the Next Era of Intelligent Systems
Lecture Notes Electrical Wiring System Components
573137875-Attendance-Management-System-original
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
OOP with Java - Java Introduction (Basics)
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
UNIT 4 Total Quality Management .pptx
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
CH1 Production IntroductoryConcepts.pptx
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
web development for engineering and engineering
Automation-in-Manufacturing-Chapter-Introduction.pdf
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
Welding lecture in detail for understanding

A Study on Learning Factor Analysis – An Educational Data Mining Technique for Student Knowledge Modeling

  • 1. IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 6, Ver. IV (Nov – Dec. 2015), PP 95-101 www.iosrjournals.org DOI: 10.9790/0661-176495101 www.iosrjournals.org 95 | Page A Study on Learning Factor Analysis – An Educational Data Mining Technique for Student Knowledge Modeling S. Lakshmi Prabha1 , Dr.A.R.Mohamed Shanavas2 1 Ph.D Research Scholar, Bharathidasan University & Associate professor, Department of Computer Science, Seethalakshmi Ramaswami College, Tiruchirappalli, Tamilnadu, India, 2 Associate professor,Department of Computer Science, Jamal Mohamed College, Tiruchirappalli, Tamilnadu, India, Abstract: The increase in dissemination of interactive e-learning environments has allowed the collection of large repositories of data. The new emerging field, Educational Data Mining (EDM) concerns with developing methods to discover knowledge from data collected from e-learning and educational environments. EDM can be applied in modeling user knowledge, user behavior and user experience in e-learning platforms. This paper explains how Learning Factor Analysis (LFA), a data mining method is used for evaluating cognitive model and analyzing student-tutor log data for knowledge modeling. Also illustrates how learning curves can be used for visualizing the performance of the students. Keywords: e-learning, Educational Data Mining (EDM), Learning Factor Analysis (LFA) I. Introduction Educational Data Mining is an inter-disciplinary field utilizes methods from machine learning, cognitive science, data mining, statistics, and psychometrics. The main aim of EDM is to construct computational models and tools to discover knowledge by mining data taken from educational settings. The increase of e-learning resources such as interactive learning environments, learning management systems (LMS), intelligent tutoring systems (ITS), and hypermedia systems, as well as the establishment of school databases of student test scores, has created large repositories of data that can be explored by EDM researchers to understand how students learn and find out models to improve their performance. Baker [1] has classified the methods in EDM as: prediction, clustering, relationship mining, distillation of data for human judgment and discovery with models. These methods are used by the researchers [1][2] to find solutions for the following goals: 1. Predicting students‟ future learning behavior by creating student models that incorporate detailed information about students‟ knowledge, meta-cognition, motivation, and attitudes. 2. Discovering or improving domain models that characterize the content to be learned and optimal instructional sequences. 3. Studying the effects of different kinds of pedagogical support that can be provided by learning software, and 4. Advancing scientific knowledge about learning and learners through building computational models that incorporate models of the student, the software‟s pedagogy and the domain. The application areas [3] of EDM are: 1) User modeling 2) User grouping or Profiling 3) Domain modeling and 4) trend analysis. These application areas utilize EDM methods to find solutions. User modeling [3] encompasses what a learner knows, what the user experience is like, what a learner‟s behavior and motivation are, and how satisfied users are with online learning. 
User models are used to customize and adapt the system behaviors‟ to users specific needs so that the systems „say‟ the „right‟ thing at the „right‟ time in the „right „way [4]. This paper concerns with applying EDM method Learning factor Analysis (LFA) for User knowledge Modeling. This paper is organized as follows: section 2 lists the related works done in this research area; section 3 explains LFA method used in this research; section 4 describes methodology used, section 5 discusses the results and section 6 concludes the work. II. Literature Review A number of studies have been conducted in EDM to find the effect of using the discovered methods on student modeling. This section provides an overview of related works done by other EDM researchers. Newell and Rosenbloom[5] found a power relationship between the error rate of performance and the amount of practice .Corbett and Anderson [6] discovered a popular method for estimating students‟ knowledge is knowledge tracing model, an approach that uses a Bayesian-network-based model for estimating the probability that a student knows a skill based on observations of him or her attempting to perform the skill. Baker et.al [7] have proposed a new way to contextually estimate the probability that a student obtained a correct answer by guessing, or an incorrect answer by slipping, within Bayesian Knowledge Tracing. Koedinger
  • 2. A Study on Learning Factor Analysis – An Educational Data Mining Technique for Student… DOI: 10.9790/0661-176495101 www.iosrjournals.org 96 | Page et. al [8]demonstrated that a tutor unit, redesigned based on data-driven cognitive model improvements, helped students reach mastery more efficiently. It produced better learning on the problem-decomposition planning skills that were the focus of the cognitive model improvements. Stamper and Koedinger [9], presented a data- driven method for researchers to use data from educational technologies to identify and validate improvements in a cognitive model which used Knowledge or skill components equivalent to latent variables in a logistic regression model called the Additive Factors Model (AFM). Brent et. al [10] used learning curves to analyze a large volume of user data to explore the feasibility of using them as a reliable method for fine tuning adaptive educational system. Feng et. al[11], addressed the assessment challenge in the ASSISTment system, which is a web-based tutoring system that serves as an e-learning and e-assessment environment. They presented that the on line assessment system did a better job of predicting student knowledge by considering how much tutoring assistance was needed, how fast a student solves a problem and how many attempts were needed to finish a problem. Saranya et. al [12] proposed system regards the student‟s holistic performance by mining student data and Institutional data. Naive Bayes classification algorithm is used for classifying students into three classes – Elite, Average and Poor. Koedinger, K.R.,[13] Professor, Human Computer Interaction Institute, Carnegie Mellon University, Pittsburgh has done lot to this EDM research. He developed cognitive models and used students interaction log taken from the Cognitive Tutors, analyzed for the betterment of student learning process Better assessment models always result with quality education. Assessing student‟s ability and performance with EDM methods in e-learning environment for math education in school level in India has not been identified in our literature review. Our method is a novel approach in providing quality math education with assessments indicating the knowledge level of a student in each lesson. III. Learning Factor Analysis User modeling or student modeling identifies what a learner knows, what the learner experience is like, what a learner‟s behavior and motivation are, and how satisfied users are with e-learning. Item Response Theory and Rash model [20] is Psychometric Methods to measure students‟ ability. They lack in providing results that are easy to interpret by the users. This paper deals with identifying learners‟ knowledge level (knowledge modeling) using LFA in an e-learning environment. LFA is an EDM method for evaluating cognitive models and analysing student-tutor log data. LFA uses three components: 1) Statistical model – multiple logistic regression model is used to quantify the skills. 2) Human expertise- difficulty factors (concepts or KCs) defined by the subject experts (teachers): a set of factors that make a problem-solving step more difficult for a student and 3) A* search – a combinatorial search for model selection. A good cognitive model for a tutor uses a set of production rules or skills which specify how students solve problems. The tutor should estimate the skills learnt by each student when they practice with the tutor. 
The power law [5] defines the relationship between the error rate of performance and the amount of practice, depicted by equation (1).This shows that the error rate decreases according to a power function as the amount of practice increase. Y= aXb ..... (1) Where Y = the error rate X = the number of opportunities to practice a skill a = the error rate on the first trial, reflecting the intrinsic difficulty of a skill b = the learning rate, reflecting how easy a skill is to learn While the power law model applies to individual skills, it does not include student effects. In order to accommodate student effects for a cognitive model that has multiple rules, and that contains multiple students, the power law model is extended to a multiple logistic regression model (equation 2)[24]. ln[Pijt/(1-Pijt)]= Σ αi Xi + Σ βjYj + Σ γjYjTjt …….(2) Where Pijt is the probability of getting a step in a tutoring question right by the ith student‟s t th opportunity to practice the jth KC; X = the covariates for students; Y = the covariates for skills(knowledge components); T = the number of practice opportunities student i has had on knowledge component j; α = the coefficient for each student, that is, the student intercept; β = the coefficient for each knowledge component, that is, the knowledge component intercept; γ = the coefficient for the interaction between a knowledge component and its opportunities, that is, the learning curve slope. The model says that the log odds of Pijt is proportional to the overall “smarts” of that student (αi) plus the “easiness” of that KC (βj) plus the amount gained (γj) for each practice opportunity. This model can show the learning growth of students at any current or past moment. A difficulty factor refers specifically to a property of the problem that causes student difficulties. The tutor considered for this research has metric measures as lesson 1 which requires 5 skills (conversion, division,
  • 3. A Study on Learning Factor Analysis – An Educational Data Mining Technique for Student… DOI: 10.9790/0661-176495101 www.iosrjournals.org 97 | Page multiplication, addition, and result). These are the factors (KCs) in this tutor (Table 1) to be learnt by the students in solving the steps. Each step has a KC assigned to it for this study. Table 1. Factors for the Metric measures and their values Factor Names Factor Values Converion Correct formula, Incorrect Addition Correct, Wrong Multiplication Correct, Wrong Division Correct, Wrong Result Correct, Wrong The combinatorial search will select a model within the logistic regression model space. Difficulty factors are incorporated into an existing cognitive model through a model operator called Binary Split, which splits a skill a skill with a factor value, and a skill without the factor value. For example, splitting production Measurement by factor conversion leads to two productions: Measurement with the factor value Correct formula and Measurement with the factor value Incorrect. A* search is the combinatorial search algorithm [25] in LFA. It starts from an initial node, iteratively creates new adjoining nodes, explores them to reach a goal node. To limit the search space, it employs a heuristic to rank each node and visits the nodes in order of this heuristic estimate. In this study, the initial node is the existing cognitive model. Its adjoining nodes are the new models created by splitting the model on the difficulty factors. We do not specify a model to be the goal state because the structure of the best model is unknown. For this paper 25 node expansions per search is defined as the stopping criterion. AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are two estimators used as heuristics in the search. AIC = -2*log-likelihood + 2*number of parameters. .... (3) BIC = -2*log-likelihood + number of parameters * number of observations. ..... (4) Where log-likelihood measures the fit, and the number of parameters, which is the number of covariates in equation 2, measures the complexity. Lower AIC & BIC scores, mean a better balance between model fit and complexity. IV. Methodology In this paper the LFA methodology is illustrated using data obtained from the Metric measures lesson of Mensuration Tutor MathsTutor[18] . Our dataset consist of 2,247 transactions involving 60 students, 32 unique steps and 5 Skills (KCs) in students exercise log. All the students were solving 9 problems 5 in mental problem category, 3 in simple and one in big. Total steps involved are 32. While solving exercise problem a student can ask for a hint in solving a step. Each data point is a correct or incorrect student action corresponding to a single skill execution. Student actions are coded as correct or incorrect and categorized in terms of “knowledge components” (KCs) needed to perform that action. Each step the student performs is related to a KC and is recorded as an “opportunity” for the student to show mastery of that KC. This lesson has 5 skills (conversion, division, multiplication, addition, and result) correspond to the skill needed in a step. Each step has a KC assigned to it for this study. The table 2 shows a sample data with columns: Student- name of the student; Step – problem 1 Step1; Success – Whether the student did that step correctly or not in the first attempt. 
1- success and 0-failure; Skill – Knowledge component used in that step; Opportunities – Number of times the skill is used by the same student computed from the first and fourth column. Table 2. The sample data Student Step Success Skill Opportunities X P1s1 1 conversion 1 X P1s2 1 result 1 X P2s1 0 conversion 2 To find fitness of the model logistic regression values are calculated with Additive Factor Model (AFM)[26]. The values are present in Table 3.Number of parameters and number of observations in equation 3 and 4 is 60 (students) and 1920 (32unique steps x 60 students) respectively. Lower values of AIC, BIC and Root Mean Squared Error (RMSE) indicate a better fit between the model's predictions and the observed data. Two types of cross validation are run for each KC model in the dataset. These types are a 3-fold cross validation of the Additive Factor Model's (AFM)[25] error rate predictions. In student stratified, data points are grouped by student, the full set of students is divided into 3 groups. 3-fold cross validation is then performed across these 3 groups. In Item stratified, data points are grouped by step, the full set of steps is divided into 3 groups. 3-fold cross validation is then performed across these 3 groups. The Slope parameter represents how quickly students will learn the knowledge component. The larger the KC slope, the faster students learn the knowledge
The conversion KC has a slope of 0, indicating that no learning took place and that this KC needs attention from the teacher. The addition KC has the highest slope, indicating that students find it the easiest to learn. Table 3 shows that this KC model fitted the current tutor dataset well, with comparatively low AIC, BIC, and RMSE values for the KCs used.

Table 3. Logistic regression model values
KC Model        AIC       BIC       Log-likelihood  RMSE (student stratified)  RMSE (item stratified)  Slope
Addition        1,189.43  1,545.18  -530.72         0.302511                   0.288114                0.732
Conversion      1,155.22  1,511.02  -513.61         0.298859                   0.284691                0.000
Division        1,190.19  1,546.03  -513.09         0.301930                   0.289071                0.623
Multiplication  1,193.94  1,549.76  -532.97         0.301943                   0.287855                0.112
Result          1,197.65  1,553.49  -534.82         0.301916                   0.287417                0.075

Learning curves [10] have become a standard tool for measuring students' learning in intelligent tutoring systems. In our study we used learning curves to visualize student performance over opportunities. The slope and fit of a learning curve show the rate at which a student learns over time, and reveal how well the system model fits what the student is learning. We used learning curves to measure the performance of the tutoring system's domain and student models. The measures of student performance are described in Table 4. Regardless of the metric, each point on the graph is an average across all selected knowledge components and students.

Table 4. Measures of student performance
Assistance Score – The number of incorrect attempts plus hint requests for a given opportunity.
Error Rate – The percentage of students that asked for a hint or were incorrect on their first attempt. For example, an error rate of 45% means that 45% of students asked for a hint or performed an incorrect action on their first attempt. Error rate differs from assistance score in that it is based only on the first attempt, so it makes no distinction between a student who made multiple incorrect attempts and a student who made only one.
Number of Incorrects – The number of incorrect attempts for each opportunity.
Number of Hints – The number of hints requested for each opportunity.
Step Duration – The elapsed time of a step in seconds, calculated by adding all of the durations for transactions attributed to the step.
Correct Step Duration – The step duration if the first attempt for the step was correct: the time for which students are "silent", with respect to their interaction with the tutor, before they complete the step correctly. This is often called "reaction time" (on correct trials) in the psychology literature. If the first attempt is an error (incorrect attempt or hint request), the observation is dropped.
Error Step Duration – The step duration if the first attempt for the step was an error (hint request or incorrect attempt). If the first attempt is correct, the observation is dropped.

A learning curve is categorised as follows:
• Low and flat: the low error rate shows that students mastered the KCs but continued to receive tasks for them.
• No learning: the slope of the predicted learning curve shows no apparent learning for these KCs.
• Still high: students continued to have difficulty with these KCs; consider increasing opportunities for practice.
• Too little data: students did not practice these KCs enough for the data to be interpretable.
• Good: these KCs did not fall into any of the above "bad" or "at risk" categories, so they appear to indicate substantial student learning.
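As a rough illustration of how the error-rate measure and these categories could be computed from a log in the format of Table 2, consider the following sketch. The data rows are fabricated, and the thresholds that separate the categories are arbitrary assumptions for demonstration, not the rules used by DataShop.

```python
from collections import defaultdict

# Rows follow Table 2: (student, step, success, skill, opportunity). Data is made up.
log = [
    ("X", "P1s1", 1, "conversion", 1), ("X", "P2s1", 0, "conversion", 2),
    ("X", "P3s1", 0, "conversion", 3), ("Y", "P1s1", 0, "conversion", 1),
    ("Y", "P2s1", 1, "conversion", 2), ("Y", "P3s1", 1, "conversion", 3),
]

def error_rate_curve(rows, skill):
    """Error rate per opportunity: the share of students whose first attempt at
    that opportunity was a hint request or an incorrect action."""
    totals, errors = defaultdict(int), defaultdict(int)
    for _student, _step, success, kc, opp in rows:
        if kc == skill:
            totals[opp] += 1
            errors[opp] += 1 - success
    return [errors[o] / totals[o] for o in sorted(totals)]

def categorise(curve, min_points=3, low=0.2, high=0.4):
    """Rough learning-curve categories; thresholds are illustrative assumptions."""
    if len(curve) < min_points:
        return "too little data"
    slope = curve[-1] - curve[0]          # crude proxy for the fitted slope
    if max(curve) <= low:
        return "low and flat"
    if curve[-1] >= high:
        return "still high"
    if slope >= 0:
        return "no learning"
    return "good"

curve = error_rate_curve(log, "conversion")
print(curve, "->", categorise(curve))
```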
These categorisations help the teacher understand the students' knowledge level in the specific concepts to be mastered.

V. Results and Discussion
To analyse the performance of the students, we used the DataShop [13] analysis and visualization tool, generating learning curves by uploading our dataset. Fig. 1 shows the problem steps involved in the first problem and the number of correct/incorrect attempts made by the 60 students.
Fig. 1. Problem steps and attempts made in problem 1

Fig. 2 shows that the conversion KC had the maximum error rate compared with the other KCs, which indicates that the students struggled in the conversion step (converting from one unit to another in the metric measures lesson).

Fig. 2. Error rate vs. KCs
Fig. 3. Average number of hints vs. KCs

From Fig. 3 it can be seen that the average number of hints requested by the students for the conversion KC is greater than for the other KCs; the difficulty level of the conversion KC is higher than that of the other KCs. This indicates that the conversion KC has to be explained by the teacher in class, or more practice has to be given to the students. Fig. 4 shows the assistance score obtained by the students on all 9 problems they solved. Although the fourth problem belongs to the mental problem category and requires only 2 or 3 steps to solve, the students made the maximum number of incorrect attempts and hint requests on it, which indicates that the problem is tough for the learners and that they did not understand the concept. Students also took more time to solve steps involving the conversion KC than the other KCs (Fig. 5), which again indicates the difficulty level of that skill.

Fig. 4. Assistance score vs. problems
Fig. 5. Step duration vs. KCs

The empirical learning curve gives a visual clue as to how well a student may do over a set of learning opportunities, while the predicted curve allows a more precise prediction of the success rate at any learning opportunity. The predicted learning curve is much smoother. It is computed using the Additive Factor Model (AFM) [25], which uses a set of customized item-response models to predict how a student will perform for each skill on each learning opportunity. The predicted learning curves are the average predicted error of a skill over each of the learning opportunities. The blue line in the learning curves shows the predicted value, and the curve category is defined using the predicted values. The empirical learning curve has some blips depending on the error rate, but the predicted line is very smooth.
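A minimal sketch of how such a predicted curve is obtained from AFM-style parameters is shown below: the model's predicted error for every student is averaged at each opportunity, and this averaging is what makes the predicted line smooth. The student proficiencies and KC parameters are hypothetical, not the fitted values from Table 3.

```python
import math

def afm_error(theta, beta, gamma, opportunity):
    """Predicted error rate (1 - success probability) for one student on one
    opportunity of a single KC, using illustrative AFM parameters."""
    logit = theta + beta + gamma * (opportunity - 1)
    return 1.0 - 1.0 / (1.0 + math.exp(-logit))

# Hypothetical student proficiencies and KC parameters (e.g. for the division KC).
thetas = [-0.5, 0.0, 0.4, 1.0]           # one proficiency per student
beta, gamma = -0.3, 0.62                 # easiness and learning slope

# Predicted learning curve: average predicted error over students per opportunity.
predicted_curve = [
    sum(afm_error(t, beta, gamma, opp) for t in thetas) / len(thetas)
    for opp in range(1, 9)
]
print([round(e, 3) for e in predicted_curve])
```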
Fig. 6. Learning curve for conversion KC
Fig. 7. Learning curve for multiplication KC
Fig. 8. Learning curve for division KC
Fig. 9. Learning curve for result KC
Fig. 10. Learning curve for addition KC
Fig. 11. Learning curve for single-KC model

From the predicted learning curve for the conversion KC (Fig. 6) we can infer that no learning took place while practicing. There were 11 opportunities for conversion; the 4th conversion opportunity has the maximum error rate (33.3%), and no conversion opportunity was at a 0% error rate. The teacher can better guide the students in that area and can change the domain model by adding new example problems and providing more exercises. The learning curves shown in Fig. 7 and Fig. 9 are in the "low and flat" category, which suggests that students likely received too much practice for these KCs: the students had mastered these skills and did not require any more practice. Fig. 8 and Fig. 11 are in the "good" category, indicating that the students learned these KCs adequately. The single-KC model in Fig. 11 shows that the overall performance of the students across all 32 unique steps is good. Of the 32 steps, only 2 used addition, so Fig. 10 falls into the "too little data" category; problems can be added for this KC, or it can be merged with other KCs.

VI. Conclusion
Student knowledge models can be improved by mining students' interaction data. This paper analyzed the use of LFA in student knowledge modeling in maths education, with learning curves generated by mining the students' log data. This method assists the teacher in: 1) measuring the difficulty and learning rates of knowledge components (KCs), 2) predicting student performance in practicing each KC, and 3) identifying over-practiced or under-practiced KCs. Learners can understand what they know and what they do not know, and students with poor performance can be given more problems for practice. This method provides more insight into the performance of the skills in every step for each student. The next step of this research is to provide a personalized tutoring environment for the students by incorporating the results into the tutor and providing automated suggestions to improve their performance. Clustering algorithms can be used to suggest to the teacher how to group the students according to their performance.

References
[1] Baker, R. S. J. d. (2011), "Data Mining for Education," in International Encyclopedia of Education, 3rd ed., edited by B. McGaw, P. Peterson, and E. Baker. Oxford, UK: Elsevier.
[2] Baker, R. S. J. d., and Yacef, K. (2009), "The State of Educational Data Mining in 2009: A Review and Future Visions," Journal of Educational Data Mining 1(1): 3–17.
[3] Lakshmi Prabha, S., and Mohamed Shanavas, A. R. (2014), "Educational Data Mining Applications," Operations Research and Applications: An International Journal (ORAJ), Vol. 1, No. 1, August 2014, pp. 23–29.
[4] Feng, M., Heffernan, N. T., and Koedinger, K. R. (2009), "User Modeling and User-Adapted Interaction: Addressing the Assessment Challenge in an Online System That Tutors as It Assesses," The Journal of Personalization Research (UMUAI journal) 19(3): 243–266.
[5] Newell, A., and Rosenbloom, P. (1981), "Mechanisms of Skill Acquisition and the Law of Practice," in Anderson, J. (ed.), Cognitive Skills and Their Acquisition, Erlbaum, Hillsdale, NJ.
[6] Corbett, A. T., and Anderson, J. R. (1994), "Knowledge Tracing: Modeling the Acquisition of Procedural Knowledge," User Modeling and User-Adapted Interaction 4(4): 253–278. doi: 10.1007/BF01099821
[7] Baker, R. S. J. d., Corbett, A. T., and Aleven, V. (2008), "More Accurate Student Modeling Through Contextual Estimation of Slip and Guess Probabilities in Bayesian Knowledge Tracing," Proceedings of the 9th International Conference on Intelligent Tutoring Systems, pp. 406–415.
[8] Koedinger, K. R., Stamper, J. C., McLaughlin, E. A., and Nixon, T. (2013), "Using Data-Driven Discovery of Better Student Models to Improve Student Learning," in Yacef, K., Lane, H., Mostow, J., and Pavlik, P. (eds.), Proceedings of the 16th International Conference on Artificial Intelligence in Education, pp. 421–430.
[9] Stamper, J. C., and Koedinger, K. R. (2011), "Human-Machine Student Model Discovery and Improvement Using DataShop," in Biswas, G., Bull, S., Kay, J., and Mitrovic, A. (eds.), AIED 2011, LNCS, Vol. 6738, pp. 353–360. Springer, Heidelberg.
[10] Martin, B., Mitrovic, A., Koedinger, K. R., and Mathan, S. (2011), "Evaluating and Improving Adaptive Educational Systems with Learning Curves," User Modeling and User-Adapted Interaction 21(3): 249–283. doi: 10.1007/s11257-010-9084-2
[11] Feng, M., Heffernan, N. T., and Koedinger, K. R. (2009), "Addressing the Assessment Challenge in an Online System That Tutors as It Assesses," User Modeling and User-Adapted Interaction: The Journal of Personalization Research (UMUAI journal) 19(3): 243–266, August 2009.
[12] Saranya, S., Ayyappan, R., and Kumar, N. (2014), "Student Progress Analysis and Educational Institutional Growth Prognosis Using Data Mining," International Journal of Engineering Sciences & Research Technology 3(4), April 2014, pp. 1982–1987.
[13] Koedinger, K. R., Baker, R. S. J. d., Cunningham, K., Skogsholm, A., Leber, B., and Stamper, J. (2010), "A Data Repository for the EDM Community: The PSLC DataShop," in Romero, C., Ventura, S., Pechenizkiy, M., and Baker, R. S. J. d. (eds.), Handbook of Educational Data Mining. Boca Raton, FL: CRC Press.
[14] Yadav, S. K., and Pal, S. (2012), "Data Mining Application in Enrollment Management: A Case Study," International Journal of Computer Applications (0975 – 8887), Vol. 41, No. 5, March 2012, pp. 1–6.
[15] Wilson, M., and de Boeck, P. (2004), "Descriptive and Explanatory Item Response Models," in de Boeck, P., and Wilson, M. (eds.), Explanatory Item Response Models, pp. 43–74. Springer.
[16] Gulati, P., and Sharma, A. (2012), "Educational Data Mining for Improving Educational Quality," IRACST – International Journal of Computer Science and Information Technology & Security (IJCSITS), ISSN: 2249-9555, Vol. 2, No. 3, June 2012, pp. 648–650.
[17] Thakar, P., Mehta, A., and Manisha (2015), "Performance Analysis and Prediction in Educational Data Mining: A Research Travelogue," International Journal of Computer Applications (0975 – 8887), Vol. 110, No. 15, January 2015, pp. 60–68.
[18] Lakshmi Prabha, S., and Mohamed Shanavas, A. R. (2014), "Implementation of E-Learning Package for Mensuration – A Branch of Mathematics," 2014 World Congress on Computing and Communication Technologies (WCCCT), pp. 219–221, Feb. 27 – March 1, 2014. doi: 10.1109/WCCCT.2014.37
[19] Van De Sande, B. (2013), "Properties of the Bayesian Knowledge Tracing Model," Journal of Educational Data Mining 5(2), August 2013, pp. 1–10.
[20] Wu, M., and Adams, R. (2007), Applying the Rasch Model to Psycho-Social Measurement: A Practical Approach. Educational Measurement Solutions, Melbourne.
[21] Romero, C., and Ventura, S. (2010), "Educational Data Mining: A Review of the State of the Art," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 40(6): 601–618.
[22] Wasserman, L. (2004), All of Statistics, 1st ed., Springer-Verlag, New York.
[23] Cen, H., Koedinger, K., and Junker, B. (2005), "Automating Cognitive Model Improvement by A* Search and Logistic Regression," in Proceedings of the AAAI 2005 Educational Data Mining Workshop.
[24] Russell, S., and Norvig, P. (2003), Artificial Intelligence: A Modern Approach, 2nd ed., Prentice Hall.
[25] Cen, H., Koedinger, K., and Junker, B. (2007), "Is Over Practice Necessary? Improving Learning Efficiency with the Cognitive Tutor through Educational Data Mining," Proceedings of the 13th International Conference on Artificial Intelligence in Education (AIED 2007).
[26] Lakshmi Prabha, S., et al. (2015), "Performance of Classification Algorithms on Students' Data – A Comparative Study," International Journal of Computer Science and Mobile Applications 3(9), pp. 1–8.
[27] Lakshmi Prabha, S., and Mohamed Shanavas, A. R. (2015), "Analysing Students Performance Using Educational Data Mining Methods," International Journal of Applied Engineering Research, ISSN 0973-4562, Vol. 10, No. 82, pp. 667–671.