Predicting Football Match Results with Data
Mining Techniques
O. I. Aladesote, O. Agbelusi & M. Ganiyu
Abstract- Data mining techniques are very effective and useful for forecasting in many domains. In this
research, Spanish La Liga football match outcomes are predicted using various data mining techniques
(Multilayer Perceptron, Decision Tables, Random Forest, REPTree and Meta Bagging) to determine the most accurate
among them. The experiments, carried out with Weka 3.9, show that all the techniques performed well in
terms of accuracy, but Multilayer Perceptron was the most successful with an average accuracy of 100%.
I. INTRODUCTION
Football is a fast-growing sport that is becoming one of the most viewed and richest sports. The drive to be
more than just a spectator has led to this research on predicting the final outcome of a match, which would
also make sports betting easier. One of the reasons football is the most popular sport on the
planet is its unpredictability.
Every day, fans around the world argue over which team is going to win the next game or the next competition.
Many of these fans also put their money where their mouths are by betting large sums on their predictions. Because of
the large number of factors that can affect the result of a football match, it is incredibly difficult to predict
its outcome probabilities correctly. With the growing amount of money invested in sports betting markets, it is
important to verify how far data mining techniques can bring value to this area [9].
To solve this problem we propose building data-driven solutions designed through a data mining process. Data
mining is an aspect of computing that is used to extract hidden information and to automate the detection
of relevant patterns in a database. The data mining process allows us to build models that give predictions
according to the data fed into the system. The study aims to use data mining techniques to
predict football match results. Every sport has particular rules, a number of players and different styles, that is, a set
of different features. For a beginner, building a predictive model from scratch with a considerable dataset can
be challenging. Finally, by the end of this research, anyone, especially football fans, should be able to predict match results
based on the identified factors.
We summarize the contributions of this paper as follows:
• Forecasting of La Liga football match outcomes using data from five previous seasons
• Comparative analysis to determine the most accurate technique.
The remainder of the paper is organized as follows: Section 2 presents the literature review. Section 3 presents the method
used to generate the results. The experimental results for each data mining technique are presented and
discussed in Section 4. Comparative analysis is done in Section 5 and, finally, conclusion and future work are
presented in Section 6.
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 18, No. 6, June 2020
46 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
II. LITERATURE REVIEW
Data mining is an important tool in event prediction. The studies discussed in this section are those
most closely related to football match result prediction.
A match result prediction system was proposed using four data mining techniques. The author used basketball results
of four seasons (from 2005/2006 to 2009/2010) as training data and, in order to assess the models, the
results of the 2010/2011 season as test data. The models achieved comparable
classification accuracy, with 67.8% as the highest [4].
The authors proposed the use of ANN and logistic regression techniques to forecast the results of the 2014-2015
English Premier League season, in order to address the complexity and inaccurate predictions of statistical
approaches. Nine significant features were randomly selected from the records. The experimental results
show that logistic regression performs better than ANN and that the techniques show higher prediction
accuracy [17].
[13] carried out a preliminary investigation into forecasting results of the National Football League (NFL) using an artificial
neural network (ANN). Five variables randomly extracted from the first eight rounds of the competition were used for
the prediction. Teams were classified as either strong or weak using cluster-related methods.
The paper proposes data mining techniques to address the limitations of the numeric prediction approach.
Eight years of data were used. Both classification and regression models were used to evaluate the performance
of these techniques. The experimental results show that the classification model achieves a higher accuracy rate than
the regression model [15].
The researchers carried out a performance evaluation of three classification models (naive Bayes, artificial neural
networks (ANNs) and decision trees) [16]. The models were built using different variables of NBA matches. The
experimental results show that the accuracy of the proposed model is very reliable and that defensive rebounds are the
most significant variable. Three other variables were also identified as significant.
The researchers adopted three data mining approaches to build models of game outcomes from historical data. The
purpose was to counter the practice of ranking likely winners on the basis of experience. At the end of the
modeling process, all three models were capable of forecasting the winner of the game, and the decision tree
produced the highest accuracy [5].
A tennis match outcome prediction model was proposed in which numerous factors are systematically
prioritized to determine the match outcome. The results show that the proposed model, which combines data and
judgement, predicts the outcome of a match with 85.1% accuracy [7].
A machine learning method was adopted to forecast the results of future soccer matches based on a dataset of past
matches. In this research, two important ideas emerged from challenges encountered during
the modeling of 2017 soccer match results. These two ideas led to new feature engineering
methods (recency and rating extraction) for match result forecasting. The authors concluded that good forecasting
should be based on the knowledge of machine learning [3].
The authors developed predictive models to forecast the outcome of football matches for the 2008/2009 and 2015/2016
seasons. Techniques such as artificial neural network (ANN), Random Forest (RF) and Support Vector Machine
(SVM) were used to develop the models. A comparative analysis shows that the models are capable of
predicting correctly when compared with results based on the experience of football match analysts [8].
This paper proposes machine learning methods to determine the result of NBA matches. Forecasting
was based on historical data, and a performance evaluation was carried out among the models developed. The
results show that the defensive rebounds feature was identified as important by all the
methods for optimal prediction of the game result. Further research will be carried out using models such as
function-based techniques and deep learning [16].
III. METHODOLOGY
This section describes the dataset, classification techniques and performance analysis. The experiment was carried out in
Weka 3.9.2 with five algorithms: Multilayer Perceptron, Decision Tables, Random Forest, REPTree and Meta Bagging.
In Weka, 10-fold cross-validation was adopted as the classifier evaluation option.
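As an illustration of the evaluation option named above, the following is a minimal sketch of how k-fold cross-validation partitions a dataset. It is not Weka's implementation: the function names are hypothetical, and the sketch omits the shuffling and stratification that a tool such as Weka typically applies before splitting.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first folds
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_test_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = k_fold_indices(n_samples, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# One season of 380 matches, evaluated with 10 folds (illustrative only).
splits = list(train_test_splits(380, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 342 38
```

Each match is used for testing exactly once and for training in the other nine rounds; the reported accuracy is then computed over all held-out predictions.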
A. Dataset
The dataset used for the implementation covers the Spanish La Liga seasons 2014/2015 to 2018/2019 [18].
The league consists of twenty teams that play each other both home and away, which gives 380 matches per season and
1900 matches over the five seasons. The data consist of 61 features, of which 22 are match statistics
such as full-time and half-time result, home and away team shots, etc., while the remaining 39 are football betting
details. Out of the 22 statistical features, 10 were randomly selected as predictors, with the full-time result
(Home Win, Away Win and Lose) as the target.
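The match counts quoted above follow from the double round-robin format: each of the 20 teams hosts each of the other 19 once, giving 20 × 19 = 380 ordered (home, away) pairings. A short check, with placeholder team names that are not taken from the dataset:

```python
from itertools import permutations

teams = [f"Team {i}" for i in range(1, 21)]   # 20 placeholder team names
fixtures = list(permutations(teams, 2))        # ordered (home, away) pairs
matches_per_season = len(fixtures)
total_matches = matches_per_season * 5         # five seasons
print(matches_per_season, total_matches)       # 380 1900
```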
B. Performance Analysis
The performance of these classification algorithms was measured by accuracy. Accuracy is the rate at
which the classifier assigns the correct target class, that is, the proportion of instances correctly classified [2].
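$$\text{Accuracy} = \frac{\text{Total number of correctly predicted match results}}{\text{Total number of match results}} \times 100\% \qquad (1)$$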
The total number of correctly predicted Home Win, Away Win and Lose match results is equivalent to the total
number of correctly predicted match results.
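The accuracy measure described above can be sketched in a few lines. The function name and the eight toy results below are illustrative assumptions, not data from the experiment; the class labels follow the paper's naming.

```python
def accuracy(actual, predicted):
    """Percentage of match results whose predicted class equals the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one match result is required")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)


# Toy example: 8 matches, 6 of which are predicted correctly.
actual = ["HomeWin", "AwayWin", "Lose", "HomeWin", "HomeWin", "AwayWin", "Lose", "HomeWin"]
predicted = ["HomeWin", "AwayWin", "HomeWin", "HomeWin", "HomeWin", "Lose", "Lose", "HomeWin"]
print(accuracy(actual, predicted))  # 75.0
```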
IV. RESULT AND DISCUSSION
The results of the experiment on the five classification techniques are presented and analysed
based on the percentage accuracy of each technique. 10-fold cross-validation was adopted because of
the small size of the data.
A. Multilayer Perceptron
Multilayer Perceptron is a type of artificial neural network, which has proved to be a very
valuable alternative to traditional statistical techniques and makes no prior assumptions about the data distribution [6].
Multilayer Perceptron was applied to the La Liga datasets using Weka 3.9.2. The percentage accuracy for every season
is 100%, as shown in Table 1, and the detailed output of Multilayer Perceptron for the 2018/2019 season is shown in Figure 1.
TABLE I
PERCENTAGE ACCURACY OF THE MULTILAYER PERCEPTRON FOR FIVE SEASONS
Season Accuracy (%)
2018/2019 Season 100
2017/2018 Season 100
2016/2017 Season 100
2015/2016 Season 100
2014/2015 Season 100
FIGURE 1: DETAILED OUTPUT OF MULTILAYER PERCEPTRON OF 2018/2019 SEASON
B. Decision Tables
A decision table is a set of rules that indicates the actions to be taken when certain conditions are met [12]. The datasets
were imported into Weka 3.9.2 and run using the Decision Tables technique. The percentage accuracy for the
2018/2019 season is 97.38%, 91.58% for the 2017/2018 season, 94.74% for the 2016/2017 season, 98.95% for the 2015/2016
season and 96.84% for the 2014/2015 season. The percentage accuracy for the seasons using Decision Tables is presented in
Table 2.
TABLE II
PERCENTAGE ACCURACY OF THE DECISION TABLES FOR FIVE SEASONS
Season Accuracy (%)
2018/2019 Season 97.38
2017/2018 Season 91.58
2016/2017 Season 94.74
2015/2016 Season 98.95
2014/2015 Season 96.84
FIGURE 2: DETAILED OUTPUT OF DECISION TABLE OF 2017/2018 SEASON
C. Random Forest
Random Forest is a statistical learning model, a tree-based ensemble with each tree depending on a collection of
random variables. It performs well on small or medium-sized datasets and can outperform more recent algorithms [1],
[11]. The datasets were imported into Weka 3.9.2 and run using the Random Forest technique. The percentage
accuracy for the 2018/2019 season is 98.42%, 98.95% for the 2017/2018 season, 98.16% for the 2016/2017 season, 97.63% for the
2015/2016 season and 99.47% for the 2014/2015 season. The percentage accuracy for the seasons using Random Forest
is presented in Table 3.
TABLE III
PERCENTAGE ACCURACY OF THE RANDOM FOREST FOR FIVE SEASONS
Season Accuracy (%)
2018/2019 Season 98.42
2017/2018 Season 98.95
2016/2017 Season 98.16
2015/2016 Season 97.63
2014/2015 Season 99.47
Figure 3: Detailed output of Random Forest of 2016/2017 Season
D. REPTree
Reduced Error Pruning Tree (REPTree) is a fast decision tree learner that builds either a decision tree, using
information gain as the splitting criterion, or a regression tree, by reducing the variance [10]. The La Liga
datasets for the 2014/2015 to 2018/2019 seasons were loaded into Weka 3.9.2 for the prediction. The
percentage accuracy for the 2018/2019 season is 98.68%, 98.16% for the 2017/2018 season, 98.42% for the 2016/2017 season,
97.89% for the 2015/2016 season and 98.95% for the 2014/2015 season. The percentage accuracy for the seasons is
presented in Table 4.
TABLE IV
PERCENTAGE ACCURACY OF THE REPTREE FOR FIVE SEASONS
Season Accuracy (%)
2018/2019 Season 98.68
2017/2018 Season 98.16
2016/2017 Season 98.42
2015/2016 Season 97.89
2014/2015 Season 98.95
FIGURE 4: DETAILED OUTPUT OF REPTREE OF 2015/2016 SEASON
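To make the splitting criterion mentioned in this subsection concrete, the sketch below computes entropy and information gain for a nominal feature. It is a simplified illustration, not Weka's code: pruning and numeric attributes are left out, and the two toy features (a half-time leader and an uninformative flag) are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature_index):
    """Reduction in entropy obtained by splitting on one nominal feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data: feature 0 separates the classes perfectly, feature 1 carries no information.
rows = [("H", "x"), ("H", "y"), ("A", "x"), ("A", "y")]
labels = ["HomeWin", "HomeWin", "AwayWin", "AwayWin"]
gains = [information_gain(rows, labels, i) for i in range(2)]
print(gains)
```

A tree learner of this kind would split on the feature with the larger gain, here feature 0.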
E. Meta Bagging
Meta Bagging is a machine learning ensemble algorithm developed to improve the accuracy of statistical
classification and regression performed by machine learning algorithms [14]. The La Liga datasets
for the 2014/2015 to 2018/2019 seasons were loaded into Weka 3.9.2 for the prediction. The percentage
accuracy for the 2018/2019 season is 99.74%, 98.42% for the 2017/2018 season, 98.16% for the 2016/2017 season, 98.42% for the
2015/2016 season and 99.47% for the 2014/2015 season. The percentage accuracy for the seasons is presented in Table 5.
TABLE V
PERCENTAGE ACCURACY OF THE META BAGGING FOR FIVE SEASONS
Season Accuracy (%)
2018/2019 Season 99.74
2017/2018 Season 98.42
2016/2017 Season 98.16
2015/2016 Season 98.42
2014/2015 Season 99.47
Figure 5: Detailed output of Meta Bagging of 2014/2015 Season
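The general idea behind bagging (bootstrap aggregating) can be sketched as follows: several copies of a base learner are trained on bootstrap samples drawn with replacement, and their predictions are combined by majority vote. The base learner here is a deliberately simple lookup on one nominal feature, chosen only for brevity; it is an assumption of this sketch and not the base learner used in the experiment, and all names and toy data are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label (ties resolved by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def fit_lookup(rows, labels):
    """Base learner: map the first feature's value to its majority label."""
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[0], []).append(label)
    table = {value: majority(group) for value, group in by_value.items()}
    default = majority(labels)  # fallback for feature values not seen in the sample
    return lambda row: table.get(row[0], default)


def bagging_fit(rows, labels, n_models=25, seed=0):
    """Train n_models base learners, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        picks = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_lookup([rows[i] for i in picks], [labels[i] for i in picks]))
    return models


def bagging_predict(models, row):
    """Combine the base learners' predictions by majority vote."""
    return majority([model(row) for model in models])


# Toy data: the single feature determines the class.
rows = [("H",)] * 10 + [("A",)] * 10
labels = ["HomeWin"] * 10 + ["AwayWin"] * 10
models = bagging_fit(rows, labels)
print(bagging_predict(models, ("H",)), bagging_predict(models, ("A",)))
```

Averaging over many resampled models reduces the variance of an unstable base learner, which is the usual motivation for the technique.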
V. COMPARATIVE ANALYSIS
The comparative analysis shows that Multilayer Perceptron has the best overall average
accuracy at 100%, followed by Meta Bagging with 98.84%, Random Forest with 98.53% and
REPTree with 98.42%, while Decision Tables has the
lowest average accuracy at 95.90%, as presented in Table 6 and Figure 6.
TABLE VI
COMPARISON OF AVERAGE PERCENTAGE ACCURACY
Accuracy            Multilayer Perceptron   Decision Tables   Random Forest   REPTree   Meta Bagging
2018/2019 Season    100%                    97.38%            98.42%          98.68%    99.74%
2017/2018 Season    100%                    91.58%            98.95%          98.16%    98.42%
2016/2017 Season    100%                    94.74%            98.16%          98.42%    98.16%
2015/2016 Season    100%                    98.95%            97.63%          97.89%    98.42%
2014/2015 Season    100%                    96.84%            99.47%          98.95%    99.47%
Average Accuracy    100%                    95.90%            98.53%          98.42%    98.84%
FIGURE 6: GRAPHICAL REPRESENTATION OF AVERAGE ACCURACY
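The averages in Table VI can be reproduced from the per-season figures reported in Tables I to V. The short check below uses only those reported values; the variable names are illustrative.

```python
from statistics import mean

# Per-season accuracy (%) for 2018/2019 down to 2014/2015, as reported in Tables I-V.
accuracy_by_technique = {
    "Multilayer Perceptron": [100, 100, 100, 100, 100],
    "Decision Tables": [97.38, 91.58, 94.74, 98.95, 96.84],
    "Random Forest": [98.42, 98.95, 98.16, 97.63, 99.47],
    "REPTree": [98.68, 98.16, 98.42, 97.89, 98.95],
    "Meta Bagging": [99.74, 98.42, 98.16, 98.42, 99.47],
}

averages = {name: round(mean(values), 2) for name, values in accuracy_by_technique.items()}
ranking = sorted(averages, key=averages.get, reverse=True)

for name in ranking:
    print(f"{name}: {averages[name]:.2f}%")
```

The resulting order (Multilayer Perceptron, Meta Bagging, Random Forest, REPTree, Decision Tables) agrees with the ranking stated in the comparative analysis.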
VI. CONCLUSION AND FUTURE WORK
This work compared five data mining algorithms on Spanish La Liga football match outcomes. The experimental
results show that Multilayer Perceptron was the most successful, predicting La Liga match outcomes
with 100% accuracy, against 95.90% for Decision Tables,
98.53% for Random Forest, 98.42% for REPTree and 98.84% for Meta Bagging. In future work, other data
mining techniques can be applied, and a rating of each team can be considered as one of the variables.
References
[1] A. Cutler, D. R. Cutler, and J. R. Stevens, “Ensemble Machine Learning,” Ensemble Mach. Learn., no. January, 2012, doi:
10.1007/978-1-4419-9326-7.
[2] C. M. F. Che Mohd Rosli, M. Z. Saringat, N. Razali, and A. Mustapha, “A Comparative Study of Data Mining Techniques on Football
Match Prediction,” J. Phys. Conf. Ser., vol. 1020, no. 1, 2018, doi: 10.1088/1742-6596/1020/1/012003.
[3] D. Berrar, P. Lopes, and W. Dubitzky, “Incorporating domain knowledge in machine learning for soccer outcome prediction,” Mach.
Learn., vol. 108, no. 1, pp. 97–126, 2019, doi: 10.1007/s10994-018-5747-8.
[4] C. Cao, “Sports data mining technology used in basketball outcome prediction,” Dublin Inst. Technol., pp. 1–86, 2012.
[5] D. Delen, D. Cogdell, and N. Kasap, “A comparative analysis of data mining methods in predicting NCAA bowl outcomes,” Int. J.
Forecast., vol. 28, no. 2, pp. 543–552, 2012, doi: 10.1016/j.ijforecast.2011.05.002.
[6] M. W. Gardner and S. R. Dorling, “Artificial neural networks (the multilayer perceptron) - a review of applications in the atmospheric
sciences,” Atmos. Environ., vol. 32, no. 14–15, pp. 2627–2636, 1998, doi: 10.1016/S1352-2310(97)00447-0.
[7] W. Gu and T. L. Saaty, “Predicting the Outcome of a Tennis Tournament: Based on Both Data and Judgments,” J. Syst. Sci. Syst. Eng.,
vol. 28, no. 3, pp. 317–343, 2019, doi: 10.1007/s11518-018-5395-3.
[8] H. Chen, “Neural Network Algorithm in Predicting Football Match Outcome Based on Player Ability Index,” Adv. Phys. Educ., vol. 09,
no. 04, pp. 215–222, 2019, doi: 10.4236/ape.2019.94015.
[9] J. J. Zhang, E. Kim, B. Marstromartino, T. Y. Qian, and J. Nauright, “The sport industry in growing economies: critical issues and
challenges,” Int. J. Sport. Mark. Spons., vol. 19, no. 2, pp. 110–126, 2018, doi: 10.1108/IJSMS-03-2018-0023.
[10] S. Kalmegh, “Analysis of WEKA Data Mining Algorithm REPTree , Simple Cart and RandomTree for Classification of Indian News,”
Int. J. Innov. Sci. Eng. Technol., vol. 2, no. 2, pp. 438–446, 2015.
[11] R. Kohavi, “The power of decision tables,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes
Bioinformatics), vol. 912, pp. 174–189, 1995, doi: 10.1007/3-540-59286-5_57.
[12] D. Tables, F. Definition, and S. P. Decision, “(cf. (cf.,” pp. 68–80, 1991.
[13] A. Reso- and K. Self-, “Different Training Methods Perform in Calling the Games,” pp. 9–15, 1996.
[14] P. Shrivastava and M. Shukla, “Uses the Bagging Algorithm of Classification Method Learning and Forest Fire Data,” Int. J. Adv.
Comput. Eng. Netw., vol. 01, no. 12, pp. 91–95, 2014.
[15] S. J. Lee and K. Siau, “A review of data mining techniques,” Ind. Manag. Data Syst., vol. 101, no. 1, pp. 41–46, 2001, doi:
10.1108/02635570110365989.
[16] F. Thabtah, L. Zhang, and N. Abdelhamid, “NBA Game Result Prediction Using Feature Analysis and Machine Learning,” Ann. Data
Sci., vol. 6, no. 1, pp. 103–116, 2019, doi: 10.1007/s40745-018-00189-x.
[17] C. P. Igiri and E. O. Nwachukwu, “An Improved Prediction System for Football Match Result,” IOSR Journal of Engineering, vol. 04, no.
12, pp. 12–20, 2014, doi: 10.9790/3021-04124012020.
[18] Spanish La Liga (football) dataset [Online]. Available:
https://datahub.io/sports-data/spanish-la-liga#data. [Accessed on 17 December, 2019].
Predicting Football Match Results with Data Mining Techniques

  • 1. Predicting Football Match Results with Data Mining Techniques O. I. Aladesote, O. Agbelusi & M. Ganiyu Abstract- Data mining techniques are very effective and useful for forecasting in many domains or fields. In this research, prediction of Spanish la liga football match outcomes is carried out using various data mining techniques (Multilayer Perception, Decision Tables, Random Forest, Reptree and Meta. Bagging) to determine the most accurate among these techniques. The experimental results is done with Weka 3.9, shows that all the techniques performed well in terms of accuracy but multilayer Perception was the most successful with an average accuracy of 100%.. I. INTRODUCTION Football is a fast growing sport that is taking over as one the most viewed and richest sport therefore the drive to be more than just a spectator has led to this research of being able to predict the final outcome of any match and simultaneously making sport betting easier. One of the reasons for football being the most popular sport in the planet is its unpredictability. Every day, fans around the world argue over which team is going to win the next game or the next competition. Many of these fans also put their money where their mouths are, by betting large sums on their predictions. Due to the large amount of factors that can affect the result of a football match, it is incredibly difficult to correctly predict its probabilities. With the increasing growth of the amount of money invested in sports betting markets, it is important to verify how far data mining techniques can bring value to this area [9]. To solve this problem we propose building data-driven solutions designed through a data mining process. Data mining is an aspect of computing that is used for extraction of hidden information and to automate the detection of relevant patterns in a database. 
The data mining process allows us to build models that can give us predictions according to the data that is fed into the system. The study is aimed at using data mining techniques for the prediction of football match result. Every sport has particular rules, number of players, different styles, that is, a set of different features. For a beginner, carrying out predictive model from the scratch with considerable dataset could be somehow challenging. Finally, every individual especially football fans would be able to predict match result based on identified factor at the end of this research. We summarized the contributions of this paper as follows: • Forecasting of la liga football match outcome using data of five previous seasons • Comparative analysis to determine the most accurate technique. The remainder of the paper is organized as follows: section 2 presents the literature review. In Section 3, the method used to generate the results is presented. The experimental results for each data mining technique is presented and discussed in section 4. Comparative analysis is done in section 5 and finally, conclusion and future work are presented in section 6. International Journal of Computer Science and Information Security (IJCSIS), Vol. 18, No. 6, June 2020 46 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
  • 2. II. LITERATURE REVIEW Data mining is an important tool in event prediction. The literature selected and discussed in this section are those that are more related and relevant to the result of football match prediction. A match results prediction system is proposed using four data mining techniques. The author used basketball results of four seasons (from 2005/2006 to 2009/2010) as training data and in order to assess or appraise the models, the result of 2010/2011 season was used as test data. The result shows that the models performed with comparable classification accuracy rate, with 67.8% as the highest [4]. The authors proposed the use of ANN and logistic regression techniques to forecast the outcome of 2014-2015 English premier league results to strengthen the complexity and inaccurate prediction results produced by statistical approaches. The records of nine significant features are randomly selected from the records. The experimental result of the model shows that logistic regression perform better than ANN and that the techniques show higher prediction accuracy [17]. [13] carried out a preliminary investigation to forecast result of National Football League (NFL) using artificial neural network (ANN). Five variables randomly extracted from first eight rounds of the competition was used for the prediction. Teams were classified to be either strong or weak using cluster related methods. The paper proposes data mining techniques to strengthen the limitations introduced by numeric prediction approach. Eight years of data was used. To evaluate the performance of these techniques, both classification and regression models were used. The experimental results clearly show that the accuracy rate of classification model outweigh regression model [15] The researchers carried out a performance evaluation using three classification models (naive Bayes, artificial neural networks (ANNs) and decision trees) [16]. 
The models were built using different variables of NBA matches. The experimental results show that the accuracy of the proposed model is very reliable and that defensive rebounds is the most significant variable. Three other variables were also found to be significant.
The researchers adopted three data mining approaches to build models of game outcomes using historical data. The purpose was to counter the idea of eligibility in ranking winning games based on experience. At the end of the modeling process, all three models were capable of forecasting the winner of the game, and the decision tree produced the highest accuracy [5].
A reliable tennis match outcome prediction model was proposed in which numerous factors are systematically prioritized to determine the match outcome. The results show that the proposed model, combining data and judgement, predicts the outcome of a match with 85.1% accuracy [7].
A machine learning method was adopted to forecast the results of future soccer matches based on a dataset of past matches. In this research, two important ideas emerged from challenges encountered while modeling 2017 soccer match results. These two ideas led to new feature engineering methods (recency and rating extraction) for match result forecasting. The authors concluded that good forecasting should be based on knowledge of machine learning [3].
The authors developed predictive models to forecast the outcome of football matches for the 2008/2009 and 2015/2016 seasons. Techniques such as artificial neural network (ANN), Random Forest (RF) and Support Vector Machine (SVM) were used to develop the models. A comparative analysis showed that the models predict correctly when compared with results based on the experience of football match analysts [8].
This paper proposes machine learning methods to determine the result of NBA matches. The forecasting was based on historical data, performance evaluation was carried out among the models developed, and the results show that the defensive rebounds feature was important in all the methods for optimal prediction of the game result. Further research will be carried out using models such as function-based techniques and deep learning [16].
III. METHODOLOGY
This section describes the dataset, classification techniques and performance analysis. The experiment was done using Weka 3.9.2 on five algorithms: Multilayer Perceptron, Decision Tables, Random Forest, Reptree and Meta Bagging. In Weka, 10-fold cross-validation was adopted as the classifier evaluation option.
A. Dataset
The dataset used for the implementation was the Spanish La Liga for the 2014/2015 to 2018/2019 seasons [18]. The league consists of twenty teams that play each other both home and away, giving 380 matches per season and 1900 matches over the five seasons. The data consist of 61 features, of which 22 are match statistics such as full-time and half-time result, home and away team shots, etc., while the remaining 39 are football betting details. Out of the 22 statistical features, 10 were randomly selected as predictors, with the full-time result (Home Win, Away Win and Lose) as the target.
B. Performance Analysis
The performance of these classification algorithms was measured based on accuracy.
Accuracy is the rate at which the classifier assigns the correct target class, that is, the proportion of instances correctly classified [2].
$$\text{Accuracy} = \frac{\text{number of correctly predicted match results}}{\text{total number of match results}} \times 100\% \tag{1}$$
The number of correctly predicted match results is the sum of the correctly predicted Home Win, Away Win and Lose results.
IV. RESULT AND DISCUSSION
The results of the experiment carried out on the five classification techniques are presented and analysed based on the percentage accuracy of each technique. 10-fold cross-validation was adopted because of the small size of the data.
A. Multilayer Perceptron
Multilayer Perceptron is a type of artificial neural network, which has proved to be a valuable alternative to older statistical techniques and makes no prior assumptions about the data distribution [6]. Multilayer Perceptron was applied to the La Liga datasets using Weka 3.9.2. The percentage accuracy for every season is 100%, as shown in Table 1; the result of Multilayer Perceptron for the 2018/2019 season is shown in Figure 1.
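As an illustration of Equation (1) and of 10-fold cross-validation, the following is a minimal Python sketch and not the Weka procedure used in this study. The function names are hypothetical, the folds are not stratified, predictions are pooled across folds before scoring, and a majority-class baseline stands in for the classifier.

```python
import random
from collections import Counter


def accuracy(actual, predicted):
    """Percentage of match results predicted correctly, as in Equation (1)."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)


def k_fold_indices(n_instances, k=10, seed=0):
    """Shuffle the instance indices and split them into k nearly equal folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(labels, k=10, seed=0):
    """Pooled accuracy of a majority-class baseline under k-fold cross-validation."""
    actual, predicted = [], []
    for fold in k_fold_indices(len(labels), k, seed):
        held_out = set(fold)
        # Train on every instance outside the current fold.
        train = [labels[i] for i in range(len(labels)) if i not in held_out]
        majority = Counter(train).most_common(1)[0][0]  # stand-in "model"
        # Predict the held-out fold and pool the predictions.
        actual += [labels[i] for i in fold]
        predicted += [majority] * len(fold)
    return accuracy(actual, predicted)
```

With 380 matches per season, each of the ten folds in this sketch holds out 38 matches, so every match is used for testing exactly once.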
TABLE I
PERCENTAGE ACCURACY OF THE MULTILAYER PERCEPTRON FOR FIVE SEASONS
Season              Accuracy (%)
2018/2019 Season    100
2017/2018 Season    100
2016/2017 Season    100
2015/2016 Season    100
2014/2015 Season    100
FIGURE 1: DETAILED OUTPUT OF MULTILAYER PERCEPTRON OF 2018/2019 SEASON
B. Decision Tables
A decision table is a set of rules that indicates the actions to be taken when certain conditions are met [12]. The datasets were imported into Weka 3.9.2 and run using the Decision Tables technique. The percentage accuracy is 97.38% for the 2018/2019 season, 91.58% for the 2017/2018 season, 94.74% for the 2016/2017 season, 98.95% for the 2015/2016 season and 96.84% for the 2014/2015 season. The percentage accuracy for the seasons using Decision Tables is presented in Table 2.
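The idea of a decision table as condition-to-outcome rules can be sketched as a lookup keyed on a few attributes, with the overall majority class as a fallback for unseen combinations. This is a simplified illustration under assumed names and toy data, not Weka's DecisionTable implementation; the attribute names `half_time` and `more_shots` and all rows are hypothetical.

```python
from collections import Counter, defaultdict


def fit_decision_table(rows, labels, key_columns):
    """Map each combination of key-column values to its majority class."""
    cells = defaultdict(Counter)
    for row, label in zip(rows, labels):
        cells[tuple(row[c] for c in key_columns)][label] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in cells.items()}
    default = Counter(labels).most_common(1)[0][0]  # used for unseen combinations
    return table, default


def predict(table, default, row, key_columns):
    """Look up the row's condition combination; fall back to the default class."""
    return table.get(tuple(row[c] for c in key_columns), default)


# Toy training data (hypothetical, not taken from the La Liga dataset).
KEYS = ["half_time", "more_shots"]
rows = [
    {"half_time": "home ahead", "more_shots": "home"},
    {"half_time": "home ahead", "more_shots": "home"},
    {"half_time": "home ahead", "more_shots": "away"},
    {"half_time": "away ahead", "more_shots": "away"},
    {"half_time": "away ahead", "more_shots": "home"},
]
labels = ["Home Win", "Home Win", "Home Win", "Away Win", "Away Win"]
table, default = fit_decision_table(rows, labels, KEYS)
```

A match whose combination of conditions appears in the table receives that cell's majority class; any other match receives the overall majority class.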
TABLE II
PERCENTAGE ACCURACY OF THE DECISION TABLES FOR FIVE SEASONS
Season              Accuracy (%)
2018/2019 Season    97.38
2017/2018 Season    91.58
2016/2017 Season    94.74
2015/2016 Season    98.95
2014/2015 Season    96.84
FIGURE 2: DETAILED OUTPUT OF DECISION TABLE OF 2017/2018 SEASON
C. Random Forest
Random Forest is a statistical learning model, a tree-based ensemble with each node relying on a group of random variables. It performs well with small or medium datasets and can perform better than more recent algorithms [1], [11]. The datasets were imported into Weka 3.9.2 and run using the Random Forest technique. The percentage accuracy is 98.42% for the 2018/2019 season, 98.95% for the 2017/2018 season, 98.16% for the 2016/2017 season, 97.63% for the 2015/2016 season and 99.47% for the 2014/2015 season. The percentage accuracy for the seasons using Random Forest is presented in Table 3.
TABLE III
PERCENTAGE ACCURACY OF THE RANDOM FOREST FOR FIVE SEASONS
Season              Accuracy (%)
2018/2019 Season    98.42
2017/2018 Season    98.95
2016/2017 Season    98.16
2015/2016 Season    97.63
2014/2015 Season    99.47
FIGURE 3: DETAILED OUTPUT OF RANDOM FOREST OF 2016/2017 SEASON
D. Reptree
Reduced Error Pruning Tree (Reptree) is a fast decision tree learner, which builds a decision or regression tree using information gain or variance reduction as the splitting criterion [10]. The datasets of the La Liga football league from the 2014/2015 season to the 2018/2019 season were loaded into Weka 3.9.2 for the prediction. The percentage accuracy is 98.68% for the 2018/2019 season, 98.16% for the 2017/2018 season, 98.42% for the 2016/2017 season, 97.89% for the 2015/2016 season and 98.95% for the 2014/2015 season. The percentage accuracy for the seasons is presented in Table 4.
TABLE IV
PERCENTAGE ACCURACY OF THE REPTREE FOR FIVE SEASONS
Season              Accuracy (%)
2018/2019 Season    98.68
2017/2018 Season    98.16
2016/2017 Season    98.42
2015/2016 Season    97.89
2014/2015 Season    98.95
FIGURE 4: DETAILED OUTPUT OF REPTREE OF 2015/2016 SEASON
E. Meta Bagging
Meta Bagging is a machine learning ensemble algorithm developed to improve the accuracy of statistical classification and regression algorithms [14]. The datasets of the La Liga football league from the 2014/2015 season to the 2018/2019 season were loaded into Weka 3.9.2 for the prediction. The percentage accuracy is 99.74% for the 2018/2019 season, 98.42% for the 2017/2018 season, 98.16% for the 2016/2017 season, 98.42% for the 2015/2016 season and 99.47% for the 2014/2015 season. The percentage accuracy for the seasons is presented in Table 5.
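The bagging idea, training several base learners on bootstrap samples and combining them by majority vote, can be sketched as follows. This is a generic illustration rather than Weka's Bagging implementation: the one-nearest-neighbour base learner, the single numeric feature and the toy data are all assumptions made for the example.

```python
import random
from collections import Counter


def nearest_neighbour(train):
    """Base learner: predict the label of the closest training point (1-NN)."""
    def predict(x):
        return min(train, key=lambda row: abs(row[0] - x))[1]
    return predict


def bagging(train, n_models=25, seed=0):
    """Fit one base learner per bootstrap sample; combine them by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(train) rows with replacement.
        sample = [rng.choice(train) for _ in train]
        models.append(nearest_neighbour(sample))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]
    return predict


# Toy data (hypothetical): (home shots minus away shots, full-time result).
toy = [(-8, "Away Win"), (-6, "Away Win"), (-5, "Away Win"),
       (5, "Home Win"), (6, "Home Win"), (9, "Home Win")]
ensemble = bagging(toy)
```

Because each base learner sees a slightly different resample of the data, individual errors tend to be outvoted, which is the variance-reduction effect that bagging aims for.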
TABLE V
PERCENTAGE ACCURACY OF THE META BAGGING FOR FIVE SEASONS
Season              Accuracy (%)
2018/2019 Season    99.74
2017/2018 Season    98.42
2016/2017 Season    98.16
2015/2016 Season    98.42
2014/2015 Season    99.47
FIGURE 5: DETAILED OUTPUT OF META BAGGING OF 2014/2015 SEASON
V. COMPARATIVE ANALYSIS
The comparative analysis of the results shows that Multilayer Perceptron has the best overall average accuracy at 100%, followed by Meta Bagging with 98.84%, Random Forest with 98.53% and Reptree with 98.42%, while Decision Tables has the lowest average accuracy at 95.90%, as presented in Table 6 and Figure 6.
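The average accuracies quoted above can be reproduced from the per-season figures in Tables I to V. The short Python check below is only an arithmetic illustration; the variable names are hypothetical.

```python
# Per-season accuracies (%) from Tables I to V, ordered 2018/2019 down to 2014/2015.
per_season = {
    "Multilayer Perceptron": [100, 100, 100, 100, 100],
    "Decision Tables": [97.38, 91.58, 94.74, 98.95, 96.84],
    "Random Forest": [98.42, 98.95, 98.16, 97.63, 99.47],
    "Reptree": [98.68, 98.16, 98.42, 97.89, 98.95],
    "Meta Bagging": [99.74, 98.42, 98.16, 98.42, 99.47],
}

# Mean over the five seasons, rounded to two decimal places.
averages = {name: round(sum(values) / len(values), 2)
            for name, values in per_season.items()}

# Techniques ordered from highest to lowest average accuracy.
ranking = sorted(averages, key=averages.get, reverse=True)
```

The resulting order places Multilayer Perceptron first and Decision Tables last, matching the comparison in this section.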
TABLE VI
COMPARISON OF AVERAGE PERCENTAGE ACCURACY
Season              Multilayer Perceptron   Decision Tables   Random Forest   Reptree   Meta Bagging
2018/2019 Season    100%                    97.38%            98.42%          98.68%    99.74%
2017/2018 Season    100%                    91.58%            98.95%          98.16%    98.42%
2016/2017 Season    100%                    94.74%            98.16%          98.42%    98.16%
2015/2016 Season    100%                    98.95%            97.63%          97.89%    98.42%
2014/2015 Season    100%                    96.84%            99.47%          98.95%    99.47%
Average Accuracy    100%                    95.90%            98.53%          98.42%    98.84%
FIGURE 6: GRAPHICAL REPRESENTATION OF AVERAGE ACCURACY
VI. CONCLUSION AND FUTURE WORK
This work compared five data mining algorithms on Spanish La Liga football match outcomes. The experimental results show that Multilayer Perceptron was the most successful, which makes it the best of the techniques considered for predicting La Liga football match outcomes, with 100% accuracy as against Decision Tables with 95.90%, Random Forest with 98.53%, Reptree with 98.42% and Meta Bagging with 98.84%. However, all the data mining techniques can also be applied in future work, with the rating of each team considered as one of the variables.
References
[1] A. Cutler, D. R. Cutler, and J. R. Stevens, “Ensemble Machine Learning,” Ensemble Mach. Learn., no. January, 2012, doi: 10.1007/978-1-4419-9326-7.
[2] C. M. F. Che Mohd Rosli, M. Z. Saringat, N. Razali, and A. Mustapha, “A Comparative Study of Data Mining Techniques on Football Match Prediction,” J. Phys. Conf. Ser., vol. 1020, no. 1, 2018, doi: 10.1088/1742-6596/1020/1/012003.
[3] D. Berrar, P. Lopes, and W. Dubitzky, “Incorporating domain knowledge in machine learning for soccer outcome prediction,” Mach. Learn., vol. 108, no. 1, pp. 97–126, 2019, doi: 10.1007/s10994-018-5747-8.
[4] C. Cao, “Sports data mining technology used in basketball outcome prediction,” Dublin Inst. Technol., pp. 1–86, 2012.
[5] D. Delen, D. Cogdell, and N. Kasap, “A comparative analysis of data mining methods in predicting NCAA bowl outcomes,” Int. J. Forecast., vol. 28, no. 2, pp. 543–552, 2012, doi: 10.1016/j.ijforecast.2011.05.002.
[6] M. W. Gardner and S. R. Dorling, “Artificial neural networks (the multilayer perceptron) - a review of applications in the atmospheric sciences,” Atmos. Environ., vol. 32, no. 14–15, pp. 2627–2636, 1998, doi: 10.1016/S1352-2310(97)00447-0.
[7] W. Gu and T. L. Saaty, “Predicting the Outcome of a Tennis Tournament: Based on Both Data and Judgments,” J. Syst. Sci. Syst. Eng., vol. 28, no. 3, pp. 317–343, 2019, doi: 10.1007/s11518-018-5395-3.
[8] H. Chen, “Neural Network Algorithm in Predicting Football Match Outcome Based on Player Ability Index,” Adv. Phys. Educ., vol. 09, no. 04, pp. 215–222, 2019, doi: 10.4236/ape.2019.94015.
[9] J. J. Zhang, E. Kim, B. Marstromartino, T. Y. Qian, and J. Nauright, “The sport industry in growing economies: critical issues and challenges,” Int. J. Sport. Mark. Spons., vol. 19, no. 2, pp. 110–126, 2018, doi: 10.1108/IJSMS-03-2018-0023.
[10] S. Kalmegh, “Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News,” Int. J. Innov. Sci. Eng. Technol., vol. 2, no. 2, pp. 438–446, 2015.
[11] R. Kohavi, “The power of decision tables,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 912, pp. 174–189, 1995, doi: 10.1007/3-540-59286-5_57.
[12] D. Tables, F. Definition, and S. P. Decision, “(cf. (cf.,” pp. 68–80, 1991.
[13] A. Reso- and K. Self-, “Different Training Methods Perform in Calling the Games,” pp. 9–15, 1996.
[14] P. Shrivastava and M. Shukla, “Uses the Bagging Algorithm of Classification Method Learning and Forest Fire Data,” Int. J. Adv. Comput. Eng. Netw., vol. 01, no. 12, pp. 91–95, 2014.
[15] S. J. Lee and K. Siau, “A review of data mining techniques,” Ind. Manag. Data Syst., vol. 101, no. 1, pp. 41–46, 2001, doi: 10.1108/02635570110365989.
[16] F. Thabtah, L. Zhang, and N. Abdelhamid, “NBA Game Result Prediction Using Feature Analysis and Machine Learning,” Ann. Data Sci., vol. 6, no. 1, pp. 103–116, 2019, doi: 10.1007/s40745-018-00189-x.
[17] C. P. Igiri and E. O. Nwachukwu, “An Improved Prediction System for Football Match Result,” IOSR Journal of Engineering, vol. 04, no. 12, pp. 12–20, 2014, doi: 10.9790/3021-04124012020.
[18] Spanish La Liga (football) dataset [Online]. Available: https://datahub.io/sports-data/spanish-la-liga#data. [Accessed on 17 December, 2019].