How can Facebook data and
machine learning algorithms
predict political elections?
Introduction
◉ Political elections are part of the routine of every
democratic society, and most European
countries hold an average of one election
every year (local, parliamentary, presidential, EU
parliament).
◉ The eternal question: predicting the results.
○ Everyone is interested: citizens and political actors,
who create strategies and activities.
Introduction
◉ Development of social networks: a powerful
tool for communication with voters!
○ Can we develop a predictive model based on
candidates' activity on the social network Facebook
that is comparable to traditional models based on
polls?
Related work
◉ A review of the literature identified gaps in
previous research:
◉ a model for measuring the effectiveness of political campaigns on
social networks has not been developed: variables that are suitable for
evaluation in one specific domain are not suitable for evaluation in
another domain,
◉ in most research, data was collected only on Twitter,
◉ most previous works do not have a predictive analytics
component,
◉ no models have been developed that would determine to what extent
activity on the social network Facebook contributes to predicting the
outcome of the election.
Related work
◉ A review of the literature identified gaps in
previous research:
◉ Various models are based on the number of friends or followers on
social networks: the conclusions are generalized from too few
variables, which do not include the interaction between the candidate
and the potential voter.
◉ Most models are based on sentiment analysis, which is not sufficient to
determine the impact on the outcome of the election.
◉ Current methods are not sufficient, hence the need for machine learning.
◉ Many models are not empirically confirmed.
Research objectives
◉ To determine the predictive power of models based on the
social network Facebook and compare it with other types of
research.
◉ To determine which variables are the most significant
predictors of the outcome of the election.
◉ To determine the significance of the temporal data component
of the Facebook social network.
◉ To argue which of the four machine learning methods
provides the most accurate predictive models of local election
outcomes.
Research flow
◉ Data collection and preparation: Facebook data
◉ Development of predictive models based on Facebook data and four machine learning algorithms
◉ Model evaluation
◉ Comparison of models with opinion poll data
Research methodology – CRISP-DM
◉ Problem understanding
◉ Data understanding
◉ Data preparation
◉ Modelling
◉ Evaluation
◉ Deployment
Research methodology – CRISP-DM
Research goals:
- determine the predictive power of
election outcome models based on
the social network Facebook and
compare it with other types of
research,
- determine which variables are the
most significant predictors of the
outcome of the election.
Research methodology – CRISP-DM
French local elections: all cities
with more than 100,000
inhabitants (41 cities),
225 candidates – examination
and downloading of data from their
Facebook pages.
Number of events, photos, links,
videos and statuses, with their
comments, shares and likes.
Research methodology – CRISP-DM
25 variables included:
- 24 input variables: the candidate's
activity and the reactions of the page
followers to the candidate's activity,
two variables related to the gender
of the candidate and the affiliation of
the candidate to a political party.
- one output variable: the result in
the elections measured by the
percentage of votes of the
candidates in the elections.
Research methodology – CRISP-DM
Why Facebook?
Chen and Chang (2017) claim that
political campaigns are increasingly
"fleeing" to Facebook, the social network
with the largest number of users.
Facebook is the most used source of
political news for Millennials and
Generation X (ages 18 to 51) (Pew
Research Center, 2015).
In France, 70% of those who use
social networks are on
Facebook (Chaffey, 2019).
Singh and colleagues (2020), who
conducted their research only on
Twitter, stated the need to include
other social networks.
Research methodology – CRISP-DM
Attributes taken into account are:
city and turnout in the city, party, total
number of page likes, number of photos,
number of statuses (text), number of
links, number of videos, number of
events created, number of photo shares,
number of status shares, number of link
shares, number of video shares, number
of photo likes, number of status likes,
number of link likes, number of video
likes, number of photo comments,
number of status comments, number of
link comments, number of video
comments, publication time.
The values were transformed into relative sizes with
respect to the number of voters.
Research methodology – CRISP-DM
Data description through descriptive
statistics.
Most variables have an exponential
distribution of values:
- it is characterized by a high probability of
smaller values occurring and a low
probability of large values occurring,
- the values of the arithmetic mean are higher
than the values of the median.
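The relationship between mean and median described above can be illustrated with a small sketch. The counts below are hypothetical and are not taken from the study; they only mimic a sample with many small values and a few very large ones.

```python
import statistics

# Hypothetical per-candidate photo counts (NOT data from the study):
# many small values and a few very large ones.
photo_counts = [2, 3, 3, 5, 8, 9, 12, 15, 40, 120]

mean_value = statistics.mean(photo_counts)
median_value = statistics.median(photo_counts)

# In a right-skewed sample the few large values pull the mean above the median.
is_right_skewed = mean_value > median_value
```

Comparing the two statistics is only a quick check of skew; it does not by itself confirm that a variable follows an exponential distribution.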
Research methodology – CRISP-DM
Correlation analysis
- The highest correlation is between the
number of event shares and the number of
event likes.
- The correlation is positive, which indicates
that as the number of event likes increases,
the number of event shares also
increases, and vice versa.
- The variable Result has the strongest linear
relationship with the variable Total
number of page likes.
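A linear relationship of this kind is usually measured with the Pearson correlation coefficient. The slides do not state which coefficient was used, so the sketch below is an assumption, and the like and share counts are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values (NOT data from the study).
event_likes = [10, 20, 30, 40, 50]
event_shares = [1, 3, 2, 5, 6]

# A value close to +1 means likes and shares tend to rise together.
r = pearson(event_likes, event_shares)
```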
Research methodology – CRISP-DM
Outliers
- ML algorithms are sensitive to extreme
values.
- The interquartile range was used to identify
outliers.
Missing values
- Imputation was performed: missing values
were replaced by the mean values of the
variables.
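Both preparation steps can be sketched in a few lines. The multiplier of 1.5 on the interquartile range is the common Tukey convention and is an assumption here, since the slides do not give the threshold or say how flagged values were treated. The function names and sample values are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is an assumed convention, not a value stated in the slides.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean_value = statistics.mean(observed)
    return [mean_value if v is None else v for v in values]

# Hypothetical status counts with one extreme value.
status_counts = [12, 15, 14, 13, 16, 15, 200]
flagged = iqr_outliers(status_counts)

# Hypothetical column with two missing entries.
filled = impute_mean([4, None, 8, None, 6])
```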
Research methodology – CRISP-DM
Data normalization
- Data were normalized with respect to
the number of voters in each city: the
original value of each variable is
divided by the number of voters in the
city to which it refers.
- Min-max normalization: all variables are
rescaled to a range from 0 to 1.
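The two normalization steps above can be sketched as follows. The helper names and the candidate figures are hypothetical, and applying min-max scaling after the per-voter division is an assumed order that simply follows the order on the slide.

```python
def per_voter(value, voters):
    """Express a raw count relative to the number of voters in the city."""
    return value / voters

def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical (page likes, voters in the city) pairs, NOT data from the study.
candidates = [(5000, 100000), (30000, 1000000), (12000, 200000)]

relative = [per_voter(likes, voters) for likes, voters in candidates]
scaled = min_max(relative)
```

Note that the candidate with the most page likes in absolute terms ends up with the lowest scaled value, because the count is small relative to the size of the electorate.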
Research methodology – CRISP-DM
- Predictive models were developed
using four machine learning
algorithms.
- The hyperparameters of each
algorithm were optimized in order to
prevent overfitting and to
obtain high-quality, reliable and
accurate predictive models.
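The slides do not say how the hyperparameters were searched or validated. As one possible illustration, the sketch below tunes the number of neighbours k of a one-feature k-NN regressor by leave-one-out cross-validation. The search method, the error measure, the function names and the data are all assumptions.

```python
def knn_predict(train, query, k):
    """Average the targets of the k training points closest to query (one feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - query))[:k]
    return sum(target for _, target in nearest) / k

def loo_error(data, k):
    """Mean absolute error of k-NN under leave-one-out cross-validation."""
    total = 0.0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]  # hold out the i-th observation
        total += abs(knn_predict(train, x, k) - y)
    return total / len(data)

def tune_k(data, candidates):
    """Pick the candidate k with the lowest held-out error."""
    return min(candidates, key=lambda k: loo_error(data, k))

# Hypothetical (relative page likes, vote share in %) pairs, NOT data from the study.
data = [(0.01, 5.0), (0.02, 9.0), (0.03, 14.0), (0.05, 22.0),
        (0.06, 27.0), (0.08, 35.0), (0.10, 41.0), (0.12, 50.0)]
k_grid = [1, 2, 3, 5]

best_k = tune_k(data, k_grid)
```

Scoring each setting on held-out observations rather than on the training data is what guards against overfitting: a setting that merely memorizes the training set gains no advantage.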
Research methodology – CRISP-DM
Four types of machine learning
algorithms
- Information-based machine learning
(classification and
regression trees),
- Similarity-based machine learning
(k-nearest neighbors),
- Probability-based machine learning
(Bayesian networks),
- Error-based machine learning (artificial
neural networks).
Models comparison
Model            Error
K-NN             8.9889
Neural network   5.9779712
Decision tree    6.3441513
Naïve Bayes      12.4532
Models comparison
Model            Reliability
K-NN             0.5044
Neural network   0.8093
Decision tree    0.753
Naïve Bayes      -0.243
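The slides do not name the metrics behind the "Error" and "Reliability" columns. A negative reliability for Naïve Bayes would be consistent with a coefficient of determination (R²), which drops below zero when a model predicts worse than the mean of the observed values, but that reading is an assumption. The sketch below shows how such measures are commonly computed. All numbers are hypothetical.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical vote shares in % (NOT results from the study).
actual = [10.0, 20.0, 30.0, 40.0]
close_predictions = [12.0, 18.0, 33.0, 39.0]
poor_predictions = [40.0, 10.0, 45.0, 5.0]

r2_close = r_squared(actual, close_predictions)
r2_poor = r_squared(actual, poor_predictions)  # negative: worse than predicting the mean
```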
Model comparison
◉ To compare the predictive models obtained by
machine learning algorithms with the results of pre-election
polls, data provided by the French Institute for Public Opinion
Research, IFOP, was used.
◉ The data refer to 6 cities: Paris, Lyon, Marseille, Rennes,
Nantes and Bordeaux, thus including data for a total of 44
candidates in the elections.
◉ There is a difference in error between the predictive models obtained by
machine learning algorithms and polling!
Research methodology – CRISP-DM
Determination of the most significant
predictors of election outcomes based on
the obtained predictive models.
A sensitivity analysis of each predictive
model was performed.
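The slides do not describe the sensitivity analysis procedure. One common variant is one-at-a-time perturbation: each input is changed in turn while the others stay fixed, and inputs are ranked by how much the prediction moves. The sketch below assumes that variant, and `toy_model` is a hypothetical stand-in for a trained model.

```python
def sensitivity(model, baseline, delta=0.1):
    """One-at-a-time sensitivity: output change when each input is raised by delta."""
    base_output = model(baseline)
    effects = {}
    for name in baseline:
        perturbed = dict(baseline)
        perturbed[name] = baseline[name] + delta
        effects[name] = abs(model(perturbed) - base_output)
    return effects

def rank_inputs(effects):
    """Order input names from most to least influential."""
    return sorted(effects, key=effects.get, reverse=True)

def toy_model(x):
    """Hypothetical stand-in for a trained predictive model (NOT from the study)."""
    return 30.0 * x["page_likes"] + 10.0 * x["link_likes"] + 1.0 * x["gender"]

# Inputs are assumed to be min-max scaled to [0, 1].
baseline = {"page_likes": 0.5, "link_likes": 0.5, "gender": 0.0}
ranking = rank_inputs(sensitivity(toy_model, baseline))
```

Repeating this for each of the four models gives one predictor ranking per model, which is the form in which the results are reported.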
Prediction of results
Variable             kNN rank  ANN rank  Bayes rank  Decision tree rank  Average rank
Overall page likes   1         1         1           1                   1
Number of events     6         7         3           4                   5
Number of photos     5         8         5           9                   6.75
Number of links      3         9         7           7                   6.5
Number of statuses   4         3         9           3                   4.75
Prediction of results
Variable                 kNN rank  ANN rank  Bayes rank  Decision tree rank  Average rank
Number of photo likes    8         6         8           6                   7
Number of link likes     2         2         2           2                   2
Number of status likes   7         5         6           10                  7
Gender                   9         10        10          8                   9.25
Political party          10        4         4           5                   5.75
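The average rank is the arithmetic mean of the four per-model ranks. The sketch below recomputes it from the per-model ranks shown on the two "Prediction of results" slides, in the order kNN, ANN, Bayes, decision tree. Recomputed this way, the number of events averages 5 and the number of photos averages 6.75.

```python
# Per-model ranks as listed on the slides: [kNN, ANN, Bayes, decision tree].
ranks = {
    "Overall page likes": [1, 1, 1, 1],
    "Number of events": [6, 7, 3, 4],
    "Number of photos": [5, 8, 5, 9],
    "Number of links": [3, 9, 7, 7],
    "Number of statuses": [4, 3, 9, 3],
    "Number of photo likes": [8, 6, 8, 6],
    "Number of link likes": [2, 2, 2, 2],
    "Number of status likes": [7, 5, 6, 10],
    "Gender": [9, 10, 10, 8],
    "Political party": [10, 4, 4, 5],
}

# Arithmetic mean of the four ranks for each variable.
average_rank = {name: sum(r) / len(r) for name, r in ranks.items()}

# Variables ordered from strongest (lowest average rank) to weakest predictor.
ordered = sorted(average_rank, key=average_rank.get)
```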
Results discussion
◉ Stability and consistency of the models.
◉ The total number of likes on the candidate's page is the most
significant predictor in all models.
◉ The number of link likes is the second most significant
predictor, also in all four models.
◉ The number of statuses is the third strongest predictor of
election results in two of the four observed models.
Results discussion
◉ The best-quality models are obtained by applying the artificial
neural network algorithm.
◉ The explanation lies in the data types: the variables are
mostly continuous numeric attributes.
◉ The models with the lowest reliability and accuracy values
were obtained by the Naive Bayesian classifier.
◉ The Naive Bayesian classifier works with categorical variables, so it
is necessary to transform the continuous output into a
categorical one.
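Transforming a continuous output into a categorical one is typically done by binning. The slides do not give the number of classes or the bin edges, so the edges and labels below are hypothetical and serve only to illustrate the step.

```python
def to_category(vote_share, edges=(10.0, 25.0)):
    """Map a vote percentage to a class label.

    The bin edges and labels are hypothetical, not taken from the study.
    """
    low_edge, high_edge = edges
    if vote_share < low_edge:
        return "low"
    if vote_share < high_edge:
        return "medium"
    return "high"

# Hypothetical vote shares in %.
vote_shares = [3.2, 12.5, 48.0, 24.9, 25.0]
labels = [to_category(v) for v in vote_shares]
```

Binning discards the differences between values that fall into the same class, which is one plausible reason a classifier trained on binned output scores worse on a continuous target.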
Results discussion
◉ Social media as an indicator for predicting election outcomes:
◉ The absolute number of Facebook followers is a very good
predictor of election outcomes.
◉ The content that the candidates post on social media, as well
as the reactions to the specific content that they share, is to a
certain extent an indicator of the outcome of the election.
◉ The results of the research lead to a better understanding of
the way in which social networks present the views of voters,
and indicate how the electorate can be influenced through
social networks.
Guidelines
◉ How to run a campaign on Facebook?
◉ Achieve visibility on Facebook: get a large number of
page likes or followers (using various methods of increasing
visibility). This earns a high place in the news feed of average
Facebook users, which leads to greater
transmission of messages and an indirect influence on
voters.
◉ Present content from ordinary life, which gives greater
visibility than classic political messages that are transmitted
through other communication channels.
Research limitations
◉ Only one country, France, and one election were included.
◉ Each country has its own specificities that should be taken into
account when conducting research.
◉ The opinion of the user base of one social network, Facebook,
is not representative of the entire population, but it contributes
to explaining the results.
◉ The data refer to a very specific time, the beginning of March
2020: the beginning of the spread of the COVID-19 virus and
the lockdown.
◉ Only four machine learning algorithms were used; a number of
other algorithms were left out.
Research contributions
◉ Knowledge about models based on data from public opinion
polls and social networks was systematized.
◉ Predictive models of election outcomes based on social
network data were developed, evaluated and compared
with data obtained from surveys.
◉ The predictive power of the variables of the social network
Facebook was established.
◉ Guidelines for using machine learning algorithms on social
media data were developed.
"The whole art of politics consists
in the rational management of
human irrationalities."
Karl Paul Reinhold Niebuhr
Any questions?
You can find me at
◉ kisic.alen@gmail.com
Thanks!


[DSC Europe 23] Alen Kisic - How can do Facebook data and machine learning algorithms predict political elections?
