Data Mining
● "Data mining" refers to any set of software
tools or automated processes that can
access data from one or more databases,
and present them in a way that highlights
previously unknown relationships and
patterns.
● Data mining itself is not new. It has been
used for years in academic, government, and
market research in areas such as risk
Who will Mine the Election Data?
The answer is clear:
● Election commission
● Political parties
● Media
Where does data mining
technology come in?
● There have always been voter rolls and other information
stored in databases, and you could always break the data
down by census tract, by community, or by city.
● The technology now allows parties to do
something they have never been able to do before:
○ take incompatible data types from all these different databases
and put the data together, so that they now have a much
clearer view of who the best, most likely targets are for
door-to-door outreach efforts.
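Combining sources with incompatible formats can be sketched as follows. This is only an illustration under stated assumptions: the records, the field names (`voter_id`, `ward`, `undecided_share`), and the 0.25 threshold are all hypothetical and do not come from any real voter database.

```python
# Hypothetical voter roll: the ward is stored as a zero-padded string.
voter_roll = [
    {"voter_id": "V001", "name": "A. Kumar", "ward": "12"},
    {"voter_id": "V002", "name": "S. Devi", "ward": "07"},
    {"voter_id": "V003", "name": "R. Singh", "ward": "12"},
]

# Hypothetical second source: keyed by integer ward number, different schema.
ward_survey = {
    12: {"undecided_share": 0.31},
    7: {"undecided_share": 0.08},
}


def combine(voters, survey):
    """Join the two sources on ward, normalising the key type first."""
    combined = []
    for voter in voters:
        ward_key = int(voter["ward"])  # "07" becomes 7
        extra = survey.get(ward_key, {})
        combined.append({**voter, **extra})
    return combined


def outreach_targets(records, threshold=0.25):
    """Return voter IDs in wards whose undecided share meets the threshold."""
    return [
        r["voter_id"]
        for r in records
        if r.get("undecided_share", 0.0) >= threshold
    ]


combined = combine(voter_roll, ward_survey)
targets = outreach_targets(combined)
print(targets)
```

The point of the sketch is the key normalisation step: once both sources agree on how a ward is identified, the join itself is straightforward.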
Election Commission
Election Commission (EC)
● The EC mines its database:
● To minimize the number of polling booths.
● As a result of mining the data, the Election Commission of
India was able to cut the number of booths:
State  Booths in 1999  Booths in 2004
TN     54,847          45,729
● To improve security at polling booths by looking at
past history of where polling was not peaceful (the database may
contain information about booths where repolling was ordered).
EC (contd.)
● To look for patterns that imply a poor polling
percentage (such as terrorist threats).
● To find patterns that cause the number of
contestants in a particular constituency to be high
(to minimize expenses).
● If patterns indicate that the number of contestants in a
constituency is consistently very high, the EC can consider setting
a higher deposit amount for that constituency.
Polling percentage in assembly
elections in India over the past years
Year  Male (%)  Female (%)  Total (%)
1962 63.31 46.63 55.42
1967 66.73 55.48 61.33
1971 60.90 49.11 55.29
1977 65.63 54.91 60.49
1980 62.16 51.22 56.62
1984 68.18 58.60 63.56
1989 66.13 57.32 61.95
1991 61.58 51.35 56.93
1996 62.06 53.41 57.94
1998 63.88 57.88 60.88
1999 63.97 55.64 59.99
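One simple pattern that can be drawn from the turnout table above is the gap between male and female turnout. The sketch below uses only the figures from the table; the variable names are illustrative.

```python
# (year, male %, female %, total %) copied from the turnout table above.
turnout = [
    (1962, 63.31, 46.63, 55.42),
    (1967, 66.73, 55.48, 61.33),
    (1971, 60.90, 49.11, 55.29),
    (1977, 65.63, 54.91, 60.49),
    (1980, 62.16, 51.22, 56.62),
    (1984, 68.18, 58.60, 63.56),
    (1989, 66.13, 57.32, 61.95),
    (1991, 61.58, 51.35, 56.93),
    (1996, 62.06, 53.41, 57.94),
    (1998, 63.88, 57.88, 60.88),
    (1999, 63.97, 55.64, 59.99),
]

# Gap in percentage points between male and female turnout for each year.
gaps = {year: round(male - female, 2) for year, male, female, _ in turnout}

widest = max(gaps, key=gaps.get)
narrowest = min(gaps, key=gaps.get)

for year, gap in gaps.items():
    print(year, gap)
print("widest gap:", widest, "narrowest gap:", narrowest)
```

On these figures the gap is widest in 1962 and narrowest in 1998.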
Visualizing the results
[Figure: The technology-centric view of the data mining process]
DATA → (Selection) → FOCUSED DATA SUBSET → (Preprocessing and Transformation) → PREPROCESSED and FORMATTED DATA → (Data Mining) → PREDICTIVE MODELS → (Human Interpretation) → KNOWLEDGE
year expenditure
1996 597344100
1998 6662216000
1999 8800000000
2004 13000000000
[Figure: data mining process diagram with the labels DATABASE, Data, Selection, Preprocessing and Transformation, Data Mining, MODEL1, MODEL2, Human Interpretation, KNOWLEDGE]
Political Parties
Politics and Data Mining
● In the political sphere, data mining technologies
are useful for:
○ planning door-to-door canvassing strategy.
○ helping to map out efficient and effective
mailings.
○ enabling campaigns to customize and
personalize messages down to specific
households with great ease.
How are they doing that?
More particularly, there are 4-5 states that they think
will be decisive in the election because the
electorate is so polarized right now and there are
such a small number of undecided voters out there.
Finding who those voters are really matters a lot.
What do political parties do
with the data?
● In the elections to four state assemblies (MP, Delhi,
Chhattisgarh, Rajasthan) held a year earlier, the BJP used
elaborate data analysis (it now rules in three of those
states).
● Data analysis is used to target messages to specific groups
based on caste, age, income, and profession.
● Data analysis is essential when looking for voting patterns
in prior elections.
● Political parties may follow the benchmarking process
to improve their results.
The Benchmarking Process
1. Define the domain
2. Identify the best performers
3. Collect and analyze data to identify gaps
4. Set improvement goals
5. Develop and implement plans to close gaps
6. Evaluate results
7. Repeat evaluations
Data mining Methodology
● The methodology used today in data mining rests on just a
few very important concepts:
○ Finding a pattern in the data.
○ Validating the predictive models.
What is a Pattern?
● Consider the simple problem of trying to determine
the next number in the following sequence:
1212121…? Because the pattern “12” is found often
enough, you have some confidence in the predictive
model that says “if 1, then 2 will follow.”
● So a pattern is an event or combination of events in a
database that occurs more often than expected.
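The idea of "more often than expected" can be made concrete with the sequence above. The sketch below compares how often the pair "12" is observed with how often it would be expected if the digits were independent; the independence baseline is an assumption chosen for illustration, not something the slides specify.

```python
from collections import Counter

sequence = "1212121"

# All adjacent pairs: "12", "21", "12", ...
pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
pair_counts = Counter(pairs)
symbol_counts = Counter(sequence)

# Observed relative frequency of the pair "12".
observed = pair_counts["12"] / len(pairs)

# Expected frequency if "1" and "2" occurred independently of each other.
expected = (symbol_counts["1"] / len(sequence)) * (
    symbol_counts["2"] / len(sequence)
)

# The rule "if 1, then 2 will follow": count what actually follows each "1".
followers_of_1 = Counter(nxt for cur, nxt in pairs if cur == "1")
prediction = followers_of_1.most_common(1)[0][0]
confidence = followers_of_1[prediction] / sum(followers_of_1.values())

print(f"observed={observed:.3f} expected={expected:.3f}")
print(f"after '1' predict '{prediction}' with confidence {confidence:.2f}")
```

Here "12" makes up half of all adjacent pairs, roughly twice the share expected under independence, and every "1" that has a successor is followed by "2".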
What is a Model?
● A model is a description that can be successfully applied to
new data in order to make predictions about
missing values or to make statements about
expected values.
● There may not be a crisp dividing line between
a pattern and a model (in the number sequence
example, the pattern “12” was also the model).
In general, patterns are driven by data, whereas a
model generally reflects a purpose and may not
be driven by data.
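Applying a model to new data, and validating it there, might look like the following sketch. It learns a "most frequent follower" rule from one sequence and scores it on a second, held-out sequence. Both sequences and the function names `fit` and `accuracy` are hypothetical.

```python
from collections import Counter, defaultdict


def fit(seq):
    """Learn, for each symbol, the symbol that most often follows it."""
    followers = defaultdict(Counter)
    for cur, nxt in zip(seq, seq[1:]):
        followers[cur][nxt] += 1
    return {cur: counts.most_common(1)[0][0] for cur, counts in followers.items()}


def accuracy(model, seq):
    """Share of transitions in seq that the model predicts correctly."""
    transitions = list(zip(seq, seq[1:]))
    hits = sum(1 for cur, nxt in transitions if model.get(cur) == nxt)
    return hits / len(transitions)


train = "12121212"    # data used to find the pattern
holdout = "1212112"   # new data, containing one deviation ("11")

model = fit(train)
score = accuracy(model, holdout)

print(model)
print(f"holdout accuracy: {score:.2f}")
```

The rule learned from the training sequence gets five of the six held-out transitions right, which is the kind of check meant by validating a predictive model.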
Picking the best model
Media group NDA+ Cong+ Others
Sahara 263-278 92-102 171-181
Star 263-275 174-186 86-98
Aaj Tak 248 189 105
Zee 249 117 176
NDTV 230-250 190-205 100-130
Opinion polls conducted by various media groups
Picking the best model (contd.)
● But the actual results
were:
NDA+ 185
Cong+ 220
Others 137
Media group NDA+ Cong+ Others
Sahara 263-278 92-102 171-181
Star 263-275 174-186 86-98
Aaj Tak 248 189 105
Zee 249 117 176
NDTV 230-250 190-205 100-130
(The predictive model used by NDTV is somewhat closer to the actual
results than the others.)
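The comparison of forecasts with the actual results can be checked with a small calculation. The sketch below takes the midpoint of each forecast range and sums the absolute seat errors across the three blocs; using midpoints and total absolute error is an assumption made for illustration, since the slides do not say how closeness was judged.

```python
actual = {"NDA+": 185, "Cong+": 220, "Others": 137}

# Forecasts from the table above; point forecasts are stored as (x, x).
polls = {
    "Sahara": {"NDA+": (263, 278), "Cong+": (92, 102), "Others": (171, 181)},
    "Star": {"NDA+": (263, 275), "Cong+": (174, 186), "Others": (86, 98)},
    "Aaj Tak": {"NDA+": (248, 248), "Cong+": (189, 189), "Others": (105, 105)},
    "Zee": {"NDA+": (249, 249), "Cong+": (117, 117), "Others": (176, 176)},
    "NDTV": {"NDA+": (230, 250), "Cong+": (190, 205), "Others": (100, 130)},
}


def midpoint(seat_range):
    low, high = seat_range
    return (low + high) / 2


def total_abs_error(forecast):
    """Sum of absolute seat errors over all blocs, using range midpoints."""
    return sum(abs(midpoint(forecast[bloc]) - actual[bloc]) for bloc in actual)


errors = {group: total_abs_error(forecast) for group, forecast in polls.items()}
best = min(errors, key=errors.get)

for group, err in sorted(errors.items(), key=lambda item: item[1]):
    print(f"{group}: {err}")
print("closest forecast:", best)
```

Under this measure NDTV has the smallest total error (99.5 seats) and Sahara the largest (247.5 seats).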
Media
● The media mine the data:
● To predict what will happen and which party will win.
● To make opinion polls and exit polls more effective.
Data Mining of Elections 2004
● Most regional parties lost nearly 95% outside
their states.
● Parties whose candidates lost deposits (on account of polling less
than 5% of the votes) include:
BJP: 27%
Cong: 30%
CPM: 58%
CPI: 85%
Thank you
