IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 04 | Apr-2015, Available @ http://www.ijret.org 344
A PREDICTIVE MODEL FOR MAPPING CRIME USING BIG DATA
ANALYTICS
Saoumya¹, Anurag Singh Baghel²
¹ Gautam Buddha University, Uttar Pradesh, India
² Gautam Buddha University, Uttar Pradesh, India
Abstract
Crime reduction and prevention are becoming increasingly complex and need new techniques that can handle the vast amount
of information being generated. Traditional policing capabilities mostly fall short in depicting the actual distribution of
criminal activity and therefore contribute little to the suitable allocation of police services. This paper describes methods
for crime event forecasting using Hadoop, studying geographical areas that are at greater risk and may lie outside traditional
policing boundaries. The method uses a geographical crime mapping algorithm to identify areas with relatively high levels of
crime, known as hot spots. The identified hotspot clusters provide data that can be used to train an artificial neural network,
which in turn models the trends of crime. The artificial neural network specification and estimation approach is enhanced by
the processing capability of the Hadoop platform.
Keywords: Crime forecasting; Cluster analysis; Artificial neural networks; Patrolling; Big data; Hadoop; Gamma test.
--------------------------------------------------------------------***---------------------------------------------------------------------
1. INTRODUCTION
Police would benefit greatly from software that can
intelligently analyse a constantly updated database of crime
incidents and their descriptions, providing accurate
predictions of where and when crime is most likely to
occur. This would help optimise police resource allocation.
One drawback is that crime occurrences are generally sparse
with respect to the type of crime and the time and place at
which the incident occurs, and are subject to randomness.
In addition, the ability to process unstructured data was
limited until recently, but with the advent of big data a new
approach to making predictions can be explored. A crime
analysis tool must be able to identify crime patterns
accurately and efficiently to support future forecasting.
However, in the present scenario there are several
challenges, as follows:
1) The size of the information that has to be stored and
analysed is increasing.
2) Different techniques are needed that can analyse this
increasing volume of crime data accurately and
efficiently.
3) Varied methods and infrastructure are used for
recording data on crime.
4) The available data are inconsistent and incomplete,
making formal analysis increasingly difficult.
5) Due to the complex nature of the data, analysis takes
more time.
This paper delineates a forecasting framework on the Hadoop
platform that can predict crimes likely to occur in the near
future. This work differs from previous studies, which
mainly describe hot-spot methods and their statistical
significance. Researchers have mainly focused on mapping
and analysing the distribution of crime, but in this paper the
identified hotspot clusters are used as the foundation for a
predictive algorithm. Thus hotspot visualization aids crime
prediction. Based on past events, it is used to recognise
areas with a high incidence of crime so that appropriate
police resources can be deployed at the identified locations.
The software also harnesses the capability of Hadoop to
process large amounts of data in half the time as compared
to other systems that have been studied till now.
The approach presented in this paper has three key stages,
as shown in Fig 1. The first is an analysis of the geographic
distribution of crime data, which identifies spatial clusters
having a greater risk of crime. In the second step a clustering
algorithm is used to determine the quality of each identified
cluster.
Fig. 1. Predictive Process Model
The final step is the prediction, which deploys an
artificial neural network (ANN) model based on a
classification and regression tree predictive specification.
The paper shows how geographical clusters of crime data
can train artificial neural networks to facilitate predictive
modelling and how the same can be done on the
Hadoop platform, which provides the capability to process
more data at a faster rate and to use varied sources of data.
First, the data is collected from past records and mapped.
Kernel density estimation is used to form clusters from the
mapped data. The Gamma Test (GT) is used to estimate how
much potential each cluster has to facilitate prediction. The
identified clusters are then fed into the artificial neural
network, which makes predictions about future crime. The
paper concludes with a discussion of the results obtained
and the scope for future research.
2. CRIME PREDICTION THEORY
Many researchers have tried to explain why crimes occur in
certain areas and whether any pattern can be inferred from
past events. One theory that addresses these questions is
crime pattern theory [1]. According to it, crime does not
happen in a random fashion; it is either opportunistic or
planned. It states that criminal activity occurs when the
activity spaces of a target and an offender intersect. A
person's activity space comprises the places he or she visits
in a day-to-day routine, such as the workplace, educational
institutes, shopping malls and recreational areas. These
specific locations of the offender or victim are called nodes.
Personal paths, the routes a person takes every day, connect
the various nodes and create a perimeter of personal space.
This personal area is also the person's awareness space.
Crime pattern theory thus states that a crime involving two
people can only occur when their personal spaces intersect
at some point. We can therefore say that crimes are not
completely random; they can be studied and analysed to
provide plausible predictions. These may not be as accurate
as the ones shown in the movie Minority Report, but
predictions can be made to some extent. Fig 2 depicts this
phenomenon. Simply put, if an area provides the offender
with an opportunity for crime and lies within the personal
awareness space of the victim, then crime is likely to
happen. Thus areas that are secluded and do not have proper
patrolling provide greater opportunity for crime.
Fig. 2. Activity and Awareness Space of Criminals
Places like shopping and recreational areas, however, are
also places where the offender and the victim are likely to
meet, because a large number of people visit these places
and offenders can easily mark their potential victims. The
study of human behaviour is outside the scope of this work,
as we are only interested in finding patterns that can help
prevent further crimes. One example is the identification of
places where many people fall victim to chain snatching or
pickpocketing, which is mainly concentrated in certain
areas. Crime pattern theory thus provides an organized way
to approach prediction by exploring and analysing the
patterns of crime.
3. ARTIFICIAL NEURAL NETWORK FOR
CRIME PREDICTION
Designing prediction models with artificial neural networks
is a well-studied area to which many researchers have
contributed. One study concludes that models for forecasting
crime levels are generally autoregressive, with both inputs
and output being counts of crime: multiple inputs and a
single output [2]. Models of this type are used in this paper.
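
As a minimal sketch of what "multiple inputs and a single output" can mean for an autoregressive model, the Python snippet below turns a series of crime counts into (lagged inputs, target) pairs. The function name `make_lagged_samples`, the lag of 3 and the sample counts are illustrative assumptions and are not taken from the paper.

```python
def make_lagged_samples(counts, n_lags):
    """Build autoregressive training pairs from a series of crime counts.

    Each sample uses the previous `n_lags` counts as inputs and the
    following count as the single output.
    """
    samples = []
    for t in range(n_lags, len(counts)):
        inputs = counts[t - n_lags:t]   # counts for periods t-n_lags .. t-1
        target = counts[t]              # count for period t
        samples.append((inputs, target))
    return samples


# Hypothetical weekly counts for one cluster (made-up values).
weekly_counts = [4, 6, 5, 9, 7, 8]
samples = make_lagged_samples(weekly_counts, n_lags=3)
```

With three lags, six observations yield three training pairs; a longer lag window gives the network more history per sample but leaves fewer samples.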
To test the artificial neural network, it is presented with
a series of input vectors whose results are known to the
tester but not to the network. The tester uses the answers
given by the network, which describe the predicted level of
crime for the given input, to determine the robustness of the
training process. If the tester judges the robustness of the
network to be sufficient, the network is said to have true
predictive capability and is fit for use. For prediction, we
then feed the network inputs for which we do not know the
answers and assume that the output provided by the network
is reliable.
4. HADOOP PLATFORM
The system is built on Apache Hadoop, an open-source
software framework used for storing and large-scale
processing of data sets on clusters of commodity hardware.
It is used as the basic platform so that large amounts of data
can be processed, thus leading to more accurate predictions.
The prediction algorithm is written as mapper and reducer
programs, which are run in a multi-node cluster
environment, resulting in faster results [3].
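
To illustrate the mapper and reducer split, the sketch below counts incidents per grid cell and crime type in plain Python. It imitates the MapReduce data flow in a single process and does not call any Hadoop API. The record layout `(x, y, crime_type)`, the 500 m cell size and all function names are assumptions made for illustration, not details from the paper.

```python
from collections import defaultdict

CELL_SIZE_M = 500.0  # assumed grid cell size in metres (not from the paper)


def map_incident(record):
    """Mapper: emit ((cell_x, cell_y, crime_type), 1) for one incident."""
    x, y, crime_type = record
    cell_x = int(x // CELL_SIZE_M)
    cell_y = int(y // CELL_SIZE_M)
    yield (cell_x, cell_y, crime_type), 1


def reduce_counts(key, values):
    """Reducer: sum all counts emitted for one key."""
    return key, sum(values)


def run_job(records):
    """Imitate the shuffle phase by grouping mapper output by key."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_incident(record):
            grouped[key].append(value)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Hypothetical incidents: projected coordinates in metres plus a crime type.
incidents = [
    (120.0, 80.0, "burglary"),
    (450.0, 30.0, "burglary"),
    (700.0, 90.0, "street crime"),
]
counts = run_job(incidents)
```

Because each mapper call depends only on its own record and each reducer call only on one key, both phases can be spread across the nodes of a cluster, which is where the speed-up comes from.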
5. THE CRIME OCCURRENCE DATA
The crime data considered in this paper are 100,000
criminal incidents spanning 5 years in an area which measures
roughly 457, 23400,040 m2. This research uses data on crimes
that happened before a particular date to create hotspot
cluster maps and to test their robustness for forecasting when
and where crimes are most likely to happen next. Four
crime types are taken into consideration, namely burglary,
street crime, theft from vehicles and theft of vehicles. The
variables recorded in the database of crime incidents include
time, day, month, weather, and location, which is mapped
as geographical coordinates [4]. In a
comparative study, KDE was the technique that
consistently outperformed the other techniques. Amongst the
crime types, street crime hotspot maps were generally found
to be more accurate than the others at predicting where
street crime would occur in the near future. Hence KDE is
used for the hotspot mapping.
5.1 Training Data from Crime Clusters
For the analysis of crime clusters, kernel density estimation
(KDE) was used, which is regarded as the most appropriate
spatial analysis technique for mapping crime data. KDE is a
widely accepted method, which can be attributed to its
rapidly growing use and its ready availability (one example
being the MapInfo add-on Hotspot Detective). Other factors
include the perceived accuracy of hotspot mapping and the
user-friendly layout of the resulting map in comparison to
other techniques. Point data (offences) are aggregated within
a user-defined search radius, and a continuous surface
representing the volume or density of crime events across
the study area is calculated. The resulting map is a smooth
surface showing the variation in the density of crime points
across the study area, without adhering to geometric shapes
like circles or ellipses. KDE also provides flexibility in
configuring parameters such as grid size and search radius;
however, despite many useful recommendations, there is no
universal doctrine on how to set these and in what
circumstances. The following steps are followed.
1) raw data analysis,
2) cluster analysis and geographic representation,
3) allocation of centroids to clusters,
4) allocation of crime incidents to cluster boundaries.
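
The density surface described above can be sketched in a few lines of Python. The paper does not state which kernel was used, so this example assumes the quartic (biweight) kernel that is common in crime mapping; the function name, grid size, search radius (bandwidth) and offence coordinates are all hypothetical.

```python
import math


def quartic_kde(points, grid_size, bandwidth, extent):
    """Return {(col, row): density} for a regular grid over `extent`.

    points    -- list of (x, y) offence locations
    grid_size -- width and height of one grid cell
    bandwidth -- search radius of the kernel
    extent    -- (min_x, min_y, max_x, max_y) of the study area
    """
    min_x, min_y, max_x, max_y = extent
    cols = int(math.ceil((max_x - min_x) / grid_size))
    rows = int(math.ceil((max_y - min_y) / grid_size))
    surface = {}
    for col in range(cols):
        for row in range(rows):
            # Evaluate the density at the centre of each grid cell.
            cx = min_x + (col + 0.5) * grid_size
            cy = min_y + (row + 0.5) * grid_size
            density = 0.0
            for px, py in points:
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                if d2 < bandwidth ** 2:
                    # 2-D quartic kernel: (3 / (pi h^2)) * (1 - d^2 / h^2)^2
                    u2 = d2 / bandwidth ** 2
                    density += 3.0 / (math.pi * bandwidth ** 2) * (1.0 - u2) ** 2
            surface[(col, row)] = density
    return surface


# Hypothetical offences: three close together and one isolated.
offences = [(14.0, 16.0), (15.0, 15.0), (16.0, 14.0), (80.0, 85.0)]
surface = quartic_kde(offences, grid_size=10.0, bandwidth=15.0,
                      extent=(0, 0, 100, 100))
hottest_cell = max(surface, key=surface.get)
```

A smaller grid size gives a finer surface at higher cost, while a larger bandwidth smooths neighbouring offences into broader hotspots, which is why the text notes that there is no universal rule for choosing either.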
5.2 Raw Data Analysis
A simple search algorithm is used which identifies small
areas that have a higher than average incidence of crime [5].
The algorithm uses a counting function that iterates through
the crime data, incrementing a counter whenever an incident
is within the scan range. Clusters of crime incidents are
shown in Fig 3.
The coordinates of each crime incident are denoted C1 and
C2. An incident lies within the scan range of a centroid when
$$\sqrt{(C_1 - X)^2 + (C_2 - Y)^2} < \text{scan radius}$$
where X and Y are the coordinates of the centroid.
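
The counting function can be sketched as follows, treating an incident as within the scan range when its Euclidean distance from the centroid is below the scan radius. The candidate centroid list, the coordinates and the "higher than average" filter over candidates are assumptions for illustration; the paper does not say how candidate centroids are generated.

```python
import math


def count_within_radius(centroid, incidents, scan_radius):
    """Count the incidents whose distance from `centroid` is below `scan_radius`."""
    x, y = centroid
    count = 0
    for c1, c2 in incidents:
        if math.hypot(c1 - x, c2 - y) < scan_radius:
            count += 1
    return count


def find_dense_centroids(candidates, incidents, scan_radius):
    """Keep the candidate centroids whose count exceeds the average count."""
    counts = {c: count_within_radius(c, incidents, scan_radius) for c in candidates}
    average = sum(counts.values()) / len(counts)
    return {c: n for c, n in counts.items() if n > average}


# Hypothetical incident coordinates and candidate centroids.
incidents = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (9.0, 9.0)]
candidates = [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)]
dense = find_dense_centroids(candidates, incidents, scan_radius=2.0)
```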
5.3 Cluster Analysis and Geographic
Representation
A heuristic approach is used in this step to identify the count
of crime occurrences required for a cluster to be considered
salient. The heuristic rules make use of the fact that
crime incidents are generally clustered within small
geographical areas known as hotspots. A graphical
representation of the dispersion of the heuristically generated
data is used to set the radius of the zone associated with each
centroid [6]. As the density of the sample around a centroid
increases, so does the radius of influence associated with
that centroid. User interaction and experimentation led to the
radius of influence being set to count * 45, where count is
the number of crimes associated with the centroid during the
first stage of analysis. This leads to the identification of
stakeholders.
Fig. 3. Clusters of crime incidences
5.4 Allocation of Centroids to Clusters
Next, the centroids that are to be grouped together to form
clusters are identified. The gravity and density parameters,
along with the centroid list generated in the previous stage,
form the basis for this iterative procedure.
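
The paper does not spell out the grouping rule, so the sketch below makes an explicit assumption: two centroids join the same cluster when their radii of influence (count * 45, as in Section 5.3) overlap, and membership is propagated with a small union-find structure. The overlap rule, the function name and the sample centroids are hypothetical.

```python
import math

RADIUS_FACTOR = 45  # multiplier from Section 5.3; its units are not stated


def group_centroids(centroids):
    """Group centroids whose radii of influence overlap (assumed rule).

    centroids -- list of (x, y, count) tuples
    Returns a sorted list of clusters, each a list of centroid indices.
    """
    parent = list(range(len(centroids)))

    def find(i):
        # Follow parent links to the representative, halving paths on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (xi, yi, ci) in enumerate(centroids):
        for j in range(i + 1, len(centroids)):
            xj, yj, cj = centroids[j]
            reach = (ci + cj) * RADIUS_FACTOR  # sum of the two radii of influence
            if math.hypot(xi - xj, yi - yj) < reach:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(centroids)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical centroids: (x, y, crime count from the first stage).
centroids = [(0.0, 0.0, 2), (100.0, 0.0, 1), (5000.0, 5000.0, 3)]
groups = group_centroids(centroids)
```

Here the first two centroids have radii 90 and 45 and lie 100 apart, so they merge, while the third stays in a cluster of its own.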
Fig. 4. Crime incidence by cluster
5.5 Allocating Incidents to Cluster Boundaries
Finally, each cluster is populated with data ready for training
a series of neural networks, one for each cluster. Each crime
record contains a unique crime identifier, the cluster to
which it belongs, the economic situation of the population
and the day of the week on which the crime was committed.
Fig 4 shows each cluster and its corresponding crime count.
In addition, each cluster record has a unique identifier, a list
of its member centroids and a total count of crimes.
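
One possible in-memory layout for the two record types described above is sketched below. The class names, field names and field types are assumptions; the paper lists only which pieces of information each record holds.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CrimeRecord:
    crime_id: int          # unique crime identifier
    cluster_id: int        # cluster to which the crime belongs
    economic_status: str   # economic situation of the population (assumed categorical)
    day_of_week: str       # day on which the crime was committed


@dataclass
class ClusterRecord:
    cluster_id: int                        # unique cluster identifier
    centroids: List[Tuple[float, float]]   # member centroids
    crimes: List[CrimeRecord] = field(default_factory=list)

    @property
    def total_crimes(self) -> int:
        """Total count of crimes allocated to this cluster."""
        return len(self.crimes)

    def add(self, crime: CrimeRecord) -> None:
        """Allocate a crime to this cluster, rejecting mismatched cluster ids."""
        if crime.cluster_id != self.cluster_id:
            raise ValueError("crime belongs to a different cluster")
        self.crimes.append(crime)


# Hypothetical cluster with two allocated crimes.
cluster = ClusterRecord(cluster_id=1, centroids=[(10.0, 12.0), (14.0, 9.0)])
cluster.add(CrimeRecord(101, 1, "low income", "Friday"))
cluster.add(CrimeRecord(102, 1, "low income", "Saturday"))
```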
6. GAMMA TEST FOR CLUSTER ANALYSIS
Following the results of the cluster analysis, autoregressive
techniques were used to model the grouped data. The
autoregressive model was selected as the preferred method
for the problem of short-term prediction [7]. The associated
methodology is discussed below.
7. FORECASTING USING ARTIFICIAL
NEURAL NETWORK
The implementation of an ANN model requires careful
consideration of the model parameters that affect the
efficiency and stability of the model. These include
decisions concerning the number of input and output nodes
and hidden layers, the choice of training algorithm, and the
volume of data to be used for training and testing.
7.1 The Neural Network Architecture
Tree-structured prediction is used, which is becoming
increasingly popular in a wide range of applications. A
prediction tree is a graph associated with the following
statistical model [8].
A characteristic function of the set is used. The ANN has
two hidden layers with the specified activation functions. The
number of neurons in the first layer is the same as the number
of nodes in the tree; it implements the functions at the nodes
such that each neuron outputs 1 or 0 depending on whether
the function defining the branching has a yes or no answer.
The second layer contains as many neurons as there are
leaves in the tree, and the output of each neuron is 1 or 0
depending on whether or not the subject with predictor x is
assigned to the corresponding leaf. The output layer simply
realizes the equation. The best topology of nodes in the
hidden layer is also determined empirically. Previous
research indicates that a single hidden layer is sufficient to
learn any complex nonlinear function, while other work
suggests that two hidden layers can produce more efficient
architectures.
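
The mapping from a tree to a two-hidden-layer network can be made concrete with a small sketch. It uses hard 0/1 threshold units, whereas reference [8] concerns soft trees, and the tree itself (two decision nodes, three leaves, their thresholds and leaf values) is entirely hypothetical.

```python
def step(z):
    """Hard threshold activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0


# Hypothetical tree: each decision node tests x[feature] <= threshold.
NODES = [(0, 2.5), (1, 1.0)]
# Each leaf: (predicted value, {node index: required answer, 1 = yes, 0 = no}).
LEAVES = [
    (10.0, {0: 1, 1: 1}),  # node 0 yes, node 1 yes
    (20.0, {0: 1, 1: 0}),  # node 0 yes, node 1 no
    (30.0, {0: 0}),        # node 0 no
]


def first_layer(x):
    """One neuron per decision node: outputs 1 when the split answers yes."""
    return [1 if x[feature] <= threshold else 0 for feature, threshold in NODES]


def second_layer(h):
    """One neuron per leaf: fires only when every test on its path matches."""
    outputs = []
    for _, path in LEAVES:
        n_yes = sum(path.values())
        # Weight +1 for nodes that must answer yes, -1 for nodes that must answer no.
        z = sum(h[i] if answer == 1 else -h[i] for i, answer in path.items())
        outputs.append(step(z - n_yes + 0.5))
    return outputs


def predict(x):
    """Output layer: weighted sum of leaf indicators with leaf values as weights."""
    leaf_flags = second_layer(first_layer(x))
    return sum(value * flag for (value, _), flag in zip(LEAVES, leaf_flags))
```

For any input exactly one leaf neuron fires, so `predict([1.0, 0.5])` returns the value of the first leaf. Replacing the step function with a smooth activation would make the network trainable by gradient methods.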
An initially large number of nodes in the hidden layer was
incrementally reduced to a minimum while maintaining
acceptable prediction capability. The shallow gradient
(shown for the town centre cluster) suggested that a
relatively small number of hidden nodes compared to the
2N + 1 rule would be sufficient to model the underlying
function, and this turned out to be the case. The standard
gradient descent method for adjusting weights was replaced
with conjugate gradient descent, which uses earlier gradient
measures to improve the error minimization process.
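
The paper does not say which conjugate gradient variant was used for training. As a minimal illustration of how each search direction reuses the previous one, the sketch below applies the linear conjugate gradient method to a quadratic error surface E(w) = ½ wᵀAw − bᵀw; the matrix `A` and vector `b` are hypothetical, and training a real network would need a nonlinear variant with a line search.

```python
def conjugate_gradient(A, b, iterations):
    """Minimise E(w) = 0.5 * w.A.w - b.w for a symmetric positive definite A."""
    n = len(b)
    w = [0.0] * n
    # The residual r = b - A w is the negative gradient of E at w.
    r = [b[i] - sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    d = r[:]  # the first direction is plain steepest descent
    for _ in range(iterations):
        rr = sum(ri * ri for ri in r)
        if rr < 1e-20:
            break  # gradient is numerically zero, so w is at the minimum
        Ad = [sum(A[i][j] * d[j] for j in range(n)) for i in range(n)]
        alpha = rr / sum(d[i] * Ad[i] for i in range(n))  # exact step length
        w = [w[i] + alpha * d[i] for i in range(n)]
        r = [r[i] - alpha * Ad[i] for i in range(n)]
        # beta mixes the previous direction into the new one.
        beta = sum(ri * ri for ri in r) / rr
        d = [r[i] + beta * d[i] for i in range(n)]
    return w


# Hypothetical 2-weight quadratic error surface.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
weights = conjugate_gradient(A, b, iterations=10)
```

On an n-dimensional quadratic the method reaches the minimum in at most n steps in exact arithmetic, whereas plain gradient descent typically zigzags; this is the benefit of carrying earlier gradient information forward.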
7.2 Terminating the Training Procedure
Overfitting is a widely recognised problem in the field of
artificial neural networks, but the Gamma test can measure
the noise in the data and thus determine where exactly
training should be stopped, which is very useful for
researchers [9]. Overfitting mainly occurs because the ANN
attempts to fit all the data fed into the network, including
the noise present. If the amount of noise in the data set is
known, it is much easier to determine the point at which
training should be stopped, because the network will tend to
fit the useful data before the noise. Thus the GT statistic Γ
gives the MSE value at which training needs to be stopped
[10].
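
A compact sketch of the Gamma test and of the resulting stopping rule follows. It uses the usual near-neighbour formulation: for each k, the mean squared input distance to the k-th nearest neighbour is paired with half the mean squared difference of the corresponding outputs, and the intercept of the fitted line estimates the noise variance. The choice of p = 10 neighbours, the function names and the noise-free sample data are assumptions.

```python
def gamma_test(inputs, outputs, p=10):
    """Return (gamma_statistic, slope) for input vectors and scalar outputs."""
    m = len(inputs)
    deltas = [0.0] * p
    gammas = [0.0] * p
    for i in range(m):
        # Squared input-space distances to every other point, nearest first.
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(inputs[i], inputs[j])), j)
            for j in range(m) if j != i
        )
        for k in range(p):
            dist2, j = neighbours[k]
            deltas[k] += dist2 / m
            gammas[k] += (outputs[j] - outputs[i]) ** 2 / (2 * m)
    # Least-squares line gamma = intercept + slope * delta.
    mean_d = sum(deltas) / p
    mean_g = sum(gammas) / p
    slope = (sum((d - mean_d) * (g - mean_g) for d, g in zip(deltas, gammas))
             / sum((d - mean_d) ** 2 for d in deltas))
    intercept = mean_g - slope * mean_d
    return intercept, slope


def should_stop_training(current_mse, gamma_statistic):
    """Stop once the training MSE has fallen to the estimated noise level."""
    return current_mse <= gamma_statistic


# Noise-free toy data y = 2x, so the estimated noise variance should be about zero.
inputs = [(i / 50.0,) for i in range(51)]
outputs = [2.0 * x[0] for x in inputs]
gamma_statistic, slope = gamma_test(inputs, outputs)
```

Training below the Gamma statistic would mean fitting noise, so the statistic can serve as a target MSE in place of a separate validation set.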
7.3 Partitioning the Inputs into Test and Training
Sets
First we need to determine the number of inputs required to
model the output. Once this is determined, the data needs to
be fitted to the designed architecture. With this architecture,
an M-test is performed to determine whether the amount of
input data is sufficient to model the underlying function. An
asymptotic level for the Gamma statistic (which
approximates the inherent noise of the output) indicates
that the data is sufficient and establishes a point at which
the test data and the training data can be separated. This is a
very useful technique because it allows the data to be
divided into two sets rather than three. Paris, Ware
Wilson, and Jenkins (2002) have demonstrated that there is
no need for validation data, whose main use is to determine
the point at which training results in overfitting, thus
allowing the maximum possible use of the data. The choice
of the appropriate quantity of data required for modelling is
therefore given by the M-test.
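
The idea of reading the split point off the M-test curve can be sketched as below. The code assumes the Gamma statistic has already been computed for increasing sample sizes M, and picks the first M from which the curve stays flat within a tolerance. The tolerance, the window of three points and the curve values are made-up illustrations, not results from the paper.

```python
def find_stable_m(gamma_curve, tolerance, window=3):
    """Return the first M from which `window` consecutive Gamma values
    differ by at most `tolerance`, or None if the curve never settles.

    gamma_curve -- list of (m, gamma_statistic) pairs sorted by m
    """
    for start in range(len(gamma_curve) - window + 1):
        values = [g for _, g in gamma_curve[start:start + window]]
        if max(values) - min(values) <= tolerance:
            return gamma_curve[start][0]
    return None


def split_by_m(samples, m):
    """Use the first m samples for training and the remainder for testing."""
    if not 0 < m < len(samples):
        raise ValueError("m must lie strictly between 0 and len(samples)")
    return samples[:m], samples[m:]


# Hypothetical M-test curve: the Gamma statistic levels off near 0.40.
curve = [(100, 0.90), (200, 0.55), (300, 0.41),
         (400, 0.40), (500, 0.405), (600, 0.398)]
stable_m = find_stable_m(curve, tolerance=0.02)
training, testing = split_by_m(list(range(600)), stable_m)
```

Everything beyond the stable point can go to the test set, which is how the data end up divided into two sets rather than three.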
Fig. 5. Forecast and Incidence of crime
8. DISCUSSION OF RESULT
The results are commensurate with the expectations set by
the Gamma test's output. Importantly, the noise representing
exceptional incidence levels was evidently excluded, giving
reliable results. The Gamma test was used as a pre-modelling
evaluation technique. The city centre cluster offered the best
predictive model using the artificial neural network, whereas
cluster seven generated poor models. The incorporation of
the statistical techniques improved the results. Using the
Hadoop multi-node cluster platform reduced the processing
time. Further experiments can now be carried out that cover
a longer period of time and incorporate more diverse data.
Fig 5 represents the accuracy of prediction.
9. CONCLUSION AND FUTURE WORK
The paper describes a forecasting framework that takes into
account geographical areas of concern that may transcend
traditional policing boundaries. It sheds light on a method of
developing a practical system that can be used in an
operational policing environment, which in turn can reduce
the drawbacks of inefficient techniques and move towards a
more dynamic methodology. The computer-based Hadoop
procedure utilises a
scanning algorithm over geographically mapped crime
occurrences to identify clusters with relatively high levels of
crime, known as hot spots. The mapped clusters provide
sufficient data, which is analysed by the Gamma test to
assess the suitability of the data for predictive modelling.
Using the results obtained from the Gamma test, the Hadoop
model based on an artificial neural network is implemented.
As the previous study states, the artificial neural network
generally displays a superior ability to model the trends
within each cluster.
Further development will extend to the modelling of more
detailed scenarios to facilitate prediction based on detailed
crime input. For example, the impact on an area of an
upcoming holiday on which the weather is expected to be
warm could be evaluated. The aim here was to extract an
underlying generalized model of crime incidents. However,
specific locations at specific times of the year may be best
modelled independently of the other data. These
considerations can be modelled separately if a sufficient
amount of high-quality data is available. In addition,
statistical tools can be used to study exceptional events,
providing greater insight into how these events change the
levels of crime and ultimately defining rules that modify
the incidence count significantly.
REFERENCES
[1] Balkin, S. D., & Ord, J. K. (2000). Automatic neural
network modelling for univariate time series.
International Journal of Forecasting, 16, 509–515.
[2] Woodworth JT, Mohler GO, Bertozzi AL,
Brantingham, "Non-local crime density estimation
incorporating housing information", Phil. Trans. R.
Soc. A 372: 20130403, September 2014.
[3] Han Hu, Yonggang Wen, Tat-Seng Chua, and Xuelong
Li, "Toward Scalable Systems for Big Data Analytics:
A Technology Tutorial", School of Computing,
National University of Singapore, Singapore 117417,
2014.
[4] Big Data Startup, "The Los Angeles Police
Department Is Predicting and Fighting Crime with
Big Data," http://www.bigdata-startups.com/BigData-
startup/los-angeles police-department-predicts-fights-
crime-big-data/, May 2013.
[5] Chainey, S., & Reid, S. (2002). When is a hotspot a
hotspot? A procedure for creating statistically robust
hotspot maps of crime. In Kidner, D. B. et al. (Ed.),
Socio-economic applications in geographical
information science. London: Taylor and Francis, pp.
21–36.
[6] D. Usha, Dr. K. Rameshkumar, "A Complete Survey on
Application of Frequent Pattern Mining and
Association Rule Mining on Crime Pattern Mining",
International Journal of Advances in Computer
Science and Technology, ISSN 2320-2602, Volume 3,
No. 4, April 2014.
[7] Hirschfield, A., 2001. Decision support in crime
prevention: data analysis, policy evaluation and GIS.
In: Mapping and Analysing Crime Data - Lessons
from Research and Practice, A., Hirschfield & K,
Bowers (Eds.). Taylor and Francis, 2001, pp. 237-269.
[8] Antonio Ciampi and Yves Lechevallier, Statistical
Models and Artificial Neural Networks: Supervised
Classification and Prediction Via Soft Trees, McGill
University, Montreal, QC, Canada INRIA-
Rocquencourt, Le Chesnay, France
[9] W. Chang et al., “An International Perspective on
Fighting Cybercrime,” Proc. 1st NSF/NIJ Symp.
Intelligence and Security Informatics, LNCS 2665,
Springer-Verlag, 2003, pp. 379-384.
[10] Jonathan J. Corcoran, Ian D. Wilson, J. Andrew
Ware, Predicting the geo-temporal variations of
crime and disorder, International Journal of
Forecasting 19 (2003) 623–634.

More Related Content

PDF
Predictive analysis of crime forecasting
PPT
Using Data Mining Techniques to Analyze Crime Pattern
PDF
Propose Data Mining AR-GA Model to Advance Crime analysis
PDF
SAS Data Mining - Crime Modeling
PPTX
Crime Analytics: Analysis of crimes through news paper articles
PPTX
Crime Pattern Detection using K-Means Clustering
PDF
The International Journal of Engineering and Science (IJES)
PPT
Crime Analysis
Predictive analysis of crime forecasting
Using Data Mining Techniques to Analyze Crime Pattern
Propose Data Mining AR-GA Model to Advance Crime analysis
SAS Data Mining - Crime Modeling
Crime Analytics: Analysis of crimes through news paper articles
Crime Pattern Detection using K-Means Clustering
The International Journal of Engineering and Science (IJES)
Crime Analysis

What's hot (20)

DOCX
Crime analysis mapping, intrusion detection using data mining
PDF
Analytics-Based Crime Prediction
PDF
Crime Analysis & Prediction System
PPT
A Comparative Study of Data Mining Methods to Analyzing Libyan National Crime...
PDF
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...
DOCX
Crime rate analysis using k nn in python
PPTX
Crime Analysis using Data Analysis
PDF
Crime Data Analysis, Visualization and Prediction using Data Mining
PDF
IRJET - Crime Analysis and Prediction - by using DBSCAN Algorithm
PPTX
Crime prediction-using-data-mining
PDF
Machine Learning Approaches for Crime Pattern Detection
PDF
Phishing Websites Detection Using Back Propagation Algorithm: A Review
PDF
Crime prediction based on crime types
PDF
IRJET- Crime Analysis using Data Mining and Data Analytics
PPSX
06 analysis of crime
PDF
Workware Systems Situational Awareness & Predictive Policing Overview
PDF
Crime analysis
PPTX
Chicago Crime Dataset Project Proposal
PDF
U24149153
PDF
Probabilistic models for anomaly detection based on usage of network traffic
Crime analysis mapping, intrusion detection using data mining
Analytics-Based Crime Prediction
Crime Analysis & Prediction System
A Comparative Study of Data Mining Methods to Analyzing Libyan National Crime...
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...
Crime rate analysis using k nn in python
Crime Analysis using Data Analysis
Crime Data Analysis, Visualization and Prediction using Data Mining
IRJET - Crime Analysis and Prediction - by using DBSCAN Algorithm
Crime prediction-using-data-mining
Machine Learning Approaches for Crime Pattern Detection
Phishing Websites Detection Using Back Propagation Algorithm: A Review
Crime prediction based on crime types
IRJET- Crime Analysis using Data Mining and Data Analytics
06 analysis of crime
Workware Systems Situational Awareness & Predictive Policing Overview
Crime analysis
Chicago Crime Dataset Project Proposal
U24149153
Probabilistic models for anomaly detection based on usage of network traffic
Ad

Similar to A predictive model for mapping crime using big data analytics (20)

PDF
Crime Analysis based on Historical and Transportation Data
PDF
Predictive Modeling for Topographical Analysis of Crime Rate
PDF
Survey on Crime Interpretation and Forecasting Using Machine Learning
PDF
Bs4301396400
PPTX
PPT.pptx
PDF
IRJET- Cyber Crime Attack Prediction
PDF
Crime Prediction and Analysis
PDF
CRIME ANALYSIS AND PREDICTION USING MACHINE LEARNING
PDF
Predictive Modeling for Topographical Analysis of Crime Rate
PPTX
Technical Seminar
PDF
ACCESS.2020.3028420.pdf
PDF
Crime Prediction and Reporting System
PDF
Chicago Crime Analysis
PDF
CRIME EXPLORATION AND FORECAST
PDF
Crime forecasting system for soic final
PDF
A Survey on Data Mining Techniques for Crime Hotspots Prediction
PDF
Analysis & Prediction of Crime of A Region.pdf
PDF
Analysis of Crime Big Data using MapReduce
PPTX
PPTX
9th may net sci presentation (1)
Crime Analysis based on Historical and Transportation Data
Predictive Modeling for Topographical Analysis of Crime Rate
Survey on Crime Interpretation and Forecasting Using Machine Learning
Bs4301396400
PPT.pptx
IRJET- Cyber Crime Attack Prediction
Crime Prediction and Analysis
CRIME ANALYSIS AND PREDICTION USING MACHINE LEARNING
Predictive Modeling for Topographical Analysis of Crime Rate
Technical Seminar
ACCESS.2020.3028420.pdf
Crime Prediction and Reporting System
Chicago Crime Analysis
CRIME EXPLORATION AND FORECAST
Crime forecasting system for soic final
A Survey on Data Mining Techniques for Crime Hotspots Prediction
Analysis & Prediction of Crime of A Region.pdf
Analysis of Crime Big Data using MapReduce
9th may net sci presentation (1)
Ad

More from eSAT Journals (20)

PDF
Mechanical properties of hybrid fiber reinforced concrete for pavements
PDF
Material management in construction – a case study
PDF
Managing drought short term strategies in semi arid regions a case study
PDF
Life cycle cost analysis of overlay for an urban road in bangalore
PDF
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
PDF
Laboratory investigation of expansive soil stabilized with natural inorganic ...
PDF
Influence of reinforcement on the behavior of hollow concrete block masonry p...
PDF
Influence of compaction energy on soil stabilized with chemical stabilizer
PDF
Geographical information system (gis) for water resources management
PDF
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
PDF
Factors influencing compressive strength of geopolymer concrete
PDF
Experimental investigation on circular hollow steel columns in filled with li...
PDF
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
PDF
Evaluation of punching shear in flat slabs
PDF
Evaluation of performance of intake tower dam for recent earthquake in india
PDF
Evaluation of operational efficiency of urban road network using travel time ...
PDF
Estimation of surface runoff in nallur amanikere watershed using scs cn method
PDF
Estimation of morphometric parameters and runoff using rs &amp; gis techniques
PDF
Effect of variation of plastic hinge length on the results of non linear anal...
PDF
Effect of use of recycled materials on indirect tensile strength of asphalt c...
Mechanical properties of hybrid fiber reinforced concrete for pavements
Material management in construction – a case study
Managing drought short term strategies in semi arid regions a case study
Life cycle cost analysis of overlay for an urban road in bangalore
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory investigation of expansive soil stabilized with natural inorganic ...
Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of compaction energy on soil stabilized with chemical stabilizer
Geographical information system (gis) for water resources management
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Factors influencing compressive strength of geopolymer concrete
Experimental investigation on circular hollow steel columns in filled with li...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Evaluation of punching shear in flat slabs
Evaluation of performance of intake tower dam for recent earthquake in india
Evaluation of operational efficiency of urban road network using travel time ...
Estimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of morphometric parameters and runoff using rs &amp; gis techniques
Effect of variation of plastic hinge length on the results of non linear anal...
Effect of use of recycled materials on indirect tensile strength of asphalt c...

A predictive model for mapping crime using big data analytics

IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
Volume: 04 Issue: 04 | Apr-2015, Available @ http://www.ijret.org 344

A PREDICTIVE MODEL FOR MAPPING CRIME USING BIG DATA ANALYTICS

Saoumya (1), Anurag Singh Baghel (2)
(1) Gautam Buddha University, Uttar Pradesh, India
(2) Gautam Buddha University, Uttar Pradesh, India

Abstract
Crime reduction and prevention challenges in today's world are becoming increasingly complex and call for a new technique that can handle the vast amount of information being generated. Traditional police capabilities mostly fall short in depicting the actual distribution of criminal activities, and thus contribute less to the suitable allocation of police services. This paper describes methods for crime event forecasting, using Hadoop, by studying the geographical areas that are at greater risk and lie outside traditional policing limits. The developed method uses a geographical crime mapping algorithm to identify areas that have a relatively high number of crimes. Such places are termed hot spots. The identified hotspot clusters give valuable data that can be used to train an artificial neural network, which in turn can model the trends of crime. The artificial neural network specification and estimation approach is enhanced by the processing capability of the Hadoop platform.

Keywords: Crime forecasting; Cluster analysis; Artificial neural networks; Patrolling; Big data; Hadoop; Gamma test.

1. INTRODUCTION

Police would benefit greatly from software that can intelligently analyse a constantly updated database of crime incidents and their descriptions, and provide accurate predictions of where and when crime is most likely to occur. This would help in the optimal allocation of police resources. One drawback is that crime occurrences are generally sparse with respect to the type of crime and the time and place at which the incident occurs, and are subject to randomness. In addition, the ability to process unstructured data has been limited until now, but with the advent of big data a new approach to making predictions can be explored. A crime analysis tool must be able to identify crime patterns accurately and efficiently to support forecasting. At present, however, there are several challenges:

1) The increasing size of the information that has to be stored and analysed.
2) The need for techniques that can analyse this increasing volume of crime data accurately and efficiently.
3) The varied methods and infrastructure used for recording crime data.
4) The available data are inconsistent and incomplete, which makes formal analysis increasingly difficult.
5) Owing to its complex nature, the analysis takes more time.

This paper delineates a forecasting framework on the Hadoop platform that can predict crimes likely to occur in the near future. This work differs from previous studies, which mainly describe hot-spot methods and their statistical significance. Researchers have mainly focused on mapping and analysing the distribution of crime, whereas in this paper the identified hot-spot clusters are used as the foundation for a predictive algorithm. Hotspot visualization thus aids crime prediction: based on past events, it is used to recognise areas with a high incidence of crime so that appropriate police resources can be deployed at the identified locations.
The software also harnesses the capability of Hadoop to process large amounts of data in half the time taken by the other systems studied so far. The approach presented in this paper has three key stages, as shown in Fig. 1. The first is the analysis of the geographic distribution of crime data, which identifies spatial clusters having a greater risk of crime. In the second stage, a clustering algorithm is used to determine the quality of each identified cluster.

Fig. 1. Predictive Process Model

The final stage is prediction, which deploys an artificial neural network (ANN) model based on a classification and regression tree predictive specification. The paper shows how geographical clusters of crime data can train artificial neural networks to facilitate predictive modelling, and how the same can be done on the
Hadoop platform, which provides the capability to process more data at a faster rate and to use varied sources of data. First, data are collected from past records and mapped. Kernel density estimation is used to form clusters from the mapped data. The Gamma Test (GT) is used to estimate how much potential each cluster has to facilitate prediction. The identified clusters are then fed into the artificial neural network, which makes predictions about future crime. The paper concludes with a discussion of the results obtained and the scope for future research.

2. CRIME PREDICTION THEORY

Many researchers have tried to explain why crimes occur in certain areas and whether any pattern can be inferred from past events. One theory that addresses these questions is crime pattern theory [1]. According to it, crime does not happen at random; it is either opportunistic or planned. It states that criminal activity occurs when the work space of a target intersects that of an offender. A person's work space comprises the places he or she visits in the day-to-day routine, such as the workplace, educational institutes, shopping malls and recreational areas. These specific locations of the offender or victim are also called nodes. Personal paths, the routes people take every day, connect the various nodes and mark out the boundary of a personal space. This personal area is also the person's awareness space. Thus crime pattern theory states that a crime involving two people can only occur when the personal spaces of both intersect at some point.
Thus we can say that crimes are not completely random; they can be studied and analysed to provide plausible predictions. These may not be as accurate as the ones shown in the movie Minority Report, but to some extent predictions can be made. Fig. 2 depicts this phenomenon. Simply put, if an area provides the offender with an opportunity for crime and it is within the personal awareness space of the victim, then crime is likely to happen. Thus areas that are secluded and lack proper patrolling provide greater opportunity for crime.

Fig. 2. Activity and Awareness Space of Criminals

Places such as shopping and recreational areas, however, are also places where the offender and the victim are likely to meet, because a large number of people visit these places and offenders can easily mark their potential victims. The study of human behaviour is outside the scope of this study, as we are only interested in finding patterns that can help prevent further crimes. One example is the identification of places where many people fall victim to chain snatching or pickpocketing, which is mainly concentrated in certain areas. Thus crime pattern theory provides an organized way to proceed towards prediction by exploring and analysing patterns of crime.

3. ARTIFICIAL NEURAL NETWORK FOR CRIME PREDICTION

Designing prediction models with artificial neural networks is a well-studied area to which many researchers have contributed. One study concluded that crime-level forecasting models are characteristically autoregressive, with counts of crime as both input and output: multiple inputs and a single output [2]. Models of this type are used in this paper. To test the artificial neural network, it is presented with a series of input vectors whose results are known to the tester but not to the network.
To determine the robustness of the training process, the tester compares the answers given by the network, which describe the predicted level of crime for the given input, with the known results. If the tester judges the robustness of the network to be sufficient, the network is considered to have true predictive capability and is fit for use. For prediction, we then feed the network inputs for which we do not know the answers and assume that the output provided by the network is reliable.

4. HADOOP PLATFORM

The system is built on Apache Hadoop, an open-source software framework for the storage and large-scale processing of data sets on clusters of hardware. It is used as the base platform so that large amounts of data can be processed, leading to more accurate predictions. The prediction algorithm is written as mapper and reducer programs, which are run in the multi-cluster environment, giving faster results [3].

5. THE CRIME OCCURRENCE DATA

The crime data considered in this paper comprise 100,000 criminal incidents spanning 5 years in an area measuring roughly 457, 23400,040 m2. This research uses data on crimes that occurred before a particular date to create hotspot cluster maps and to test their robustness for forecasting when and where crimes are most likely to happen next. Four crime types are considered: burglary, street crime, theft from vehicles and theft of vehicles. The variables recorded in the database of crime incidents include time, day, month, weather and location, with locations mapped as geographical coordinates [4]. In a
comparative study, KDE was the technique that consistently outperformed the others; among the crime types, street crime hotspot maps were generally found to be more accurate than the others at predicting where street crime would occur in the near future. Hence KDE is used for the hotspot mapping.

5.1 Training Data from Crime Clusters

For the analysis of crime clusters, kernel density estimation (KDE) was used, which is regarded as the most appropriate spatial analysis technique for mapping crime data. The wide acceptance of KDE can be attributed to its rapidly growing use and ready availability (one example being the MapInfo add-on Hotspot Detective), as well as to the perceived accuracy of its hotspot maps and the user-friendly appearance of the resulting map in comparison with other techniques. Point data (offences) are aggregated within a user-defined search radius, and a continuous surface representing the volume or density of crime events across the study area is calculated. The resulting map is a smooth surface that shows the variation in the density of crime points across the study area without being constrained to geometric shapes such as circles or ellipses. KDE also offers flexibility in setting parameters such as grid size and search radius; however, despite many useful recommendations, there is no universal rule on how to set these and in what circumstances. The following steps are followed.
1) Raw data analysis
2) Cluster analysis and geographic representation
3) Allocation of centroids to clusters
4) Allocation of crime incidents to cluster boundaries

5.2 Raw Data Analysis

A simple search algorithm is used to identify small areas that have a higher than average incidence of crime [5]. The algorithm uses a counting function that iterates through the crime data, incrementing a counter whenever an incident lies within the scan range. Clusters of crime incidents are shown in Fig. 3. Two coordinates, C1 and C2, give the projected position of the crime incident, and X and Y are the coordinates of the centroid. An incident lies outside the scan range, and is therefore not counted, if

scan radius < \sqrt{(C_1 - X)^2 + (C_2 - Y)^2}

5.3 Cluster Analysis and Geographic Representation

A heuristic approach is used in this step to identify the count of crime occurrences required for a cluster to be considered salient. The heuristic rules make use of the fact that crime incidents are generally clustered within small geographical areas known as hotspots. A graphical representation of the dispersion of the heuristically generated data is used to increase the radius of the zone associated with each centroid [6]. As the density of the sample at a centroid increases, so does the radius of influence associated with that centroid. Through user interaction and experimentation, the radius of influence was set to count * 45, where count is the number of crimes associated with the centroid during the first stage of analysis. This leads to the identification of stakeholders.

Fig. 3. Clusters of crime incidences

5.4 Allocation of Centroids to Clusters

Next, the centroids that are to be grouped together to form clusters are identified. The gravity and density parameters, along with the centroid list generated in the previous stage, form the basis for this iterative procedure.

Fig. 4.
Crime incidence by cluster

5.5 Allocating Incidents to Cluster Boundaries

Finally, each cluster is filled with data ready for training a series of neural networks, one for each cluster. Each record contains a unique crime identifier, the cluster to which it belongs, the economic situation of the population and the day of the week on which the crime was committed. Fig. 4 shows each cluster and its corresponding crime count. In addition, each cluster record has a unique identifier, a list of its member centroids and a total crime count.

6. GAMMA TEST FOR CLUSTER ANALYSIS

Based on the results of the cluster analysis, autoregressive techniques were used to model the clustered data. The autoregressive model was selected as the preferred method for the problem of short-term prediction [7]. The associated methodology is discussed below.
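The text does not spell out how the Gamma statistic is computed, so the following is only a minimal sketch of the Gamma test as it is commonly described in the literature: a near-neighbour estimate of the noise variance of the output. It is not the authors' implementation. The function names gamma_test and m_test, the default of p = 10 neighbours and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def gamma_test(X, y, p=10):
    """Return (Gamma, slope): Gamma estimates the noise variance of y.

    X is an (M, d) array of input vectors, y an (M,) array of outputs.
    For k = 1..p, delta(k) is the mean squared distance to the k-th
    nearest neighbour in input space and gamma(k) is half the mean
    squared difference between the corresponding outputs. Gamma is the
    intercept of the straight line fitted to the (delta, gamma) pairs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Pairwise squared Euclidean distances; a point is not its own neighbour.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)

    # Indices of the p nearest neighbours of every point, nearest first.
    order = np.argsort(d2, axis=1)[:, :p]

    delta = np.take_along_axis(d2, order, axis=1).mean(axis=0)
    gamma = 0.5 * ((y[order] - y[:, None]) ** 2).mean(axis=0)

    slope, intercept = np.polyfit(delta, gamma, 1)
    return intercept, slope


def m_test(X, y, sizes, p=10):
    """Gamma statistic for the first m points, for each m in sizes.

    If the values level off as m grows, the data are taken to be
    sufficient (the idea behind the M-test described in Section 7.3).
    """
    return [gamma_test(X[:m], y[:m], p)[0] for m in sizes]
```

For noise-free data the intercept should be close to zero; for noisy data it should approach the variance of the noise as the number of points grows.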
7. FORECASTING USING ARTIFICIAL NEURAL NETWORK

Implementing an ANN model requires careful choice of the model parameters that affect the efficiency and stability of the model. These include decisions concerning the number of input and output nodes and hidden layers, the selection of the training algorithm, and the volume of data to be used for training and testing.

7.1 The Neural Network Architecture

Tree-structured prediction, which is becoming increasingly popular in a wide range of applications, is used. A prediction tree is a graph associated with a statistical model [8], in which a characteristic function of the set is used. The ANN has two inner layers with the specified activation functions. The number of neurons in the first layer is the same as the number of nodes in the tree; each neuron implements the function at its node, with output 1 or 0 depending on whether the branching test at that node has a yes or no answer. The second layer contains as many neurons as there are leaves in the tree, and the output of each neuron is 1 or 0 depending on whether or not the subject with predictor x is assigned to the corresponding leaf. The output layer simply realizes the model equation. The best number of nodes in the hidden layer is also determined empirically. Previous research indicated that a single hidden layer is sufficient to learn any complex nonlinear function, while other work suggests that two hidden layers can produce more efficient architectures. An initially large number of nodes in the hidden layer was incrementally reduced to a minimum while maintaining acceptable prediction capability.
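The mapping from a tree to a two-layer network described in Section 7.1 can be illustrated with a small hand-built example. This is a sketch under stated assumptions, not the network used in the paper: the tree, its thresholds (0.5 and 0.3), the leaf values and the names step and predict are all hypothetical, and hard 0/1 step activations are assumed.

```python
import numpy as np

# Hypothetical tree on an input x = (x0, x1):
#   node 0: is x0 <= 0.5 ?  yes -> node 1,  no -> leaf C
#   node 1: is x1 <= 0.3 ?  yes -> leaf A,  no -> leaf B


def step(z):
    """Hard threshold activation: 1.0 where z >= 0, otherwise 0.0."""
    return (np.asarray(z) >= 0).astype(float)


# First layer: one neuron per internal node, output 1 when the test is "yes".
W1 = np.array([[-1.0, 0.0],    # node 0 fires when 0.5 - x0 >= 0
               [0.0, -1.0]])   # node 1 fires when 0.3 - x1 >= 0
b1 = np.array([0.5, 0.3])

# Second layer: one neuron per leaf, output 1 when x is assigned to that leaf.
W2 = np.array([[1.0, 1.0],     # leaf A: node 0 yes AND node 1 yes
               [1.0, -1.0],    # leaf B: node 0 yes AND node 1 no
               [-1.0, 0.0]])   # leaf C: node 0 no
b2 = np.array([-2.0, -1.0, 0.0])

# Output layer: weighted sum of the leaf indicators (illustrative leaf values).
leaf_values = np.array([12.0, 7.0, 3.0])


def predict(x):
    """Pass x through both hidden layers and return the leaf value."""
    h1 = step(W1 @ x + b1)
    h2 = step(W2 @ h1 + b2)
    return float(leaf_values @ h2)
```

Exactly one second-layer neuron is active for any input, so the output equals the value stored at the leaf that the tree would assign to x.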
The shallow gradient (shown for the town centre cluster) suggested that relatively few hidden nodes, compared with the 2N + 1 rule, would be sufficient to model the underlying function, and this turned out to be the case. The standard gradient descent method for adjusting weights was replaced with conjugate gradient descent, which uses earlier gradient measures to improve the error minimization process.

7.2 Terminating the Training Procedure

Overfitting is a widely recognised problem in the field of artificial neural networks, but the Gamma test can measure the noise in the data and thus determine where exactly training should be stopped, which is very useful for researchers [9]. Overfitting mainly occurs because the ANN attempts to fit all the data fed into the network, including the noise present. If the amount of noise in the data set is known, it is much easier to determine the point at which training should be stopped, because the network will tend to fit the useful data before the noise. Thus the GT statistic G gives the MSE value at which training should be stopped [10].

7.3 Partitioning the Inputs into Test and Training Sets

First, the number of inputs required to model the output must be determined. Once this is known, the data must be fitted to the designed architecture. With this architecture, an M-test is performed to determine whether the amount of input data is sufficient to model the underlying function. An asymptotic level of the Gamma statistic (which approximates the inherent noise of the output) indicates that the data are sufficient and establishes the point at which the test data and the training data can be separated. This is a very useful technique because it allows the data to be divided into two sets rather than three.
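The stopping rule of Section 7.2 and the two-way split described above can be sketched as follows. This is an illustration only: a linear model trained by plain gradient descent stands in for the paper's ANN and conjugate gradient training, the Gamma statistic is passed in as a number (gamma_stat) rather than computed here, and the names train_until_noise_floor and split_train_test are hypothetical.

```python
import numpy as np


def split_train_test(X, y, m):
    """Use the first m points for training and the rest for testing.

    m is assumed to be the point at which the Gamma statistic levels off;
    no third (validation) set is held back.
    """
    return (X[:m], y[:m]), (X[m:], y[m:])


def train_until_noise_floor(X, y, gamma_stat, lr=0.1, max_epochs=10_000):
    """Gradient descent on a linear model, stopped at MSE <= gamma_stat.

    Returns (weights, bias, mse, epoch), where mse is the training error
    measured at the last epoch that was checked.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for epoch in range(max_epochs):
        err = X @ w + b - y
        mse = float(np.mean(err ** 2))
        if mse <= gamma_stat:
            # Any further error reduction would be fitting noise.
            break
        w -= lr * 2.0 * (X.T @ err) / len(y)
        b -= lr * 2.0 * err.mean()
    return w, b, mse, epoch
```

The design choice illustrated here is that the stopping threshold comes from the data before training starts, which is why a separate validation set is not needed to detect overfitting.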
As Paris, Ware Wilson, and Jenkins (2002) have demonstrated, there is then no need for validation data, whose main use is to determine the point at which training begins to overfit; this allows the available data to be used as efficiently as possible. The appropriate quantity of data required for modelling is therefore given by the M-test.

Fig. 5. Forecast and Incidence of crime

8. DISCUSSION OF RESULT

The results were commensurate with the expectations raised by the Gamma test's output. Importantly, the noise, which represents exceptional incidence levels, was evidently excluded, giving reliable results. The Gamma test served as a pre-modelling evaluation technique. The city centre cluster offered the best predictive model using the artificial neural network, whereas cluster seven generated poor models. Incorporating the statistical techniques improved the results, and using the Hadoop multi-cluster platform reduced the processing time. Further experiments can now be carried out that cover a longer period of time and incorporate more diverse data. Fig. 5 represents the accuracy of prediction.

9. CONCLUSION AND FUTURE WORK

The paper describes a forecasting framework that takes account of geographical areas of concern that may transcend traditional policing limits. It sheds light on a method for developing a practical system for an effective operational policing environment, which in turn can reduce the drawbacks of inefficient techniques and move towards a more dynamic methodology. The computerised Hadoop procedure will utilise a
geographically mapped crime-occurrence scanning algorithm to map the clusters with relatively high levels of crime, which are known as hot spots. The mapped clusters give sufficient data, which are analysed by the Gamma test to assess the fitness of the data for predictive modelling. Using the results obtained from the Gamma test, the Hadoop model based on the artificial neural network is implemented. As the previous study states, the artificial neural network generally displays a superior ability to model the trends within each cluster. Further development will extend to the modelling of more detailed scenarios to facilitate prediction based on detailed crime inputs. For example, the impact on an area of an upcoming holiday on which the weather is expected to be warm could be evaluated. The aim here was to extract an underlying generalized model of crime incidents. However, specific locations at specific times of the year may be best modelled independently of the other data. These considerations can be modelled separately if a sufficient amount of high-quality data is available. In addition, statistical tools can be used to study exceptional events, providing greater insight into how these events change the levels of crime and ultimately defining rules that modify the incidence count significantly.

REFERENCES

[1] Balkin, S. D., & Ord, J. K. (2000). Automatic neural network modelling for univariate time series. International Journal of Forecasting, 16, 509–515.
[2] Woodworth JT, Mohler GO, Bertozzi AL, Brantingham, "Non-local crime density estimation incorporating housing information", Phil. Trans. R. Soc. A 372: 20130403, September 2014.
[3] Han Hu, Yonggang Wen, Tat-Seng Chua, and Xuelong Li, "Toward Scalable Systems for Big Data Analytics: A Technology Tutorial", 1-School of Computing, National University of Singapore, Singapore 117417, 2014.
[4] Big Data Startup, "The Los Angeles Police Department Is Predicting and Fighting Crime with Big Data," http://www.bigdata-startups.com/BigData- startup/los-angeles police-department-predicts-fights- crime-big-data/, May 2013.
[5] Chainey, S., & Reid, S. (2002). When is a hotspot a hotspot? A procedure for creating statistically robust hotspot maps of crime. In Kidner, D. B. et al. (Ed.), Socio-economic applications in geographical information science. London: Taylor and Francis, pp. 21–36.
[6] D. Usha, Dr. K. Rameshkumar, "A Complete Survey on application of Frequent Pattern Mining and Association Rule Mining on Crime Pattern Mining", International Journal of Advances in Computer Science and Technology, ISSN 2320 – 2602, Volume 3, No. 4, April 2014.
[7] Hirschfield, A., 2001. Decision support in crime prevention: data analysis, policy evaluation and GIS. In: Mapping and Analysing Crime Data - Lessons from Research and Practice, A. Hirschfield & K. Bowers (Eds.). Taylor and Francis, 2001, pp. 237-269.
[8] Antonio Ciampi and Yves Lechevallier, Statistical Models and Artificial Neural Networks: Supervised Classification and Prediction Via Soft Trees, McGill University, Montreal, QC, Canada; INRIA-Rocquencourt, Le Chesnay, France.
[9] W. Chang et al., "An International Perspective on Fighting Cybercrime," Proc. 1st NSF/NIJ Symp. Intelligence and Security Informatics, LNCS 2665, Springer-Verlag, 2003, pp. 379-384.
[10] Jonathan J. Corcoran, Ian D. Wilson, J. Andrew Ware, Predicting the geo-temporal variations of crime and disorder, International Journal of Forecasting 19 (2003) 623–634.