A Proposed Model for Cybercrime Detection
Algorithm Using A Big Data Analytics
Hossam Abdel Rahaman
Dept of Computer and Information Sciences
Faculty of Statistical Studies and Research
Cairo University, Egypt
Hossam_mm7@yahoo.com
Abstract— Cybercrime is increasingly part of day-to-day life, and the
challenges of reducing and preventing it are becoming more complex.
New techniques are needed to handle the vast amount of data involved.
Traditional policing activities mostly fall short in portraying the
actual distribution of criminal activity and hence contribute little to
the appropriate allocation of police services. This paper describes
methods for cybercrime prediction that use the Hadoop platform for big
data analytics to examine the geographical zones that carry greater
risk and lie outside conventional policing capabilities. The method
uses a geographical cybercrime mapping algorithm to identify regions
with a relatively high number of cybercrime cases. It identifies
clusters with a high number of cases, which can help reveal cybercrime
patterns. The estimation approach is enhanced by the processing
capability of the Hadoop platform.
I. INTRODUCTION
Cybercrime is increasing, with growing threats from online fraud and
unethical hacking. As both cyber safety threats and data volumes grow,
organizations must prepare themselves to foresee and anticipate
cybercrime. Cybercrime specialists use digital forensic tools to
identify cybercrime incidents and recognize potential threats such as
credit card fraud. Big data analytics enables companies to analyze the
enormous amount of information they collect during monetary
transactions, which is of greater significance today because of the
increased risk of cybercrime. Big data tools are being used to combat
cybercrime attacks; big data analytics can help detect forgery and can
facilitate digital forensic analysis. [1]
The use of K-Means algorithms to analyze data and predict where
cybercrime is likely to happen is becoming more common in law
enforcement. This approach, frequently referred to as predictive
analysis, supports police agencies' cybercrime reduction efforts. [2]
The detection algorithm presented in this paper has three stages, as
shown in Figure 1. The first phase is the geographic distribution
analysis of cybercrime data, which identifies spatial clusters that
have a greater risk of cybercrime. In the second phase, a K-Means
clustering algorithm is used to determine the quality of each
identified cluster. [3]
Figure 1. Predictive Process
This paper describes a cybercrime detection algorithm on the Hadoop
platform for big data analytics that is able to predict likely
near-term cybercrime. It also gives a brief overview of several
techniques used in analyzing large data sets to detect online fraud and
unethical hacking. One aim of this study is to identify the model that
best identifies online fraud cases. [4]
A. Problem Statement
Predictive big data analysis has not been broadly examined and studied
from an objective, scientific perspective. Although the initial
experiences of police agencies that have either fully implemented or
experimented with predictive policing techniques appear to be positive,
the effect of predictive policing on cybercrime has yet to be
definitively determined. This problem is difficult because the use of
predictive analysis in policing is so recent that little objective
research has been conducted on its cybercrime reduction applications.
[5]
B. Challenges Of Research
• The different techniques and infrastructures that
are used for recording data on cybercrime.
• The diverse techniques needed to analyze this
expanding volume of cybercrime data with precision
and efficiency.
• The available data are inconsistent and fragmented,
which makes formal analysis increasingly difficult.
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 18, No. 6, June 2020
146 https://sites.google.com/site/ijcsis/
ISSN 1947-5500

• The increasing size of the data that has to be stored
and analyzed.
C. Research Questions
The primary research question is as follows:
• Is predictive big data analytics an effective
cybercrime control practice that can contribute to
improved homeland security?
Secondary questions include the following:
• What is the relationship between predictive big data
analysis and cybercrime reduction in cities that have
implemented the practice?
• How do the quantity and quality of historical big
data affect the relationship between predictive
analysis and cybercrime reduction?
D. Objective of Research
The objectives of this research can be broken down into the following
points:
• Reducing the incidence of cybercrime by using big data
analytics to make cybercrime predictions in terms of
time and space.
• Improving the efficiency of the Egypt Cybercrime
Centre so that Internet users in Egypt become more
aware of emerging trends in cybercrime.
• Providing the proposed algorithm to the Egypt
Cybercrime Centre to contribute to the fight against
cybercrime in Egypt.
• Enabling the Egyptian police to use predictive big data
analysis techniques to help reallocate resources
towards homeland security missions.
II. BIG DATA
The world is becoming increasingly interconnected and digitalized, and
the amount of data is growing explosively every minute. Managing the
records of this data requires extremely powerful analytical
capabilities.
The problem begins during data acquisition, when the huge amount of
data requires decisions about what data to store, what to discard, and
how to store it so that the data remain reliable and accurate. Big data
refers to datasets whose size is beyond the ability of a typical
database to store, manage, and analyze. It can be described as a
massive volume of both structured and unstructured data that cannot be
stored using traditional databases, consisting of billions to trillions
of records collected from millions of people and from different
sources. The data may come from the web, sales, customer contact
centers, social media, and mobile data. [6]
Big data, as shown in Fig1, is a term for large datasets characterized
by the volume, variety, velocity, and veracity of the data. Data
variability, value, and complexity are further features associated
with big data.
Volume: The large amount of stored data that can be collected and
analyzed effectively.
Variety: The type of data, which may be structured or unstructured and
may include log files, text, video, audio, and transactions.
Velocity: The rate at which data become available and change for
analysis.
Veracity: The integrity of the data and the extent to which it can be
trusted when used to make decisions.
Fig1, Big Data Properties
In the context of financial banking transaction analysis, volume
corresponds to the thousands of credit card transactions that happen
every second of every day. Variety refers to the types of data used in
transaction activities. Velocity refers to how quickly the data can be
processed for analytics. Veracity relates to analyzing the credit card
transactions in order to make decisions about them, with the aim of
finding any fraudulent transactions. These factors are important for
analyzing transactions to find fraud and for taking the needed action
immediately to correct fraudulent transactions. [7]
A. Big Data Analytics
Big data analytics technology extracts useful information, such as
hidden value and relation rules, from huge data sets. When data volumes
reach big data scale, parsing them for important information requires
exceptionally effective data analytics. The domain of big data
analytics is concerned with extracting from big data insights that are
significant, previously unknown, implicit, and potentially useful.
These insights have a direct effect on making useful decisions from the
interpreted data. With the right analytical tools, big data can help
detect various frauds. [8]
In financial banking, the analytics tools perform the
following activities:
• Collects data from enterprise sources.
• Performs deeper analytics on the data.
• Provides a fine-grained view of security information.
• Achieves real-time analysis of streaming data.
III. BIG DATA ANALYTICS IN CYBER CRIME
Big data analytics in cybercrime security includes the ability to
collect massive amounts of digital information and to analyze,
visualize, and draw knowledge from it, making it possible to foresee
and stop cybercrime attacks. Detecting fraud rapidly requires real-time
investigation of many structured and unstructured data sources. Fraud
detection is one of the most visible uses of big data analytics. [9]
Most frauds are high-volume in nature, which gives analytics a great
opportunity to identify patterns in high-volume data and suggest
preventive action. Some of the techniques used to detect fraud require
recognizing identical or repeating patterns of people, places, systems,
and events. Compared with conventional approaches, big data analytics
provides an effective cybersecurity setting by separating what is
“normal” from what is “abnormal”, that is, separating the patterns
produced by authorized users from those created by suspicious or
malicious users. [10]
By providing means to discover changing patterns of malicious activity
hidden deep in large volumes of organizational data, big data tools can
help businesses better understand whether and how they have been
attacked. Big data can be associated with the following fraud detection
techniques: [11]
A. Descriptive Analytics - Unsupervised Learning
Descriptive analytics aims to find behavior that deviates from normal
behavior, that is, to detect anomalies. These techniques learn from
historical observations and do not require the observations to be
labeled as fraudulent or non-fraudulent. [12]
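As a minimal sketch of the deviation-from-normal idea, the following Python example flags values that lie far from the mean of historical observations, without using any fraud labels. The z-score rule, the threshold of 2.0, the function name `flag_anomalies`, and the sample amounts are illustrative assumptions and are not taken from the paper.

```python
from statistics import mean, stdev

def flag_anomalies(values, threshold=2.0):
    """Return indices of values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All observations are identical, so nothing deviates.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts for one account (not from the paper).
amounts = [20.0, 22.5, 19.0, 21.0, 23.5, 20.5, 250.0, 22.0]
suspicious = flag_anomalies(amounts)
print(suspicious)
```

A single large outlier inflates the standard deviation, so in practice the threshold would need tuning, or a more robust statistic such as the median absolute deviation could be used.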
B. Predictive Analytics - Supervised Learning
Predictive analytics aims to learn from historical data and recover
patterns that make it possible to distinguish between normal and
fraudulent behavior. These analytics can be applied to detect fraud as
well as to estimate the amount of fraud. [13]
C. Social Network Analytics
Social network analytics aims to extend detection capability by
detecting fraudulent behavior in a network of linked entities. It also
finds relationships between entities by revealing specific patterns
that indicate fraud. [14]
D. Big Data Detection Techniques
A formal digital forensic investigation cannot be launched until the
important and significant data have been extracted from the entire data
set. The focus is on the different techniques that can help the digital
forensic investigator analyze big data to find the underlying
relationships among the data. Furthermore, these techniques help the
investigator extract meaningful digital forensic evidence for detecting
fraud from large datasets. [15]
IV. CRIME PREDICTION THEORY
The incidents of various types of cybercrime in different states of
Egypt for the year 2019 were used as the input for the analysis. The
data also contained the number of persons arrested in different age
groups (below 18, 18 to 30, 30 to 45, 45 to 60, and above 60) in
different states and union territories. The different types of
cybercrime and their descriptions are given in Table 1. Feature
selection techniques were used to determine the important features,
such as the most prominent categories of cybercrime and the age groups
of the people most involved in these crimes. The chosen features were
normalized using the population attribute for every state in Egypt.
TABLE I. CYBERCRIME TYPE
| Crime Type | Description |
|---|---|
| Manipulation of computerized source documents | Knowingly or intentionally concealing, destroying, or altering, or causing another to conceal, destroy, or alter, any computer source code used for a computer, computer program, or computer network. |
| Hacking computing systems | Finding weaknesses in a computer or computer network and misusing and exploiting them. Type 1: loss of or damage to computer source or utility. Type 2: hacking. |
| Distribution of indecent content (transmission in electronic form) | Transmitting indecent content through the Internet, emails, and cell phones (SMS). |
| Compliance failure (orders of certifying authority) | Failure of the license holder to comply when issuing a digital signature. |
| Unauthorized access (attempt to access protected computing systems) | Access, without legal permission, to any computer software programs or software sources that have security vulnerabilities. |
| Obtaining a license or digital signature certificate by deception | An individual obtains, or attempts to obtain or maintain, a license by willful misrepresentation or fraudulent representation. |
| Publishing a false digital signature certificate | A digital signature certifies the identity of a person. Publishing false signatures is similar to the crime of impersonation. |
| False digital signature certificate | |
| Breach of confidentiality and privacy | A breach of confidentiality is a security violation in which the confidentiality of some data was lost. |
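The population normalization described in this section can be sketched as a per-capita rate. This is only an illustration: the function name `per_capita_rates`, the scaling factor of 100,000 inhabitants, and the region names and numbers are assumptions, since the paper does not state the exact normalization formula or publish the raw counts.

```python
def per_capita_rates(incidents, population, per=100_000):
    """Scale raw incident counts to a rate per `per` inhabitants."""
    return {
        region: incidents[region] * per / population[region]
        for region in incidents
    }

# Hypothetical counts and populations; not taken from the paper's dataset.
incidents = {"Region A": 120, "Region B": 45}
population = {"Region A": 4_000_000, "Region B": 500_000}
rates = per_capita_rates(incidents, population)
print(rates)
```

In this made-up example, Region B has fewer incidents in absolute terms but a higher rate once population is taken into account, which is the effect normalization is meant to expose.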
A. Cybercrime spatial data analysis
The number of persons arrested in the various age groups was the input
for our analysis. This cybercrime data is distributed across 28 states
in Egypt and is normalized using the population of each state and union
territory. A feature selection technique was applied to determine the
contribution of the chosen features to cybercrime activities. The
attributes with higher ranks were considered further in our analysis.
The attributes and their F score are given in Table 2. Three of the
nine types of cybercrime, and the age groups 18-30 and 30-45 among the
various age groups considered, occupied the higher ranks. They were
therefore carried forward in our analysis for the prediction of
relevant patterns. [16]
TABLE II. FEATURE
| Attribute | I-Score |
|---|---|
| Hacking | 0.7532 |
| Obscene Publication | 0.2412 |
| Failure of compliance | 0.5201 |
| Age (18-30) | 0.7421 |
| Age (30-45) | 0.6102 |
X-means clustering applied to the chosen features resulted in three
clusters. The K-means clustering algorithm was then applied with k = 3
to find the cluster patterns. [17]
The geospatial distribution of cybercrimes in Egypt is shown in
Figure 2. Areas marked in red are regions where the occurrence of
cybercrime incidents is high, yellow marks regions with a nominal level
of cybercrime, and blue marks areas with very few incidents.
Visualization and analysis of the crime patterns result in meaningful
inferences.
Figure 2 Cybercrimes in Egypt
From Table 3, we observe that the majority of the Egyptian territories
fall under the third cluster, where no crime pattern or occurrence of
cybercrime is recorded. Cluster 2 represents regions with a high
occurrence of cybercrime, where the people involved are in the age
group 35 to 45. Cluster 1 represents regions with an average occurrence
of crime, where the people involved include the age group 18 to 30.
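TABLE III. MAJORITY OF THE EGYPT TERRITORIES
| Intensity of Crime: Crime1 (Hacking) | Intensity of Crime: Crime2 (Obscene publication) | Intensity of Crime: Crime3 (Failure of compliance) | Arrested Persons: Age (18 to 35) | Arrested Persons: Age (35 to 45) |
|---|---|---|---|---|
| Average | None | Average | Average | None |
| Average | High | High | Very low | High |
| None | None | None | None | None |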
V. PROPOSED MODEL FOR CYBER-CRIME PREDICTION
A. Collection of cybercrime dataset
A variety of cybercrime data should be collected in order to predict
the cybercrime class in the banking sector through the analysis of
cybercrime patterns. This data has to be collected from various news
feeds, articles, blogs, and police department websites on the Internet.
The collected cybercrime data is stored in a cybercrime database for
further processing. [18]
B. Pre-processing of cybercrime dataset
The cybercrime dataset stored in the cybercrime database has to be
pre-processed before data mining processes are applied to it, because
pre-processing removes noisy data and handles lost or missing values.
Figure 3 Proposed Model for cyber Crime Prediction
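The pre-processing step described above can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the paper's pipeline: the record fields (`type`, `region`, `year`), the rule of dropping incomplete records, and the function name `preprocess` are all hypothetical.

```python
def preprocess(records, required_fields):
    """Drop records with missing required values and remove duplicates."""
    cleaned, seen = [], set()
    for record in records:
        # Skip records where any required field is absent or empty.
        if any(record.get(f) in (None, "") for f in required_fields):
            continue
        # Normalise text fields so trivially different spellings match.
        normalised = {
            k: v.strip().lower() if isinstance(v, str) else v
            for k, v in record.items()
        }
        key = tuple(sorted(normalised.items()))
        if key in seen:
            continue  # Same record already kept once.
        seen.add(key)
        cleaned.append(normalised)
    return cleaned

# Hypothetical raw records (not from the paper's dataset).
raw = [
    {"type": "Hacking", "region": "Cairo", "year": 2019},
    {"type": "hacking ", "region": "cairo", "year": 2019},  # duplicate
    {"type": "", "region": "Giza", "year": 2019},           # missing type
    {"type": "Fraud", "region": None, "year": 2019},        # missing region
]
cleaned = preprocess(raw, ["type", "region"])
print(cleaned)
```

Dropping incomplete records is the simplest policy; imputing missing values would be an alternative when data are scarce.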
C. Data mining Techniques
Data mining processes and algorithms are applied to the pre-processed
data to identify or forecast fraud through knowledge discovery from
abnormal patterns. Data mining has gained recognition in combating
cybercrime and financial fraud, and it helps solve problems in the
banking sector by discovering patterns, relationships, and links that
are hidden in the business information accumulated in the crime
databases. [19]
D. Association Rule mining
Association rule mining generates rules for the cybercrime dataset
based on frequently occurring cybercrime patterns. The generated rules
help decision makers take preventive action. The procedure comprises
the following steps:
a) Determining the commonly occurring item sets
within the cybercrime database.
b) Recognizing patterns in program execution and user
behavior as association rules, known as intrusion
detection.
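The two steps above can be illustrated with a small brute-force sketch: it counts item sets of size one and two, keeps those that meet a minimum support, and derives rules whose confidence is the pair support divided by the antecedent support. It does not implement Apriori-style candidate pruning, and the item names, thresholds, and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=2):
    """Count itemsets up to max_size and keep those meeting min_support."""
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                counts[combo] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

def rules(supports, min_confidence):
    """Derive rules A -> B from frequent pairs; confidence = supp(A,B)/supp(A)."""
    out = []
    for itemset, supp in supports.items():
        if len(itemset) != 2:
            continue
        for a, b in (itemset, itemset[::-1]):
            conf = supp / supports[(a,)]
            if conf >= min_confidence:
                out.append((a, b, supp, conf))
    return out

# Hypothetical records: each set lists the attributes observed in one case.
cases = [
    {"hacking", "age_18_30"},
    {"hacking", "age_18_30"},
    {"hacking", "age_30_45"},
    {"obscene_publication", "age_30_45"},
]
supports = frequent_itemsets(cases, min_support=0.5)
found = rules(supports, min_confidence=0.8)
print(found)
```

With these made-up cases the only rule that passes both thresholds is `age_18_30 -> hacking`; the reverse rule is dropped because hacking also appears with another age group.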
E. Clustering
Clustering divides a set of records or items into a number of groups.
Clustering is suggested for discovering relationships between
cybercrimes and criminal characteristics that share some previously
unknown common characteristics. Clustering techniques are used to
discover fraud in the banking sector. Clustering is regarded as
unsupervised learning because its classes are not defined and decided
in advance and the grouping of data takes place without supervision.
[20] The K-Means partitioning algorithm is used for clustering
cybercrime datasets because of its simplicity and low computational
complexity. First, the data items are grouped into a specified number
(k) of clusters. The mean value of each cluster is computed from the
distances between its objects. An iterative relocation method is used
to improve the partitions by moving items from one cluster to another.
The iterations are carried out until convergence occurs. [21]
F. K-Means Algorithm
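The paper's own listing of the algorithm is not available in this text. As a stand-in, the following is a minimal sketch of standard K-Means (Lloyd's algorithm) as described in the clustering section: assign each item to the nearest mean, recompute the means, and repeat until nothing moves. The deterministic initialization from the first k points, the two-dimensional sample points, and the function name `kmeans` are assumptions made for illustration.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(p) for p in points[:k]]  # simple deterministic init
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:  # convergence: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D feature vectors (not from the paper's dataset).
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

Real deployments usually pick initial centroids at random or with a seeding scheme such as k-means++, because the result can depend on the starting points.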
G. Classification
Classification is the most frequently used data mining technique. It
uses a set of pre-classified cases to build a model that can classify
instances on a large scale. The classification technique establishes a
relationship between a dependent variable and independent variables by
mapping the data points. Within a given dataset, classification is used
to determine which group each data instance belongs to. Classification
is used to build models of unknown patterns and to assess future
outcomes on the basis of previous decisions. Automatic credit
authorization is a major procedure in the banking sector and financial
organizations. Fraud can be prevented by building a sound assessment
for credit approvals using a classification model based on decision
trees running on a platform such as Apache Hadoop.
H. Influenced Association Classification
To achieve higher accuracy, Influenced Association Classification is a
new and improved method that integrates association rule mining with
classification in the prediction model. This method is used to discover
links and associations among item sets. Association rule extraction
comes under unsupervised learning since it does not use any class
attribute for rule extraction. The two steps employed to extract
association rules are: [22] [23]
a) Classes are produced from the cybercrime data set
based on the association rules.
b) The dataset classification is examined against the
class labels.
The steps implemented in the Influenced Association
Classifier are summarized below:
c) Pre-process the cybercrime dataset so that further
mining processes can be carried out on it.
d) Assign every attribute a weight within a range, so that
its importance is reflected in the prediction model.
Attributes of greater significance are allocated the maximum weight
(0.9) and those of lesser significance are allocated the minimum weight
(0.1).
The Influenced Association Rule Mining algorithm is applied to the
pre-processed cybercrime data set to discover interesting patterns.
Influenced Association Classification uses weighted support and
confidence, and the rules generated by this process are known as
Classification Association Rules (CARs).
The extracted Classification Association Rules are stored in the rule
base. Whenever a new cybercrime record is added, the CARs are used to
forecast its class label from the rule base.
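The paper does not give formulas for weighted support and confidence. The sketch below uses one common formulation as an assumption: the weight of a record is the mean weight of its attributes, weighted support is the share of total record weight carried by the matching records, and weighted confidence is the ratio of the rule's weighted support to that of its antecedent. The attribute weights, records, class labels, and function names are hypothetical.

```python
def record_weight(items, weights):
    """Weight of a record: mean of the weights of its attributes (assumed)."""
    return sum(weights[i] for i in items) / len(items)

def weighted_support(itemset, label, records, weights):
    """Share of total record weight in records containing itemset (and label).

    Passing label=None ignores the class label.
    """
    total = sum(record_weight(items, weights) for items, _ in records)
    matched = sum(
        record_weight(items, weights)
        for items, cls in records
        if itemset <= items and (label is None or cls == label)
    )
    return matched / total

def weighted_confidence(itemset, label, records, weights):
    """Weighted support of the rule divided by that of its antecedent."""
    base = weighted_support(itemset, None, records, weights)
    if base == 0:
        return 0.0
    return weighted_support(itemset, label, records, weights) / base

# Hypothetical attribute weights in the 0.1 to 0.9 range used by the paper.
weights = {"hacking": 0.9, "age_18_30": 0.7, "obscene_publication": 0.1}
records = [
    (frozenset({"hacking", "age_18_30"}), "high"),
    (frozenset({"hacking"}), "high"),
    (frozenset({"obscene_publication", "age_18_30"}), "low"),
]
ws = weighted_support({"hacking"}, "high", records, weights)
wc = weighted_confidence({"age_18_30"}, "high", records, weights)
print(ws, wc)
```

Because heavily weighted attributes contribute more record weight, a rule built on a significant attribute can pass a support threshold that the same rule would fail under plain counting.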
I. Cyber Crime Prediction using Apache Hadoop
For classification problems in cybercrime prediction analysis, the
Apache Hadoop technique is more rigorous and more precise. Its two
steps are: [24]
a) Formation of the tree.
b) Validation of the built tree against the cybercrime data set.
The Apache Hadoop technique uses a pruning method in the construction
of the tree. Pruning reduces the size of the tree by removing
inappropriate data that leads to poor prediction performance. The
proposed Apache Hadoop technique classifies the data until
categorization is complete and provides the highest accuracy on the
cybercrime training data. It also balances precision and flexibility.
The Apache Hadoop technique is an extended version of the C4.5 decision
tree. It produces the classifier output in the form of rule sets and
decision trees. The rule sets are straightforward to understand and
easy to use within an application. [18]
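Tree formation in the C4.5 family selects, at each node, the attribute with the best gain ratio. The following sketch shows only that split criterion; pruning, rule-set generation, and any Hadoop integration are omitted. The attribute names, sample rows, and function names are hypothetical and do not come from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attribute, target):
    """C4.5-style criterion: information gain divided by split information."""
    labels = [r[target] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    n = len(rows)
    # Expected entropy of the target after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy([r[attribute] for r in rows])
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical training rows (not from the paper's dataset).
rows = [
    {"crime": "hacking", "age": "18-30", "risk": "high"},
    {"crime": "hacking", "age": "30-45", "risk": "high"},
    {"crime": "obscene", "age": "18-30", "risk": "low"},
    {"crime": "obscene", "age": "30-45", "risk": "low"},
]
best = max(["crime", "age"], key=lambda a: gain_ratio(rows, a, "risk"))
print(best)
```

In this toy data the crime type separates the risk classes perfectly while the age group carries no information, so the root split falls on `crime`.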
J. Experimental Settings
The K-Mean cluster consists of six data nodes, which act as slaves
only, and one name node, which acts as both a slave and the master in
our system. The details of these nodes are listed in Table 4. We set
the number of replicas to 6 since there are 6 nodes in total in this
cluster.
TABLE IV. K-MEAN CLUSTER COMPOSITION
| Node | Instance Type | CPU | Memory | Storage | Private IP |
|---|---|---|---|---|---|
| Node1 | M1 Medium | Core i7 | 8 GB | 500 GB | 10.1.1.2 |
| Node2 | M1 Small | Core i5 | 4 GB | 200 GB | 10.1.1.3 |
We used the same configuration as (Dili WM, 2013).
VI. PERFORMANCE ANALYSIS
This section monitors and evaluates Apache Hadoop
performance in three cases:
• Without using Apache Hadoop.
• At the beginning of using Apache Hadoop.
• After a certain period (one month) of using
Apache Hadoop.
The following parameters are considered in the evaluation process:
• Requests returned from the Apache Storage.
• Requests returned from the Apache Storage without
verification.
• Requests returned from the Apache Storage,
updating a file in cache.
• Requests returned from the Apache Storage after
verifying that they have not changed.
A. Performance Metrics
The main categories of performance metrics are:
a) Apache Storage Performance: how requested Web
objects were returned from the Storage or from the network.
b) Traffic: the amount of network traffic, by date, sent
through Apache Hadoop, including both Web and non-Web
traffic.
c) Daily traffic: the average network traffic through
Apache Hadoop at various times during the day, including
both Web and non-Web traffic.
d) Response Time: how Apache Hadoop responded to
HTTP requests during the reporting period.
e) Communication Failures: the failures that Apache Hadoop
encountered when communicating with other computers
during the reporting period.
f) Dropped Packets: the number of dropped network
packets during the reporting period. Users that had the
most dropped packets are listed first.
g) Queue Length: the Queue Length counter shows how
many threads are ready in the processor queue but not
currently able to use the processor (Spark Streaming
Programming Guide).
B. Types of Requests
We want to know the file types that occur most often in
the application server. Knowing the characteristics of the log
files based on file type gives some indication of whether a
document will change or not.
TABLE V. TYPE OF REQUEST
C. Apache Storage Performance
The Storage performance results for each of the log files
are shown below. The percentage of requests returned from
storage without verification is high. About 38% of all requests
result in a request returned from Apache Hadoop without
verification, which is consistent with previously published
results. Those results reported that only 15% to 32% of Apache
logs result in requests returned from Storage without
verification. Unknown objects returned from the Apache Storage
were also detected.
TABLE VI. STORAGE PERFORMANCE RESULTS
| Status | Requests | % of Total Requests | Total Bytes |
|---|---|---|---|
| Objects returned from Apache Storage | 21251 | 59.30 % | 622.73 MB |
| Objects returned from Apache Storage without verification | 14110 | 38.20 % | 29.93 MB |
| Objects returned from Storage after verifying that they have not changed | 587 | 1.40 % | 0.98 MB |
| Information not available | 337 | 1.00 % | 47.94 KB |
| Unknown objects returned from the Apache Storage | 61 | 0.20 % | 15.71 KB |
| Total | 327251 | 100.00 % | 675.52 MB |
D. Traffic
The results for average network traffic through Apache Hadoop at the
beginning of using Apache Hadoop and after a certain time of using it
are given in the table below.
| Requests (beginning) | Average Processing Time (beginning) | Total Bytes (beginning) | Cache Hit Ratio (beginning) | Requests (after) | Average Processing Time (after) | Total Bytes (after) | Cache Hit Ratio (after) |
|---|---|---|---|---|---|---|---|
| 1811 | 141.00 sec | 4.52 GB | 0.00 % | 6186 | 52.80 sec | 2.87 GB | 1.00 % |
| 1617 | 131.40 sec | 8.95 MB | 0.00 % | 6246 | 59.31 sec | 35.89 MB | 0.00 % |
| 1535 | 122.10 sec | 8.23 MB | 0.00 % | 5844 | 61.29 sec | 35.27 MB | 0.00 % |
| 1816 | 125.20 sec | 8.71 MB | 0.00 % | 6103 | 57.00 sec | 34.19 MB | 0.00 % |
The results indicate that the average processing time for handling a
request is reduced by 43% after a certain time of using Apache Hadoop,
because Apache Hadoop caches the previously visited pages and returns
them directly to the client without asking the Storage server each
time.
a) Traffic by Time of day
The following table summarizes the average network traffic through
Apache Hadoop at various times during the day.
TABLE VII. TRAFFIC BY TIME OF DAY
| Time Interval | Average Requests Per Second | Average Bytes Per Second | Average Response Time for Apache Requests | Average Response Time for Non-Apache Requests |
|---|---|---|---|---|
| 00:00 | 14.4 | 66.48 KB | - | 54.20 sec |
| 00:15 | 18.1 | 12.14 MB | - | 57.80 sec |
| 00:30 | 15.8 | 11.47 MB | 0.00 sec | - |
| 00:45 | 16.1 | 82.44 KB | Unknown | 66.10 sec |
| 01:00 | 15.1 | 64.19 KB | 0.00 sec | - |
| 01:15 | 17.2 | 68.21 KB | 0.00 sec | - |
| 01:30 | 15.5 | 91.32 KB | 0.00 sec | - |
b) Dropped Packets
The results below show the users who had the highest number of dropped
network packets during the reporting period. Users that had the most
dropped packets are listed first. We can observe that the percentage of
dropped packets decreases over time. Two unknown users were also
detected using IP addresses outside the network range (172.31.0.1 and
172.31.0.2).
TABLE VIII. DROPPED PACKETS
TABLE IX. DROPPED PACKETS AFTER USING APACHE
| User | Dropped Packets (beginning of using Apache Hadoop) | % of Total Dropped Packets (beginning) | Dropped Packets (after a certain time of using Apache Hadoop) | % of Total Dropped Packets (after) |
|---|---|---|---|---|
| 10.1.1.13 | 35887 | 23.30% | 682 | 11.80% |
| 10.1.1.14 | 34871 | 22.50% | 662 | 11.10% |
| 172.31.0.2 | 32817 | 21.50% | Unknown | Unknown |
| 172.31.0.1 | 30618 | 20.20% | Unknown | Unknown |
| 10.1.1.12 | 4832 | 3.00% | ---- | 13.30% |
| 10.1.1.20 | 2310 | 1.50% | 223 | 4.10% |
| 10.1.1.23 | 2301 | 1.50% | 221 | 3.70% |
An algorithm is used to detect unknown or previously unseen network
IPs, such as the two above. The technique not only detects known
network IPs but can also detect the unknown objects returned from the
Apache Storage. The technique is a two-step process. In the first step,
features are extracted from the known datasets; these play a vital role
not only in representing the target concept but also in speeding up the
learning and classification/detection processes. In the second step,
appropriate machine learning techniques are trained for the
detection/classification of abnormal behavior.
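The out-of-range addresses reported in the dropped-packet results (172.31.0.1 and 172.31.0.2 among hosts numbered 10.1.1.x) can be flagged with a simple membership check. This is a minimal sketch using Python's standard `ipaddress` module; the `/24` prefix for the expected network and the function name `out_of_range` are assumptions, since the paper does not state the network mask.

```python
import ipaddress

def out_of_range(addresses, network):
    """Return the addresses that do not belong to the expected network."""
    net = ipaddress.ip_network(network)
    return [a for a in addresses if ipaddress.ip_address(a) not in net]

# Addresses from the dropped-packets table; the /24 prefix is an assumption.
users = [
    "10.1.1.13", "10.1.1.14", "172.31.0.2", "172.31.0.1",
    "10.1.1.12", "10.1.1.20", "10.1.1.23",
]
unknown = out_of_range(users, "10.1.1.0/24")
print(unknown)
```

A rule-based check like this only catches addresses outside a known range; the learning-based detection described in the paper would be needed for hosts that misbehave from inside the expected network.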
VII. VALIDATION AND VERIFICATION
The big data analysis required more visual inspection and manual
execution than the other components. We tested the solutions described
above through several activities and measured the output of these
activities.
a) Tools Used
Apache Hadoop Logs: The Hadoop log was used to determine the amount and
size of data sent by clients. It was also used to generate the
emulation of client requests.
b) SCOM Reports
System Center Operations Manager (SCOM) was used as a storage log
analysis tool that allows the user to pull various pieces of data from
its log files, such as the number of requests, bytes transferred, and
hosts contacted. We used it to reveal some characteristics of the log
files used in the performance analysis.
VIII. CONCLUSION
The proposed work focuses on cybercrime prediction by crime mapping
with recorded data using the latest technology. The model helps
security authorities reduce cybercrime and improves network
performance. Many network anomaly detection techniques are designed
based on the availability of data instances. Some anomaly detection
techniques have been developed specifically for certain application
domains, while others are more generic. This paper presents a cascaded
algorithm using K-Means algorithms for big data anomaly detection. The
proposed algorithm is used to detect the anomalies present in
supervised and unsupervised data sets. The model also helps the
authorities in the investigation of crimes. Using big data analytics
with the clustering approach reduces investigation time and helps in
retrieving hidden information.
REFERENCES
[1] Cameron S.D. Brown, “Investigating and Prosecuting Cyber Crime:
Forensic Dependencies and Barriers to Justice”, International Journal
of Cyber Criminology, ( 2015).
[2] Spalevic Z, “Cyber Security as a global challenge today”,
Singidunum Journal of Applied Sciences, (2014 )
[3] Najafabadi M., Villanustre F., Khoshgoftaar T, Seliya N., R. Wald,
and E. Muharemagic, “Deep learning applications and challenges in
big data analytics”, Journal of Big Data, ( 2015)
[4] Gupta P, N.Tyagi, “An Approach towards Big Data –A Review”,
International Conference on Computing, Communication and
Automation (IEEE), ( 2015).
[5] Tahir S, Waseem I, “Big Data−An Evolving Concern for Forensic
Investigators”, IEEE Transactions, ( 2015).
[6] Chen X , Member S , X. Lin, “Big Data Deep Learning: Challenges
and Perspectives”, IEEE Access, Vol 2, ,( 2014)
[7] Magoulas R , Lorica B, “Introduction to Big Data”, Release 2.0, Issue
11, , (Feb 2009 ).
[8] M. I. Pramanik, Raymond Y. K. Lau, Wei T. Yue, Yunming Ye and Chunping
Li, “Big data analytics for security and criminal investigations”, Wiley
Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 7,
no. 4, pp. 1–19, (2017).
[9] Chung-Hsien Yu, Max W. Ward, Melissa Morabito, Wei Ding, “Crime
forecasting using data mining techniques”, International Conference
on Data Mining Workshops, IEEE, (2011).
[10] Manjeet Rege and Raymond Blanch K. Mbah, “Machine Learning for
Cyber Defense and Attack”, DATA ANALYTICS 2018: The Seventh
International Conference on Data Analytics, IARIA,
ISBN: 978-1-61208-681-1, pp. 73–78, (2018).
[11] Tariq M, Uzma A, “Security Analytics: Big Data Analytics for
Cybersecurity: A Review of Trends, Techniques and Tools”, 2nd National
Conference on Information Assurance (NCIA), (2013).
[12] Palak G, Nidhi T, “An Approach towards Big Data – A Review”,
International Conference on Computing, Communication and
Automation (IEEE), (2015).
[13] Giri T, Anjan G, “A Survey on Data Science Technologies & Big
Data Analytics”, International Journal of Advanced Research in
Computer Science and Software Engineering, Vol 6, Issue 2, (Feb
2016).
[14] Dean J, Ghemawat S, “MapReduce: Simplified data processing on
large clusters”, Communications of the ACM, vol 51, pp. 107-113,
(2008).
[15] Siddaraju, Sowmya C, Rashmi K, Rahul M, “Efficient
Analysis of Big Data Using Map Reduce Framework”, International
Journal of Recent Development in Engineering and Technology,
Vol. 2, (2014).
[16] Aksoy S., “K–Nearest Neighbor Classifier and Distance Functions”,
Technical Report, Department of Computer Engineering, Bilkent
University, (February 2008).
[17] A. Reyes, R. Brittson, K. O’Shea, and J. Steele, Cyber Crime
Investigations: Bridging the Gaps Between Security Professionals,
Law Enforcement, and Prosecutors. Elsevier Science, 2011.
[18] D. Quick and K.-K. R. Choo, “Impacts of increasing volume of digital
forensic data: A survey and future research challenges,” Digital
Investigation, vol. 11, no. 4, pp. 273–294, 2014.
[19] R. Rowlingson, “A ten step process for forensic readiness,”
International Journal of Digital Evidence, vol. 2, no. 3, pp. 1–28,
2004.
[20] A. Guarino, “Digital forensics as a big data challenge,” in ISSE 2013
Securing Electronic Business Processes. Springer, 2013, pp. 197–203.
[21] P. Dhaka and R. Johari, “Crib: Cyber crime investigation, data
archival and analysis using big data tool,” in 2016 International
Conference on Computing, Communication and Automation
(ICCCA), April 2016, pp. 117–121.
[22] H. Van Beek, E. van Eijk, R. van Baar, M. Ugen, J. Bodde, and A.
Siemelink, “Digital forensics as a service: Game on,” Digital
Investigation, vol. 15, pp. 20–38, 2015.
[23] Alessandro G, “Digital Forensic as a Big Data Challenge”, ISSE
Securing Electronic Business Processes, (2013).
[24] Katarina G, Michael H, Wilson A. Higashino, A, David S. Allison,
and Miriam A. Capretz M, “Challenges for MapReduce in Big Data”,
Proc. of the 10th 2014 World Congress on Services, (2014).
AUTHORS PROFILE
Hossam Abdel Rahman Mohamed holds a doctoral degree in
computer science from Cairo University, Computer and
Information Technology Dept. His current position is IT
Director at Bek Group.
Vol. 18, No. 6, June 2020
ISSN 1947-5500
