INTRO TO DATA FUSION AND LOG
CORRELATION FOR WEB ANALYTICS
By: Mahdi Sayyad | October 14, 2017
TOPICS
Who we are?
Our Work Process
Data fusion
Fusion and Integration
Information Fusion Taxonomy
DF Models and architectures
Log and logging
Log/Event Correlation
Log Types and Sources
Log Correlation Pre-requisites
Models/Types of Correlating logs
Correlation Methods and techniques
Relation between Correlation and
Fusion
 Conclusion
WHO WE ARE
Mahdi Sayyad
MS in Computer Engineering, CEH, CCNA, ISMS Lead Auditor, Cybersecurity Researcher, Co-Founder of ICSGROUP.
6+ years of experience in InfoSec and cybersecurity analysis
Mohammad H. Bazrafkan
MS in Information Security, CEH, OSCP, Co-Founder of ARTINERTEBAT.
6+ years of experience in InfoSec and cybersecurity analysis
WHY ARE WE HERE?
To facilitate things :)
Research, develop, and propose available techniques/solutions for fusing and correlating the massive data sets (logs) collected and fed to a web ranking system
OUR RESEARCH PROCESS
Requirements Analysis
• Problem definition
• Required skills
• Scope determination
• Team selection and acquisition
Information Gathering and Resource Selection
• Papers (journals and conferences)
• Books (handbooks and proceedings)
• Theses
• Technical reports
• Workshop presentations
• Main keywords: data fusion / log correlation / web log analytics
Review and Study
• Fundamentals and scope definition
• Models and architectures
• Tools and techniques
Feasibility Analysis and Proposed Solutions
• Leveraging tools and techniques for the use case
• Selecting and developing the proposed solution
Reporting
• Phase 1
• Phase 2
DATA FUSION
DATA FUSION: INTRODUCTION
 Information fusion is used for detection and classification tasks in many application domains; it was first seen in military and image-processing applications.
 Lately, these mechanisms have been applied to new problems such as intrusion detection and Denial of Service (DoS) detection.
Data fusion requires combining expertise in two areas:
 Sensors
 Information integration
DATA FUSION
• a.k.a.: "data fusion", "information fusion", "combination of evidence"
• Data Fusion: the name given to a variety of interrelated problems that have arisen primarily in military applications.
I. "Locate and identify many unknown objects of many different types on the basis of different kinds of evidence. This evidence is collected on an ongoing basis by many possibly re-allocable sensors having varying capabilities. Analyze the results in such a way as to supply local and over-all assessments of the significance of a scenario and to determine proper responses based on those assessments." (Mathematics of Data Fusion, Springer, 1997)
II. Data fusion is the process of combining information from a number of different sources to provide a robust and complete description of an environment or process of interest.
• In a nutshell, information fusion can be defined as the combination of multiple sources to obtain improved information (cheaper, of greater quality, or of greater relevance).
o Sensor Fusion: In this kind of fusion, evidence from two or more usually closely located sensors (e.g., pictures from imaging sensors) is combined in order to get information that is more precise than that obtainable from each separate piece of evidence. Fusion of this type can involve the combination of raw signals supplied by sensors (sometimes called "pre-detection fusion"), or of elementary "features" or "segments" extracted from the signals by pre-processors.
FUSION OR INTEGRATION: FAST VS. BATCH
[Figure: the relationship among fusion terms – multisensor integration, information fusion, sensor fusion, and data integration]
 Data fusion is more than just integration
 Information fusion diverges from data integration in a few key ways:
 Data reduction
 Handling streaming/real-time data
 Human interfaces
FUSION OR INTEGRATION: FAST VS. BATCH
Sensor Fusion: The combining of sensory data
or data derived from sensory data such that
the resulting information is in some sense better
than would be possible when these sources
were used individually.
Multisensor Integration: The synergistic use of
sensor data for the accomplishment of a task
by a system. This differs from sensor fusion in
that sensor fusion includes the actual
combination of sensory information into one
representational format.
DATA FUSION APPLICATIONS
 Robotics
 Intelligent Transportation Systems
 Precision Agriculture
 Security, Improving Intrusion Detection
 Data Privacy
 Assessing and Monitoring Civil
Infrastructures
 Environmental Monitoring
 Fire Detection
 Financial analysis
 Fault diagnosis
 Medical Diagnoses
…all data analytics and
situation assessment issues
can use fusion technology
CLASSIFICATION OF INFORMATION FUSION
 Relationships among the input data sources may be used to segregate information fusion into classes (e.g., complementary, redundant, and cooperative data).
 The abstraction level of the input/output data types (and their nature) of a fusion process (proposed by Dasarathy).
 The abstraction level of the data manipulated during the fusion process (the abstraction level of the employed data: raw measurement, signal, feature, decision) can also be used to distinguish among fusion processes.
CLASSIFICATION BASED ON RELATIONS OF DATA SOURCES
 Complementary: When information provided by the sources represents different portions of a broader scene, information fusion can be applied to obtain a piece of information that is more complete (broader).
 Redundant: If two or more independent sources provide the same piece of information, these pieces can be fused to increase the associated confidence (a small sketch follows this list).
 Cooperative: Two independent sources are cooperative when the information provided by them is fused into new information (usually more complex than the original data) that, from the application perspective, better represents the reality.
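As a toy illustration of redundant fusion (not taken from any of the cited sources), the sketch below combines two independent readings of the same quantity with inverse-variance weighting; the fused estimate has lower variance, i.e., higher confidence, than either input. The sensor values and variances are invented.

# Minimal sketch of redundant fusion: two independent sensors measure the
# same quantity; inverse-variance weighting yields a fused estimate whose
# variance is lower than either input. Values here are illustrative only.

def fuse_redundant(estimates):
    """estimates: list of (value, variance) pairs from independent sources."""
    weights = [1.0 / var for _, var in estimates]
    fused_value = sum(w * v for (v, _), w in zip(estimates, weights)) / sum(weights)
    fused_variance = 1.0 / sum(weights)   # always <= the smallest input variance
    return fused_value, fused_variance

# Two temperature sensors reporting the same room temperature (hypothetical)
print(fuse_redundant([(21.4, 0.5), (20.9, 0.8)]))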
CLASSIFICATION BASED ON
RELATIONSHIP AMONG DATA SOURCES
CLASSIFICATION BASED ON LEVELS OF
ABSTRACTION
Uses the abstraction level of the employed data: pixel, signal, feature, and symbol.
Signal: Deals with single or multidimensional signals from sensors. It can be used in real-time applications or as an intermediate step for further fusions.
Pixel: Operates on images and can be used to enhance image-processing tasks.
Feature: Deals with features or attributes extracted from signals or images, such as shape and speed.
Symbol: Information is a symbol that represents a decision; this level is also referred to as the decision level.
CLASSIFICATION BASED ON LEVELS OF
ABSTRACTION
According to the abstraction level of the manipulated data, IF can be classified into
four categories:
 Low-Level Fusion: Also referred to as data/signal/measurement-level fusion. Raw data are provided as inputs and combined into a new piece of data that is more accurate (reduced noise) than the individual inputs (a minimal sketch follows this list).
Ex. Polastre et al. [2004] provide an example of low-level fusion by applying a moving average filter to estimate ambient noise and determine whether or not the communication channel is clear.
 Medium-Level Fusion: Attributes or features of an entity (e.g., shape, texture, position) are
fused to obtain a feature map that may be used for other tasks (e.g., segmentation or detection
of an object). This type of fusion is also known as feature/attribute level fusion.
 High-Level Fusion: Also known as symbol or decision level fusion. It takes decisions or symbolic
representations as input and combines them to obtain a more confident and/or a global decision.
 Multilevel Fusion: When the fusion process encompasses data of different abstraction levels—
when both input and output of fusion can be of any level (e.g., a measurement is fused with a
feature to provide a decision)—multilevel fusion takes place.
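The following is a minimal sketch of low-level (signal-level) fusion in the spirit of the moving-average example above. It is an illustration of the idea only, not the filter or data from Polastre et al. [2004]; the sample readings and the clear-channel threshold are invented.

from collections import deque

# Minimal sketch of low-level (signal-level) fusion: a moving-average filter
# smooths raw samples so the fused output is less noisy than any single reading.

def moving_average(samples, window=5):
    buf, out = deque(maxlen=window), []
    for s in samples:
        buf.append(s)
        out.append(sum(buf) / len(buf))   # fused estimate at each step
    return out

# Hypothetical noise-floor readings (e.g., signal-strength samples on an idle channel)
readings = [-94, -91, -95, -60, -93, -92, -96, -94]
smoothed = moving_average(readings)
# A simple clear-channel test: compare the smoothed estimate to an assumed threshold
print([level < -85 for level in smoothed])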
CLASSIFICATION BASED ON INPUT AND
OUTPUT
 Data In–Data Out (DAI-DAO) : In this class, information fusion deals with raw data
and the result is also raw data, possibly more accurate or reliable.
 Data In–Feature Out (DAI-FEO) : Information fusion uses raw data from sources to
extract features or attributes that describe an entity. Here, “entity” means any object,
situation, or world abstraction.
 Feature In–Feature Out (FEI-FEO) : FEI-FEO fusion works on a set of features to
improve/refine a feature, or extract new ones.
 Feature In–Decision Out (FEI-DEO) : In this class, information fusion takes a set of
features of an entity generating a symbolic representation or a decision.
 Decision In–Decision Out (DEI-DEO) : Decisions can be fused in order to obtain new decisions or to give emphasis to previous ones.
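As a small hedged example of DEI-DEO fusion, the sketch below fuses the decisions of several detectors by majority vote; the detector labels are hypothetical.

from collections import Counter

# Minimal sketch of DEI-DEO (decision in, decision out) fusion: several
# detectors each emit a label for the same event, and a majority vote
# produces a single, more confident decision. Labels are illustrative.

def fuse_decisions(decisions):
    label, votes = Counter(decisions).most_common(1)[0]
    confidence = votes / len(decisions)
    return label, confidence

print(fuse_decisions(["attack", "attack", "benign"]))   # ('attack', 0.666...)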
CLASSIFICATION BASED ON INPUT AND OUTPUT
DATA FUSION MODELS
Models and architectures proposed to design information fusion systems can be
centered on the abstraction of the data generated during fusion.
1) Joint Directors of Laboratories Model (JDL)
2) Modified Waterfall Fusion Model (WFDF)
3) The Intelligence Cycle-Based Model
4) Boyd Model
5) Omnibus Model
6) Dasarathy Model
JOINT DIRECTORS OF LABORATORIES
MODEL(JDL)
 JDL is a popular model in the fusion research community.
 The model is composed of five processing levels, an associated database, and an
information bus connecting all components.
 A drawback of the JDL model is that it does not make explicit the interactions among the processing elements. Moreover, it suppresses any feedback: it does not specify how current or past results of fusion can be used to enhance future iterations.
JDL Model
WATERFALL DATA FUSION PROCESS
(WFDF) MODEL
 The waterfall data fusion process (WFDF) model, a sort of hierarchical architecture, concentrates on the processing functions at the lower levels. These functions have some similarities with the JDL model:
I. Sensing/signal processing relates to source pre-processing,
II. Feature extraction/pattern processing corresponds to object refinement (OR),
III. Situation assessment (SA) is connected to situation refinement (SR), and
IV. Decision making relates to threat refinement (TR).
 A modified WFDF model has been proposed with increasing sub-task complexity. This modified model (MWFDF) is more action-oriented, as it is augmented by local feedback loops.
MWFDF
BOYD MODEL
 The Boyd model depicts the classical decision-support mechanism in information operations with a closed feedback loop. As this Boyd control loop (BCL) model has a cycle of observe–orient–decide–act (OODA) features, an appropriate combination of the intelligence cycle (IC) and BCL loop models would yield a better fusion process model. This combined model would be less abstract and more action-oriented, and one such combination results in the Omnibus (OB) model.
The Boyd-OODA cyclic loop
model
OMNIBUS MODEL
 The OB model combines several
beneficial features of other approaches.
Interestingly, this OB model seems to be a hybrid of three models: (i) the BCL model, (ii) the Dasarathy model, and (iii) the Waterfall model. The model:
a) specifies the ordering of processes,
b) makes the cyclic nature more explicit, and
c) uses general terminology.
 A few WFDF aspects are seen in the OB
model, and it is more generalized than
many other models.
DASARATHY OR DFD (DATA-FEATURE-DECISION) OR FODF (FUNCTIONALITY-ORIENTED) MODEL
 The Dasarathy or DFD (Data-Feature-Decision) model is a fine-grained, information-centered model in which the elements of information fusion are specified based on their inputs and outputs.
 The model is based on the fusion functions rather than the tasks.
The DFD model: the primary input is raw data and the main output is a decision. The components responsible for the several fusion stages are the elements DAI-DAO, DAI-FEO, FEI-FEO, FEI-DEO, and DEI-DEO.
 It specifies the main types of fusion in terms of their input and output data; for this reason it is also used to classify information fusion.
DASARATHY MODEL
DASARATHY’S DATA-DIRECTED VS TASK-DIRECTED FUSION
[Figure: data-directed vs. task-directed fusion – Dasarathy’s functional DF model and a modified multi-sensor DF model]
DATA FUSION ARCHITECTURES
Central fusion: refers to the process by which a suite of sensors supplies their observations directly to a central fusion processing site over data links.
Distributed fusion: (sometimes also called track-to-track fusion) refers to the process of fusing together both observations and the target estimates supplied by remote fusion sources, even though these sites may (and usually do) share data collection resources. Distributed fusion of information strongly depends on the particular topology of the communications network that interconnects the fusion sites. The architecture is very useful for large flexible/smart structures, aircraft/spacecraft health monitoring, large automation/chemical plants, large sensor networks, and chemical industrial applications.
Hybrid fusion: This architecture involves both centralized and distributed DF schemes, based on the disposition of the required sensor configurations. This combination scheme is used for applications where a certain arrangement of sensors is already pre-defined or already in operation, and a variety of similar and dissimilar sensors/measuring systems (like radars) are used.
Data fusion architectures: (a) centralized processing, (b) decentralized processing, (c) hierarchical. (Adapted from Raol, J. R., Multi-Sensor Data Fusion with MATLAB, CRC Press, FL, 2010.)
LOG CORRELATION
EVENT AND LOG
 Event – a change in the system state, e.g., a disk failure; when a system
component (application, network device, etc.) encounters an event, it could
emit an event message that describes the event.
 Log:
I. a full written record of a journey, a period of time, or an event (Cambridge Dictionary).
II. in a computing context, a log is the automatically produced and time-stamped documentation of events relevant to a particular system. Virtually all software applications and systems produce log files. (whatis.techtarget.com)
EVENT LOGGING
[Diagram: Log → Event → Alert → Incident]
 Event logging – a procedure of storing event messages to a local or remote
(usually flat-file) event log.
 Event logs play an important role in modern IT systems:
o many system components like applications, servers, and network devices have a built-in support for
event logging (with the BSD syslog protocol being a widely accepted standard),
o since in most cases event messages are appended to event logs in real-time, event logs are an
excellent source of information for monitoring the system (a number of tools like Swatch and
Logsurfer have been developed for log monitoring)
o information that is stored to event logs can be useful for analysis at a later time, e.g., for audit
procedures.
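As a minimal sketch of event logging over the BSD syslog protocol mentioned above, Python's standard SysLogHandler can ship event messages to a local or remote collector. The collector address, logger name, and messages below are assumptions, not part of any particular deployment.

import logging
import logging.handlers

# Minimal sketch: emit event messages over the BSD syslog protocol using the
# standard library. The address is an assumption; point it at your collector.
logger = logging.getLogger("webapp")
logger.setLevel(logging.INFO)
handler = logging.handlers.SysLogHandler(address=("localhost", 514))
handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
logger.addHandler(handler)

# Application components emit event messages; the collector appends them to a log.
logger.info("Login user=alex src=10.2.211.19")
logger.warning("Password authentication failed for user=alex src=10.2.211.19")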
EVENT SOURCES AND TYPES
Types of logs:
• Audit logs
• Transaction logs
• Intrusion logs
• Connection logs
• System performance/activity logs
• Application-specific logs
• OS-specific logs
Where to find logs:
• Servers
• Firewalls
• Databases
• Routers
• Point-of-sale devices
• USB drives
• Kiosks
• Etc.
KEY ITEMS IN LOGGING
• Log transport
• Log syntax and format
• Log taxonomy and components
• Log settings and configuration
LOG FUNDAMENTAL REQUISITES
Log files have certain fundamental requisites. They are:
• Integrity: The log must be unaltered and totally free of any tampering or
modification by unauthorized operators.
• Time Stamping: The log must ensure a reasonable certainty as to exactly when a certain event was registered. This is absolutely essential for making post-incident correlations (a small sketch of these first two requisites follows this list).
• Normalization and Data reduction: Normalization refers to the extraction of
a datum from the original format of the log file without altering its integrity.
This datum can then be correlated with others of a different type. Data
reduction (a.k.a. filtering) is a procedure for identifying pertinent events and
correlating them according to selective criteria.
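A small sketch of the integrity and time-stamping requisites, assuming a simple hash-chained, timestamped record format; real systems typically also sign records or forward them to write-once storage.

import hashlib
import json
import time

# Minimal sketch: each record is stored with a UTC timestamp and a hash chained
# to the previous record, so later modification or deletion breaks the chain.

def append_record(log, message):
    prev_hash = log[-1]["hash"] if log else "0" * 64
    record = {"ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
              "msg": message,
              "prev": prev_hash}
    record["hash"] = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    log.append(record)

def verify(log):
    prev = "0" * 64
    for rec in log:
        body = {k: rec[k] for k in ("ts", "msg", "prev")}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev"] != prev or rec["hash"] != digest:
            return False
        prev = rec["hash"]
    return True

log = []
append_record(log, "Password authentication for alex accepted")
append_record(log, "/var/tmp: file system full")
print(verify(log))   # True; altering any field makes this False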
CORRELATION
Cor·re·la·tion n.
1. A causal, complementary, parallel, or reciprocal relationship, especially a
structural, functional, or qualitative correspondence between two
comparable entities: a correlation between drug abuse and crime.
2. Statistics. The simultaneous change in value of two numerically valued
random variables: the positive correlation between cigarette smoking and
the incidence of lung cancer; the negative correlation between age and
normal vision.
LOG/EVENT CORRELATION
Event correlation – a conceptual interpretation procedure where new meaning is assigned
to a set of events that happen within a predefined time interval [Jakobson and Weissman,
1995]. During the event correlation process, new events might be inserted into the event
stream and original events might be removed.
Examples:
 if 10 login failure events occur for a user within 5 minutes, generate a security attack event;
 if both device internal temperature too high and device not responding events have been observed within
5 seconds, replace them with the event device down due to overheating.
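A minimal sketch of the first example rule above, assuming events arrive as dicts with hypothetical type/user/ts fields: a sliding five-minute window counts login failures per user and emits a derived security-attack event at the threshold.

from collections import defaultdict, deque

# Minimal sketch: if 10 login-failure events occur for the same user within
# 5 minutes, emit a synthetic "security attack" event. Field names are assumed.

WINDOW = 300      # seconds
THRESHOLD = 10

failures = defaultdict(deque)   # user -> timestamps of recent failures

def correlate(event):
    """Feed each parsed event in; returns a derived event when the rule fires."""
    if event["type"] != "login_failure":
        return None
    q = failures[event["user"]]
    q.append(event["ts"])
    while q and event["ts"] - q[0] > WINDOW:    # drop events outside the window
        q.popleft()
    if len(q) >= THRESHOLD:
        q.clear()
        return {"type": "security_attack", "user": event["user"], "ts": event["ts"]}
    return None

# Example: the 10th failure inside 5 minutes triggers the derived event
for i in range(10):
    derived = correlate({"type": "login_failure", "user": "alex", "ts": 1000 + i * 20})
print(derived)   # {'type': 'security_attack', 'user': 'alex', 'ts': 1180}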
A number of approaches have been proposed for event correlation (rule-based, codebook-based, neural-network-based, and other methods), and a number of event correlation products are available on the market (HP ECS, SMARTS, NerveCenter, RuleCore, LOGEC, etc.).
EVENT LOG MONITORING
Dec 18 08:47:26 myhost [daemon.info] sshd[15172]: log: Connection from
10.2.211.19 port 1304
Dec 18 08:47:39 myhost [daemon.info] sshd[15172]: log: Password authentication
for alex accepted.
Dec 18 08:50:09 myhost [kern.crit] vmunix: /var/tmp: file system full
Dec 18 08:50:10 myhost [mail.debug] imapd[15399]: imap service init from
10.2.213.2
Dec 18 08:50:10 myhost [mail.info] imapd[15399]: Login user=risto host=risto2
[10.2.213.2]
 Commonly used log monitoring tools – Swatch, Logsurfer, etc.
 Current log monitoring practice – match logged lines in real time with regular
expressions and perform an action (e.g., send an alert) when a matching line is
observed.
 Open issues – in order to write rules (regexp → action) for log monitoring tools, one
must have a good knowledge about the IT system and log messages; existing tools
don’t support event correlation well.
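A minimal sketch of the regexp → action practice described above, with invented patterns and actions (printing instead of alerting):

import re

# Minimal sketch: test each incoming log line against a list of patterns and
# trigger an action on a match. Patterns and messages here are illustrative.

RULES = [
    (re.compile(r"file system full"), "ALERT: disk full on {host}"),
    (re.compile(r"Password authentication for (\S+) accepted"), "INFO: login by {0}"),
]

def monitor(line, host="myhost"):
    for pattern, template in RULES:
        m = pattern.search(line)
        if m:
            print(template.format(*m.groups(), host=host))

monitor("Dec 18 08:50:09 myhost [kern.crit] vmunix: /var/tmp: file system full")
monitor("Dec 18 08:47:39 myhost [daemon.info] sshd[15172]: log: Password authentication for alex accepted.")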
CORRELATION PRE-REQUISITES
1. Data Transport – Moving log data from point A to point B. Security of the data is important at this level: by encrypting and authenticating the data to reduce risk exposure, data transport can be accomplished with high certainty.
2. Data Normalization – Taking data from disparate log formats and creating a single, flexible log format (a normalization sketch follows this list).
3. Aggregation – Taking data from multiple point-B devices and consolidating the data into one system.
4. Data Reduction – Reducing the data through deletion of duplicates, combining similar events into a single filtered event, or using summary information. Compressing the data also helps reduce bandwidth usage.
5. Correlation Engine – The process of correlating field variables into a unique consolidated event.
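A minimal sketch of data normalization (pre-requisite 2 above), assuming two invented input formats (an sshd syslog line and an Apache-style access-log line) mapped onto one flexible record; the field names are illustrative, not a standard schema.

import re

# Minimal sketch: events from disparate formats are mapped onto one common
# record layout so later correlation stages can compare fields directly.

SSHD = re.compile(r"sshd\[\d+\]: .*Connection from (?P<src>\S+) port (?P<port>\d+)")
APACHE = re.compile(r'(?P<src>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<url>\S+)')

def normalize(line):
    for source, pattern in (("sshd", SSHD), ("apache", APACHE)):
        m = pattern.search(line)
        if m:
            return {"source": source, "raw": line, **m.groupdict()}
    return {"source": "unknown", "raw": line}

print(normalize("Dec 18 08:47:26 myhost sshd[15172]: log: Connection from 10.2.211.19 port 1304"))
print(normalize('10.2.213.2 - - [18/Dec/2017:08:50:10 +0000] "GET /index.html HTTP/1.1" 200 512'))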
MODELS/TYPES OF CORRELATION
 Micro Level Correlation – Comparing fields within one data set. Most of the
vendors today are performing this type of correlation. Micro correlation is
typically done with just the fields in the normalized data. Otherwise known as
atomic correlations.
 Macro Level Correlation – Comparing multiple data sets. Macro level correlation
is concerned with pulling in other sources of information in order to further validate
or gain intelligence on your event stream. This technique is sometimes referred to
as fusion correlation. One example of macro-correlation is taking vulnerability
scan data into account and comparing it inline with event data coming into your
analysis system.
MICRO CORRELATION
Field Correlation - The ability to correlate specific events to a single field or multiple
fields in the normalized data. Can be as simple as a search for all port 80 attacks.
Rule/Pattern Correlation – The ability via specific rules to label a certain set of events
as a correlated event. Often used to correlate events over a long period of time.
If the system sees an event E1 where E1.eventType=portscan
followed by
an event E2 where E2.srcip=E1.srcip and E2.dstip=E1.dstip and
E2.eventType=fw.reject
then
doSomething
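One possible way to implement the rule sketched above (field names follow the pseudocode, not any particular product's schema; the time window is an added assumption):

# Remember recent portscan events and fire when a firewall reject arrives
# for the same source/destination pair within an assumed time gap.

recent_portscans = {}   # (srcip, dstip) -> timestamp of the portscan event

def do_something(event):
    print("correlated: portscan followed by fw.reject from", event["srcip"])

def handle(event, max_gap=600):
    key = (event["srcip"], event["dstip"])
    if event["eventType"] == "portscan":
        recent_portscans[key] = event["ts"]
    elif event["eventType"] == "fw.reject":
        seen = recent_portscans.get(key)
        if seen is not None and event["ts"] - seen <= max_gap:
            do_something(event)

handle({"eventType": "portscan", "srcip": "10.2.211.19", "dstip": "10.1.1.5", "ts": 100})
handle({"eventType": "fw.reject", "srcip": "10.2.211.19", "dstip": "10.1.1.5", "ts": 160})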
MACRO CORRELATION
Vulnerability Correlation – The process of mapping IDS event names to vulnerability names; this allows a reduction in false positives. It is particularly useful with a well-tuned vulnerability scanner and IDS.
Profile (Finger Print) Correlation - Forensic network data such as remote port scans, remote OS fingerprints, finger information, and banner grabbing provides a series of data sets that can be compared to help correlate attacks to attacker profiles.
Watch List Correlation – Using a set of learned inputs, the watch list serves as a real-time reminder of previous offenders, correlating previous attackers with current attacks.
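A minimal sketch of watch-list correlation, with an invented list of previous offenders and hypothetical event fields:

# Source addresses learned from previous incidents are kept in a set, and
# incoming events from those addresses are flagged in real time.

watch_list = {"10.2.211.19", "192.0.2.77"}

def check_watch_list(event):
    if event["srcip"] in watch_list:
        return {**event, "tag": "known_offender"}
    return event

print(check_watch_list({"srcip": "10.2.211.19", "eventType": "login_failure"}))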
CORRELATION METHODS AND
TECHNIQUES
A number of techniques have been proposed for event correlation:
• Rule-based
• Graph-based
• Codebook-based
• Neural-network-based
Two basic approaches to log analysis:
• Top-down approach
• Bottom-up approach
CORRELATION METHODS AND
TECHNIQUES
 Rule-based (HP ECS, IMPACT, RuleCore, etc.) – events are correlated according to the rules
condition → action that are specified by the human analyst.
 Codebook based (SMARTS) – if a set of events e1,...,ek must be interpreted as event A, then e1,...,ek are stored in the codebook as a bit-vector pointing to A. To correlate a set of events, look for the most closely matching vector in the codebook and report the interpretation that corresponds to that vector (a minimal sketch follows this list).
 Graph based – find all dependencies between system components (network devices, hosts,
services, etc.) and construct a graph with each node representing a system component and each
edge a dependency between two components. When a set of fault events occurs, use the graph
for finding possible root cause(s) of fault events (e.g., 10 “HTTP server not responding” events
were caused by the failure of a single network link).
 Neural network based – a neural net is trained for the detection of anomalies in the event
stream, root cause(s) of fault events, etc.
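A minimal sketch of the codebook idea described above, with invented symptoms and problems; the observed event set is matched to the nearest codebook vector by Hamming distance.

# Each known problem is stored as a bit-vector over possible symptom events;
# an observed set of events is interpreted as the closest-matching problem.

SYMPTOMS = ["link_down", "http_timeout", "dns_failure", "disk_full"]
CODEBOOK = {
    "router failure":  [1, 1, 1, 0],
    "web server down": [0, 1, 0, 0],
    "storage outage":  [0, 0, 0, 1],
}

def correlate(observed):
    vector = [1 if s in observed else 0 for s in SYMPTOMS]
    def distance(problem):
        return sum(a != b for a, b in zip(CODEBOOK[problem], vector))
    return min(CODEBOOK, key=distance)

print(correlate({"http_timeout", "link_down", "dns_failure"}))   # 'router failure'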
CORRELATION AND FUSION
 Data fusion is a multilevel, multifaceted process dealing with the
registration, detection, association, correlation, and combination of data
and information from multiple sources to achieve refined state and
identity estimation, and complete and timely assessments of situation
(including threats and opportunities). (according to the standard definition
developed by the Department of Defense [DoD] Joint Directors of Laboratories).
CORRELATION IN FUSION PROCESS
(LOW-LEVEL)
The DF process: Registration → Detection → Correlation and Association → Combination → State and Identity Estimation → Assessments of Situation and Threats
• Fusion with no correlation
• Fusion with correlation
• Correlation can also be done in a separate process (when there is no concern about the abstraction levels of the data) or as part of IF.
CORRELATION IN LOGGING AND LOG
MANAGEMENT
Log analysis process: raw log data → filter → normalization → correlation → action
[Diagram: a log pipeline in which raw log data passes through filtering and correlation to produce a normalized log, with exceptions, alerts, emails, further analysis, and long-time storage as outputs]
WHAT TO DO WITH CORRELATED
DATA?
Examples of what to do with Correlated Data
 Threat Analysis – This is a process of taking the correlated data and determining
what it means and then prioritizing the data. Includes the reduction of False Positives.
 Forensic Analysis – The ability to analyze correlated events historically for trending
and for prosecution.
Situation/Policy Analysis – Correlation provides a holistic view, providing a complete
picture of your security posture.
FUSION AND CORRELATION IN WEB
ANALYTICS PROCESS
Web analytics steps: define goals → build KPIs → collect data → analyze data → implement changes → repeat
Building a scalable architecture
CONCLUSION: BRING ALL TOGETHER
[Diagram: bringing it all together – resource management and sensor management alongside sensor fusion, data fusion, and information fusion, together with tracking, data mining, estimation, and correlation. Adapted from: Data Fusion for Delivering Advanced Traveler Information Services, 2003.]
CONCLUSION
Which one?
 Fusion → integration issues at different abstraction levels; very large, distributed, streaming data sets; obtaining more confident and precise data for situation assessment.
 Correlation → can be done at many steps of data fusion/integration for a defined target scope; especially useful when there is a pre-defined target/category, for finding new relations to a pre-determined variable.
 Log analysis → since we have log/event data, log analysis is essential for our goal, so correlation is the natural choice.
WHERE WE GO NOW
• Literature review and case study for web log correlation
• Tools and solutions for web log analysis (based on the goals of the project)
• Propose methods and a solution for the web ranking project
THE END
Any Questions?
Give me feedback
Email: msayyad@protonmail.com
Telegram ID: Hex0r
Thanks for your attention

More Related Content

PPTX
Data fusion
PPTX
Incentive Compatible Privacy Preserving Data Analysis
PDF
IRJET- Secure Data Access on Distributed Database using Skyline Queries
PDF
G1803054653
DOCX
Incentive compatible privacy preserving data
PDF
Online stream mining approach for clustering network traffic
PDF
Online stream mining approach for clustering network traffic
PDF
Cloud java titles adrit solutions
Data fusion
Incentive Compatible Privacy Preserving Data Analysis
IRJET- Secure Data Access on Distributed Database using Skyline Queries
G1803054653
Incentive compatible privacy preserving data
Online stream mining approach for clustering network traffic
Online stream mining approach for clustering network traffic
Cloud java titles adrit solutions

What's hot (20)

PDF
Paper id 25201463
PDF
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
PDF
An Improved Differential Evolution Algorithm for Data Stream Clustering
PDF
A predictive model for network intrusion detection using stacking approach
PDF
A real-time big data sentiment analysis for iraqi tweets using spark streaming
PDF
Data mining for_java_and_dot_net 2016-17
PPTX
Journals analysis ppt
PDF
Algorithm for calculating relevance of documents in information retrieval sys...
PPTX
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
PDF
A Review on Text Mining in Data Mining
PDF
Using Cisco Network Components to Improve NIDPS Performance
PDF
Parallel and distributed system projects for java and dot net
PDF
Intrusion detection with Parameterized Methods for Wireless Sensor Networks
PDF
Implementing Proof of Retriavaibility for Multiple Replica of Data File using...
PDF
Paper id 25201431
PPT
Recording and Reasoning Over Data Provenance in Web and Grid Services
PDF
Hardware enhanced association rule mining
PDF
Authentication techniques in smart grid: a systematic review
PDF
1855 1860
PDF
On Using Network Science in Mining Developers Collaboration in Software Engin...
Paper id 25201463
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
An Improved Differential Evolution Algorithm for Data Stream Clustering
A predictive model for network intrusion detection using stacking approach
A real-time big data sentiment analysis for iraqi tweets using spark streaming
Data mining for_java_and_dot_net 2016-17
Journals analysis ppt
Algorithm for calculating relevance of documents in information retrieval sys...
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
A Review on Text Mining in Data Mining
Using Cisco Network Components to Improve NIDPS Performance
Parallel and distributed system projects for java and dot net
Intrusion detection with Parameterized Methods for Wireless Sensor Networks
Implementing Proof of Retriavaibility for Multiple Replica of Data File using...
Paper id 25201431
Recording and Reasoning Over Data Provenance in Web and Grid Services
Hardware enhanced association rule mining
Authentication techniques in smart grid: a systematic review
1855 1860
On Using Network Science in Mining Developers Collaboration in Software Engin...
Ad

Similar to data Fusion and log correlation (20)

PDF
Cross Domain Data Fusion
PDF
Frequent Item set Mining of Big Data for Social Media
PDF
Frequent Item set Mining of Big Data for Social Media
PPTX
A Privacy Framework for Hierarchical Federated Learning
PPTX
Debbie Wilson: Deliver More Efficient, Joined-Up Services through Improved Ma...
DOCX
privacy-enhanced web service composition
DOCX
JPJ1452 Privacy-Enhanced Web Service Composition
PDF
Anomalous symmetry succession for seek out
PDF
TUW-ASE Summer 2015: Advanced service-based data analytics: Models, Elasticit...
PDF
IEEE 2014 C# Projects
PDF
IEEE 2014 C# Projects
PPTX
Lspnew (1)
DOCX
Nikita rajbhoj(a 50)
PPTX
ICMCSI 2023 PPT 1074.pptx
PDF
IRJET- Improving the Performance of Smart Heterogeneous Big Data
PDF
FEATURE EXTRACTION AND FEATURE SELECTION: REDUCING DATA COMPLEXITY WITH APACH...
PDF
FEATURE EXTRACTION AND FEATURE SELECTION: REDUCING DATA COMPLEXITY WITH APACH...
PDF
Feature Subset Selection for High Dimensional Data using Clustering Techniques
PDF
Back to Basics - Firmware in NFV security
PDF
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Cross Domain Data Fusion
Frequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social Media
A Privacy Framework for Hierarchical Federated Learning
Debbie Wilson: Deliver More Efficient, Joined-Up Services through Improved Ma...
privacy-enhanced web service composition
JPJ1452 Privacy-Enhanced Web Service Composition
Anomalous symmetry succession for seek out
TUW-ASE Summer 2015: Advanced service-based data analytics: Models, Elasticit...
IEEE 2014 C# Projects
IEEE 2014 C# Projects
Lspnew (1)
Nikita rajbhoj(a 50)
ICMCSI 2023 PPT 1074.pptx
IRJET- Improving the Performance of Smart Heterogeneous Big Data
FEATURE EXTRACTION AND FEATURE SELECTION: REDUCING DATA COMPLEXITY WITH APACH...
FEATURE EXTRACTION AND FEATURE SELECTION: REDUCING DATA COMPLEXITY WITH APACH...
Feature Subset Selection for High Dimensional Data using Clustering Techniques
Back to Basics - Firmware in NFV security
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Ad

More from Mahdi Sayyad (7)

PDF
ISO15408CC_ارزیابی امنیتی محصول.pdf
PDF
Data Fusion model for web analytics
PDF
ِData Fusion and Log correlation tools & case-studies
PDF
ارزیابی امنیتی محصولات فناوری اطلاعات(ISO 15408 Common Criteria)- بخش دوم
PDF
شناسایی بات نت های Fast-Flux با استفاده از تحلیل ناهنجاری DNS
PDF
ISO/IEC 15408-Common Criteria: ارزیابی امنیتی محصولات فناوری اطلاعات
PDF
امنیت سامانه های کنترل صنعتی واسکادا-Industrial Control Systems Security
ISO15408CC_ارزیابی امنیتی محصول.pdf
Data Fusion model for web analytics
ِData Fusion and Log correlation tools & case-studies
ارزیابی امنیتی محصولات فناوری اطلاعات(ISO 15408 Common Criteria)- بخش دوم
شناسایی بات نت های Fast-Flux با استفاده از تحلیل ناهنجاری DNS
ISO/IEC 15408-Common Criteria: ارزیابی امنیتی محصولات فناوری اطلاعات
امنیت سامانه های کنترل صنعتی واسکادا-Industrial Control Systems Security

Recently uploaded (20)

PPTX
Computer network topology notes for revision
PDF
Mega Projects Data Mega Projects Data
PPT
ISS -ESG Data flows What is ESG and HowHow
PPTX
climate analysis of Dhaka ,Banglades.pptx
PDF
Lecture1 pattern recognition............
PPTX
Supervised vs unsupervised machine learning algorithms
PDF
annual-report-2024-2025 original latest.
PDF
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
PDF
Business Analytics and business intelligence.pdf
PPT
Miokarditis (Inflamasi pada Otot Jantung)
PDF
.pdf is not working space design for the following data for the following dat...
PDF
Foundation of Data Science unit number two notes
PDF
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
PPTX
advance b rammar.pptxfdgdfgdfsgdfgsdgfdfgdfgsdfgdfgdfg
PPTX
IBA_Chapter_11_Slides_Final_Accessible.pptx
PDF
Galatica Smart Energy Infrastructure Startup Pitch Deck
PPTX
1_Introduction to advance data techniques.pptx
PPTX
Acceptance and paychological effects of mandatory extra coach I classes.pptx
PPTX
Business Ppt On Nestle.pptx huunnnhhgfvu
Computer network topology notes for revision
Mega Projects Data Mega Projects Data
ISS -ESG Data flows What is ESG and HowHow
climate analysis of Dhaka ,Banglades.pptx
Lecture1 pattern recognition............
Supervised vs unsupervised machine learning algorithms
annual-report-2024-2025 original latest.
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
Business Analytics and business intelligence.pdf
Miokarditis (Inflamasi pada Otot Jantung)
.pdf is not working space design for the following data for the following dat...
Foundation of Data Science unit number two notes
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
advance b rammar.pptxfdgdfgdfsgdfgsdgfdfgdfgsdfgdfgdfg
IBA_Chapter_11_Slides_Final_Accessible.pptx
Galatica Smart Energy Infrastructure Startup Pitch Deck
1_Introduction to advance data techniques.pptx
Acceptance and paychological effects of mandatory extra coach I classes.pptx
Business Ppt On Nestle.pptx huunnnhhgfvu

data Fusion and log correlation

  • 1. INTRO TO DATA FUSION AND LOG CORRELATION FOR WEB ANALYTICS By: Mahdi SayyadOctober 14, 2017
  • 2. TOPICS Who we are? Our Work Process Data fusion Fusion and Integration Information Fusion Taxonomy DF Models and architectures Log and logging Log/Event Correlation Log Types and Sources Log Correlation Pre-requisites Models/Types of Correlating logs Correlation Methods and techniques Relation between Correlation and Fusion  Conclusion BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 3. WHO WE ARE Mahdi Sayyad MS in Computer Engineering ,CEH ,CCNA, ISMS Lead Auditor, Cybersecurity Researcher, Co-Founder of ICSGROUP. 6+ year experience in InfoSec and Cybersecurity analysis Mohammad H. Bazrafkan MS in Information Security, CEH, OSCP, Co- Founder ARTINERTEBAT. 6+ year experience in InfoSec and Cybersecurity analysis BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 4. WHY WE ARE HERE? To facilitate things  Research, develop and propose available techniques/solution(s) for Fusion and Correlating massive data-set(Logs) collected and feed to web ranking system BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 5. OUR RESEARCH PROCESS Requirements analysis • Problem Definition • Requirement Skills • Scope Determination • Team Selection and acquisition Information Gathering and Resource Selection • Papers(journals and Conferences) • Books(handbooks and proceeding studies) • Thesis • Technical reports • Workshop presentation • Main Keywords: Data Fusion/Log Correlation/web log Analytics Review and Study • Fundamentals and scope define • Models and Architectures • Tools and Techniques Feasibility Analysis and Proposed solutions • Leveraging tools and techniques for use case • Selection and Developing proposed solution Reporting • Phase 1 • Phase 2 BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 6. DATA FUSION BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 7. DATA FUSION: INTRODUCTION  Information fusion used in detection and classification tasks in different application domains, first seen in military and image processing applications  Lately, these mechanisms have been used in new applications such as intrusion detection and Denial of Service (DoS) detection and etc. Data Fusion requires combining expertise in 2 areas:  Sensors  Information integration BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 8. DATA FUSION • a.k.a : "data fusion", "information fusion", "combination of evidence“ • Data Fusion: data fusion is the name which has been given to a variety of interrelated problems which have arisen primarily in military applications. I. “Locate and identify many unknown objects of many different types on the basis of different kinds of evidence. This evidence is collected on an ongoing basis by many possibly re-allocable sensors having varying capabilities. Analyze the results in such a way as to supply local and over-all assessments of the significance of a scenario and to determine proper responses based on those assessments”. (Mathematics of Data Fusion-Springer (1997" II. Data fusion is the process of combing information from a number of different sources to provide a robust and complete description of an environment or process of interest. • In a nutshell, information fusion can be defined as the combination of n multiple sources to obtain improved information (cheaper, greater quality, or greater relevance). o Sensor Fusion: In this kind of fusion, evidence from two or more usually closely located sensors-e.g., pictures from imaging sensors-is combined in order to get information which is more precise than that obtainable from each separate piece of evidence. Fusion of this type can involve the combination of raw signals supplied by sensors (sometimes called "pre-detection fusion), or of elementary "features" or "segments" extracted from the signals by pre-processors. BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 9. FUSION OR INTEGRATION: FAST VS. BATCH Multisensory Integration Information Fusion Sensor Fusion Data integration The relationship among Fusion Terms  Data Fusion More than Just Integration  Information Fusion diverges from Data Integration in a few key ways:  Data Reduction  Handling Streaming/Real-Time Data  Human Interfaces BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 10. FUSION OR INTEGRATION: FAST VS. BATCH Sensor Fusion: The combining of sensory data or data derived from sensory data such that the resulting information is in some sense better than would be possible when these sources were used individually. Multisensor Integration: The synergistic use of sensor data for the accomplishment of a task by a system. This differs from sensor fusion in that sensor fusion includes the actual combination of sensory information into one representational format. BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 11. DATA FUSION APPLICATIONS  Robotics  Intelligent Transportation Systems  Precision Agriculture  Security, Improving Intrusion Detection  Data Privacy  Assessing and Monitoring Civil Infrastructures  Environmental Monitoring  Fire Detection  Financial analysis  Fault diagnosis  Medical Diagnoses …all data analytics and situation assessment issues can use fusion technology BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 12. CLASSIFICATION INFORMATION FUSION  Relationships among the input data sources may be used to segregate information fusion into classes (e.g., complementary, redundant and Cooperative data).  Abstraction level of the input/output data types(and their Nature) of a fusion process.(proposed by Dasarathy).  Abstraction Levels of the manipulated data during the fusion process (abstraction level of the employed data) includes (raw measurement, signal, feature, decision) can be used to distinguish among fusion processes. BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 13.  Complementary: When information provided by the sources represents different portions of a broader scene, information fusion can be applied to obtain a piece of information that is more complete (broader).  Redundant: If two or more independent sources provide the same piece of information, these pieces can be fused to increase the associated confidence.  Cooperative: Two independent sources are cooperative when the information provided by them is fused into new information (usually more complex than the original data) that, from the application perspective, better represents the reality. CLASSIFICATION BASED ON RELATIONS OF DATA SOURCES BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 14. CLASSIFICATION BASED ON RELATIONSHIP AMONG DATA SOURCES BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 15. CLASSIFICATION BASED ON LEVELS OF ABSTRACTION Use abstraction level of the employed data: pixel, signal, feature, and symbol. Signal : Deals with single or multidimensional signals from sensors. It can be used in real-time applications or as an intermediate step for further fusions. Pixel : Operates on images and can be used to enhance image-processing tasks. Feature : Deals with features or attributes extracted from signals or images, such as shape and speed. Symbol : Information is a symbol that represents a decision, and it is also referred to as decision level. BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 16. CLASSIFICATION BASED ON LEVELS OF ABSTRACTION According to the abstraction level of the manipulated data, IF can be classified into four categories:  Low-Level Fusion: Also referred to as data/signal/measurement level fusion. Raw data are provided as inputs, combined into new piece of data that is more accurate (reduced noise) than the individual inputs. Ex. Polastre et al. [2004] provide an example of low-level fusion by applying a moving average filter to estimate ambient noise and determine whether or not the communication channel is clear.  Medium-Level Fusion: Attributes or features of an entity (e.g., shape, texture, position) are fused to obtain a feature map that may be used for other tasks (e.g., segmentation or detection of an object). This type of fusion is also known as feature/attribute level fusion.  High-Level Fusion: Also known as symbol or decision level fusion. It takes decisions or symbolic representations as input and combines them to obtain a more confident and/or a global decision.  Multilevel Fusion: When the fusion process encompasses data of different abstraction levels— when both input and output of fusion can be of any level (e.g., a measurement is fused with a feature to provide a decision)—multilevel fusion takes place. BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 17. CLASSIFICATION BASED ON INPUT AND OUTPUT  Data In–Data Out (DAI-DAO) : In this class, information fusion deals with raw data and the result is also raw data, possibly more accurate or reliable.  Data In–Feature Out (DAI-FEO) : Information fusion uses raw data from sources to extract features or attributes that describe an entity. Here, “entity” means any object, situation, or world abstraction.  Feature In–Feature Out (FEI-FEO) : FEI-FEO fusion works on a set of features to improve/refine a feature, or extract new ones.  Feature In–Decision Out (FEI-DEO) : In this class, information fusion takes a set of features of an entity generating a symbolic representation or a decision.  Decision In–Decision Out (DEI-DEO) : Decisions can be fused in order to obtain new decisions or give emphasis on previous ones. BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 18. CLASSIFICATION BASED ON INPUT AND OUTPUT BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 19. DATA FUSION MODELS Models and architectures proposed to design information fusion systems can be centered on the abstraction of the data generated during fusion. 1) Joint Directors of Laboratories Model (JDL) 2) Modified Waterfall Fusion Model (WFDF) 3) The Intelligence Cycle-Based Model 4) Boyd Model 5) Omnibus Model 6) Dasarathy Model BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 20. JOINT DIRECTORS OF LABORATORIES MODEL(JDL)  JDL is a popular model in the fusion research community.  The model is composed of five processing levels, an associated database, and an information bus connecting all components.  Drawback of the JDL model is that it does not make explicit the interaction among the processing elements. Moreover, it suppresses any feedback: it does not specify how current or past results of fusion can be used to enhance future iterations. BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 21. JDL Model BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 22. WATERFALL DATA FUSION PROCESS (WFDF) MODEL  waterfall data fusion process (WFDF) model, a sort of hierarchical architecture, concentrates on the processing functions on the lower levels. These functions have some similarities with the JDL model: I. Sensing/signal processing relates to source pre-processing II. Feature extraction/pattern processing corresponds to object refinement (OR), III. Situation assessment (SA) is connected to situation refinement (SR) and IV. Decision making relates to threat refinement (TR).  A modified WFDF model as proposed with increasing sub-task complexity. This modified model (MWFDF) is more an action-oriented model as it is augmented by local feedback loops BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 23. MWFDF BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 24. BOYD MODEL  The Boyd model depicts the classical decision-support mechanism in information operations with a closed/feedback loop. As this Boyd control loop (BCL) model has a cycle of observe–orient–decide–act (OODA) features, an appropriate combination of the IC and BCL loop models would yield a better fusion process model. This combined model would be a less abstract and more action-oriented model, and one such combination results in the Omnibus (OB) mode. The Boyd-OODA cyclic loop model BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 25. OMNIBUS MODEL  The OB model combines several beneficial features of other approaches. Interestingly this OB model seems to be a hybrid of the three models :(i) BCL model, (ii) Dasarathy model and (iii) Waterfall model. The model: a) specifies the ordering of processes b) makes the cyclic nature more explicit c) uses a general terminology  A few WFDF aspects are seen in the OB model, and it is more generalized than many other models. BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 26. DASARATHY OR DFD (DATA-FEATURE-DECISION) OR FODF(FUNCTIONALITY ORIENTED) MODEL.  The Dasarathy or DFD (Data-Feature-Decision) model is a fine-grained information- centered model in which the elements of information fusion are specified based on their inputs and outputs.  model is based on the fusion functions rather than the tasks. The DFD model: The primary input is raw data and the main output is a decision. The components responsible for the several fusion stages are the elements DAI- DAO,DAI-FEO, FEI-FEO, FEI-DEO and DEI-DEO.  In specifying the main types of fusion regarding their input and output data. For this reason it is also used to classify information fusion. BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 27. DASARATHY MODEL BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 28. DASARATHY’S DATA-DIRECTED VS TASK- DIRECTED FUSION Dasarathy’s functional DF model Modified multi-sensory DF model Task-directed Data-directed BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 29. DATA FUSION ARCHITECTURES Central fusion: refers to the process by which a suite of sensors supplies their observations directly to a central fusion processing site over data links. Distributed fusion: (sometimes also called track-to-track fusion refers to the process of fusing together both observations and the target estimates supplied by remote fusion sources-even though these sites may (and usually do) share data collection resources. Distributed fusion of information strongly depends on the particular topology of the communications network which interconnects the fusion sites. he architecture is very useful for large flexible/smart structures, aircraft/spacecraft health monitoring, large automation/chemical plants, large sensor NWs and chemical industrial applications. Hybrid Fusion: This architecture involves both centralized and distributed DF schemes based on the disposition of the required sensor configurations. This combination scheme is used for some applications, where certain arrangement of sensors is already pre-defined or is already in operation, and a variety of similar, dissimilar sensors/measuring systems (like radars) are used. BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 30. (a) centralized processing, (b) decentralized processing, (c) hierarchical. (Adapted from Raol, J. R. Multi-Sensor Data Fusion with MATLAB. CRC Press, FL, 2010.) BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 31. LOG CORRELATION BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 32. EVENT AND LOG  Event – a change in the system state, e.g., a disk failure; when a system component (application, network device, etc.) encounters an event, it could emit an event message that describes the event.  Log: I. a full written record of a journey, a period of time, or an event(Cambridge Dictionary). II. in a computing context, A log is the automatically produced and time-stamped documentation of events relevant to a particular system. Virtually all software applications and systems produce log files.(whatis.techtarget.com) BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 33. EVENT LOGGING Log Event Alert Incident  Event logging – a procedure of storing event messages to a local or remote (usually flat-file) event log.  Event logs play an important role in modern IT systems: o many system components like applications, servers, and network devices have a built-in support for event logging (with the BSD syslog protocol being a widely accepted standard), o since in most cases event messages are appended to event logs in real-time, event logs are an excellent source of information for monitoring the system (a number of tools like Swatch and Logsurfer have been developed for log monitoring) o information that is stored to event logs can be useful for analysis at a later time, e.g., for audit procedures. BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 34. EVENT SOURCES AND TYPES Types of Logs: Audit Logs Transaction Logs Intrusion Logs Connection Logs System Performance Activity Logs Application specific logs OS Specific LogsWhere to find logs: • Servers • Firewalls • Databases • Routers • Point of Sale Devices • USB Drives • Kiosks • Etc BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 35. KEY ITEMS IN LOGGING Log Transport Log Syntax and Format Log Taxonomy and components Log Settings and Configuration BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 36. LOG FUNDAMENTAL REQUISITES Log files have certain fundamental requisites. They are: • Integrity: The log must be unaltered and totally free of any tampering or modification by unauthorized operators. • Time Stamping: The log must ensure a reasonable certainty as to exactly when a certain event was registered. This is absolutely essential for making post-incident correlations. • Normalization and Data reduction: Normalization refers to the extraction of a datum from the original format of the log file without altering its integrity. This datum can then be correlated with others of a different type. Data reduction (a.k.a. filtering) is a procedure for identifying pertinent events and correlating them according to selective criteria. BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 37. CORRELATION Cor·re·la·tion n. 1. A causal, complementary, parallel, or reciprocal relationship, especially a structural, functional, or qualitative correspondence between two comparable entities: a correlation between drug abuse and crime. 2. Statistics. The simultaneous change in value of two numerically valued random variables: the positive correlation between cigarette smoking and the incidence of lung cancer; the negative correlation between age and normal vision. BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 38. LOG/EVENT CORRELATION Event correlation – a conceptual interpretation procedure where new meaning is assigned to a set of events that happen within a predefined time interval [Jakobson and Weissman, 1995]. During the event correlation process, new events might be inserted into the event stream and original events might be removed. Examples:  if 10 login failure events occur for a user within 5 minutes, generate a security attack event;  if both device internal temperature too high and device not responding events have been observed within 5 seconds, replace them with the event device down due to overheating. A number of approaches have been proposed for event correlation (rule-based, codebook based, neural network based etc. methods), and a number of event correlation products are available on the market (HP ECS, SMARTS, NerveCenter, RuleCore, LOGEC, etc.) BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 39. EVENT LOG MONITORING Dec 18 08:47:26 myhost [daemon.info] sshd[15172]: log: Connection from 10.2.211.19 port 1304 Dec 18 08:47:39 myhost [daemon.info] sshd[15172]: log: Password authentication for alex accepted. Dec 18 08:50:09 myhost [kern.crit] vmunix: /var/tmp: file system full Dec 18 08:50:10 myhost [mail.debug] imapd[15399]: imap service init from 10.2.213.2 Dec 18 08:50:10 myhost [mail.info] imapd[15399]: Login user=risto host=risto2 [10.2.213.2]  Commonly used log monitoring tools – Swatch, Logsurfer, etc.  Current log monitoring practice – match logged lines in real time with regular expressions and perform an action (e.g., send an alert) when a matching line is observed.  Open issues – in order to write rules (regexp → action) for log monitoring tools, one must have a good knowledge about the IT system and log messages; existing tools don’t support event correlation well. BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
  • 40. CORRELATION PRE-REQUISITES 1. Data Transport – Moving Log Data from point A to point B. Security of the data is important at this level, by reducing risk exposure by encrypting and authenticating the data, data transport can be accomplished with high certainty. 2. Data Normalization – Taking the data from disparate log formats and creating a flexible single log format. 3. Aggregation – Taking Data from multiple B level devices and consolidating the data into one system. 4. Data Reduction – Reducing the data through deletion of duplicates, combining similar events into a single filtering, or using summary information. Compressing the data also is helpful to reduce bandwidth usage. 5. Correlation Engine – The process of correlating field variables into a unique consolidated event. BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
• 41. MODELS/TYPES OF CORRELATION
 Micro-level correlation – comparing fields within one data set. Most vendors today perform this type of correlation, typically using just the fields in the normalized data. Also known as atomic correlation.
 Macro-level correlation – comparing multiple data sets. Macro-level correlation is concerned with pulling in other sources of information in order to further validate, or gain intelligence on, your event stream. This technique is sometimes referred to as fusion correlation. One example of macro-correlation is taking vulnerability scan data into account and comparing it inline with the event data coming into your analysis system.
BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
• 42. MICRO CORRELATION
Field correlation – the ability to correlate specific events using a single field or multiple fields in the normalized data. Can be as simple as a search for all port 80 attacks.
Rule/pattern correlation – the ability, via specific rules, to label a certain set of events as a correlated event. Often used to correlate events over a long period of time. For example:
if the system sees an event E1 where E1.eventType=portscan, followed by an event E2 where E2.srcip=E1.srcip and E2.dstip=E1.dstip and E2.eventType=fw.reject, then doSomething
BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
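A minimal Python sketch of the rule above; the event field names follow the slide's pseudocode, and do_something is a placeholder action.

# A portscan followed by a firewall reject from the same source to the same
# destination is reported as one correlated event.
recent_portscans = {}  # (srcip, dstip) -> earlier portscan event (E1)

def handle(event):
    key = (event["srcip"], event["dstip"])
    if event["eventType"] == "portscan":
        recent_portscans[key] = event
    elif event["eventType"] == "fw.reject":
        e1 = recent_portscans.get(key)
        if e1 is not None:
            do_something(e1, event)

def do_something(e1, e2):
    print(f"correlated event: portscan + fw.reject, src={e1['srcip']} dst={e1['dstip']}")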
• 43. MACRO CORRELATION
Vulnerability correlation – the process of mapping IDS event names to vulnerability names, which allows a reduction in false positives. This is particularly useful with a well-tuned vulnerability scanner and IDS.
Profile (fingerprint) correlation – forensic network data such as remote port scans, remote OS fingerprints, finger information, and banner grabbing provides a series of data sets that can be compared to help correlate attacks to attacker profiles.
Watch list correlation – using a set of learned inputs, the watch list serves as a real-time reminder of previous offenders, correlating previous attackers with current attacks.
BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
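The first and third techniques can be sketched as simple lookups in Python; the signature names, CVE identifiers, and IP addresses below are made-up illustration data, not real scan results or a product's data model.

# Sketch of vulnerability correlation and watch-list correlation.
scan_results = {("10.0.0.5", "CVE-2017-5638"), ("10.0.0.9", "CVE-2014-0160")}
ids_to_vuln = {"apache_struts_rce": "CVE-2017-5638", "heartbleed_probe": "CVE-2014-0160"}
watch_list = {"203.0.113.7", "198.51.100.23"}  # previously observed attacker IPs

def vulnerability_correlate(alert):
    # Keep an IDS alert only if the targeted host is actually vulnerable per scan data.
    vuln = ids_to_vuln.get(alert["signature"])
    return vuln is not None and (alert["dstip"], vuln) in scan_results

def watch_list_correlate(event):
    # Raise the priority of events whose source is a known prior offender.
    if event.get("srcip") in watch_list:
        return {**event, "watch_list_hit": True, "priority": "high"}
    return event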
• 44. CORRELATION METHODS AND TECHNIQUES
A number of techniques have been proposed for event correlation:
 rule-based
 graph-based
 codebook-based
 neural-network-based
Two basic approaches to log analysis:
 top-down approach
 bottom-up approach
BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
• 45. CORRELATION METHODS AND TECHNIQUES
 Rule-based (HP ECS, IMPACT, RuleCore, etc.) – events are correlated according to condition → action rules specified by a human analyst.
 Codebook-based (SMARTS) – if a set of events e1,...,ek must be interpreted as event A, then e1,...,ek are stored in the codebook as a bit-vector pointing to A. To correlate a set of events, look for the most closely matching vector in the codebook and report the interpretation that corresponds to that vector.
 Graph-based – find all dependencies between system components (network devices, hosts, services, etc.) and construct a graph with each node representing a system component and each edge a dependency between two components. When a set of fault events occurs, use the graph to find the possible root cause(s) of the fault events (e.g., 10 "HTTP server not responding" events were caused by the failure of a single network link).
 Neural-network-based – a neural net is trained to detect anomalies in the event stream, identify root cause(s) of fault events, etc.
BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
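A toy Python sketch of the codebook idea; the event names, problem labels, and bit-vectors are invented for illustration and are not taken from SMARTS.

# Toy codebook: each problem is a bit-vector over the observable events.
EVENTS = ["link_down", "http_timeout", "dns_failure", "disk_full"]
CODEBOOK = {
    "router failure":    (1, 1, 1, 0),
    "web server down":   (0, 1, 0, 0),
    "storage exhausted": (0, 0, 0, 1),
}

def interpret(observed):
    """Return the codebook entry whose vector is closest (Hamming distance) to the observation."""
    vec = tuple(1 if e in observed else 0 for e in EVENTS)
    return min(CODEBOOK, key=lambda k: sum(a != b for a, b in zip(CODEBOOK[k], vec)))

# interpret({"link_down", "http_timeout", "dns_failure"}) -> "router failure"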
• 46. CORRELATION AND FUSION
 Data fusion is a multilevel, multifaceted process dealing with the registration, detection, association, correlation, and combination of data and information from multiple sources to achieve refined state and identity estimation, and complete and timely assessments of situations (including threats and opportunities) (according to the standard definition developed by the Department of Defense [DoD] Joint Directors of Laboratories).
BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
• 47. CORRELATION IN FUSION PROCESS (LOW-LEVEL)
DF process: Registration → Detection → Correlation and Association → Combination → State and Identity Estimation → Assessment of Situations and Threats
• Fusion can be performed with no correlation, or with correlation as one of its stages.
• Correlation can also be done in a separate process (when there is no concern about abstract levels of data) or as part of the information fusion (IF) process.
BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
• 48. CORRELATION IN LOGGING AND LOG MANAGEMENT
Log analysis process: raw log data → Filter → Normalization → Correlation → Action
[Diagram: raw log data is filtered, normalized, and correlated; filtered exceptions and normalized logs go to long-time storage or on to analysis, and correlation output drives actions such as alerts and e-mails.]
BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
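The flow on this slide can be sketched as one Python driver function; the stage callables (is_pertinent, normalize, correlate, act, store) are placeholders for the filter, normalizer, correlation engine, action, and long-time storage components discussed on the previous slides.

def analyze(raw_lines, is_pertinent, normalize, correlate, act, store):
    """One pass over raw log data: filter -> normalize -> correlate -> action."""
    for line in raw_lines:
        store(line)                 # raw lines also go to long-time storage
        if not is_pertinent(line):  # filtering / data reduction
            continue
        event = normalize(line)     # disparate formats -> one common schema
        alert = correlate(event)    # may emit a new, higher-level event
        if alert is not None:
            act(alert)              # e.g. raise an alert or send an e-mail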
• 49. WHAT TO DO WITH CORRELATED DATA?
Examples of what to do with correlated data:
 Threat analysis – the process of taking the correlated data, determining what it means, and then prioritizing it. Includes the reduction of false positives.
 Forensic analysis – the ability to analyze correlated events historically, for trending and for prosecution.
 Situation/policy analysis – correlation provides a holistic view, giving a complete picture of your security posture.
BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
• 50. FUSION AND CORRELATION IN WEB ANALYTICS PROCESS
Web analytics steps (repeated as a cycle): Define goals → Build KPIs → Collect data → Analyze data → Implement changes → Repeat
Building a scalable architecture
BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
• 51. CONCLUSION: BRINGING IT ALL TOGETHER
[Diagram relating the concepts covered: resource management, sensor management, sensor fusion, data fusion, information fusion, tracking, data mining, estimation, and correlation. Adapted from: Data Fusion for Delivering Advanced Traveler Information Services, 2003.]
BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
• 52. CONCLUSION
Which one?
 Fusion → addresses integration issues at different abstraction levels over very large, distributed, streaming data sets; it yields more confident and precise data for situation assessment.
 Correlation → can be done at many steps of the data fusion/integration process for a defined target scope; it is especially useful when there is a pre-defined target/category, i.e., finding new relations to pre-determined variables.
 Log analysis → for our goal, since we have log/event data, log analysis is essential, so correlation is the natural choice.
BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
• 53. WHERE WE GO NOW
Literature review and case study of web log correlation
Tools and solutions for web log analysis (based on the goals of the project)
Propose methods and a solution for the web ranking project
BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM
• 54. THE END
Any questions?
Give me feedback
Email: msayyad@protonmail.com
Telegram ID: Hex0r
Thanks for your attention
BY MAHDI SAYYAD | 2017 | EMAIL: MSAYYAD@PROTONMAIL.COM