Log Anomaly Detection
By Ahmad Etezadi & Matin Zivdar
Summary
● Anomaly detection
● Server Logs Dataset
● Pre Processing
● Models
● Evaluation
● API
Anomaly detection
Anomaly detection is a widely used machine learning task: finding observations that deviate from the expected pattern.
In this project, we identify abnormal behavior in a system by analyzing logs collected in real time
from the log aggregation systems of an enterprise.
Sample Log
IP [TIME] [Method Path] StatusCode ResponseLength [[UserAgent]] ResponseTime
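A minimal sketch of how a line in this format could be split into fields. The sample line, the timestamp format, and the numeric type of ResponseTime are assumptions made for illustration; only the field order comes from the slide.

```python
import re

# Field order follows the slide:
# IP [TIME] [Method Path] StatusCode ResponseLength [[UserAgent]] ResponseTime
LOG_PATTERN = re.compile(
    r"^(?P<ip>\S+) "
    r"\[(?P<time>[^\]]+)\] "
    r"\[(?P<method>\S+) (?P<path>[^\]]*)\] "
    r"(?P<status_code>\d{3}) "
    r"(?P<response_length>\d+) "
    r"\[\[(?P<user_agent>.*)\]\] "
    r"(?P<response_time>\S+)$"
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status_code"] = int(record["status_code"])
    record["response_length"] = int(record["response_length"])
    # Assumption: ResponseTime is a plain decimal number.
    record["response_time"] = float(record["response_time"])
    return record


# Hypothetical log line, not taken from the real dataset.
SAMPLE_LINE = (
    "203.0.113.7 [2021-05-12T10:15:30] [GET /products/list.html] "
    "200 5120 [[Mozilla/5.0 (X11; Linux x86_64)]] 0.134"
)
record = parse_line(SAMPLE_LINE)
print(record)
```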
Pre Processing
Feature Extraction
Generation of features from data that are in a format that is difficult to analyse directly.
Feature Transformation
Transformation of data: encoding, normalization, etc.
Pre Processing: Extraction
Extracted features: url_depth, is_spider, is_phone, tehran_traffic_statistics
Sources: URL depth, OS family, device brand, Iran network traffic statistics
http://members.tehran-ix.ir/statistics/ixp/pkts
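As an illustration of one extracted feature, url_depth could be computed as the number of non-empty path segments in the requested URL. This definition is an assumption; the slides name the feature but do not define it.

```python
from urllib.parse import urlsplit


def url_depth(url):
    """Count non-empty path segments, ignoring query string and fragment.

    Assumed definition of the url_depth feature named on the slide.
    """
    path = urlsplit(url).path
    return len([segment for segment in path.split("/") if segment])


# Hypothetical request paths.
for example in ["/", "/a/b/c.html?x=1", "http://example.com/shop/item/42"]:
    print(example, url_depth(example))
```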
Pre Processing: Transformation
Techniques: bucketing, normalization, one-hot encoding
Features: time_weight, status_code, method, response_length, requested_file_type
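The three transformations can be sketched in plain Python as follows. Which feature receives which transformation, the category list, and the bucket boundaries are assumptions chosen for illustration; the slides do not state them.

```python
import bisect


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over known categories."""
    return [1 if value == category else 0 for category in categories]


def min_max_normalize(values):
    """Scale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bucketize(value, boundaries):
    """Return the index of the bucket that value falls into.

    boundaries must be sorted; bucket i covers
    boundaries[i-1] <= value < boundaries[i].
    """
    return bisect.bisect_right(boundaries, value)


# Hypothetical usage on features named in the slides.
METHODS = ["GET", "POST", "PUT"]           # assumed category list
LENGTH_BOUNDARIES = [100, 1000, 10000]     # assumed bucket edges, in bytes

print(one_hot("POST", METHODS))
print(min_max_normalize([0, 5, 10]))
print(bucketize(250, LENGTH_BOUNDARIES))
```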
Anomaly Detection Algorithms
01 Supervised Learning
● k-NN
● Local Outlier Factor (LOF)
● SVM
● Neural Networks Based
02 Unsupervised Learning
● K-means
● PCA
● Isolation Forest
● AutoEncoder
PCA
PCA is an unsupervised machine learning algorithm that reduces the dimensionality of data.
Using PCA, you can project the data onto fewer dimensions and then reconstruct it.
Since anomalies tend to show the largest reconstruction error,
they can be found based on the error
between the original data and the reconstructed data.
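The reconstruction-error idea can be sketched without any libraries by keeping only the first principal component, found here by power iteration. This is an illustrative simplification on made-up data, not the project's implementation; in practice a library such as NumPy or scikit-learn would normally be used and more components kept.

```python
import math


def pca_reconstruction_errors(rows, iterations=200):
    """Squared reconstruction error of each row using one principal component."""
    n, d = len(rows), len(rows[0])
    mean = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in rows]

    # Covariance matrix of the centered data.
    cov = [
        [sum(c[i] * c[j] for c in centered) / n for j in range(d)]
        for i in range(d)
    ]

    # Power iteration converges to the dominant eigenvector of cov.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]

    errors = []
    for c in centered:
        score = sum(ci * vi for ci, vi in zip(c, v))   # project onto v
        reconstructed = [score * vi for vi in v]       # map back
        errors.append(sum((ci - ri) ** 2 for ci, ri in zip(c, reconstructed)))
    return errors


# Made-up data: 20 points on the line y = 2x plus one point far off the line.
data = [(float(i), 2.0 * i) for i in range(20)] + [(10.0, 0.0)]
errors = pca_reconstruction_errors(data)
most_anomalous = errors.index(max(errors))
print(most_anomalous, round(errors[most_anomalous], 2))
```

The off-line point is the one the single component cannot reconstruct well, so it receives the largest error.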
Anomaly Samples
Conclusion
Isolation Forest
Isolation Forest works on the principle of the
decision tree algorithm.
Anomalies require fewer random partitions to
isolate than normal data points in the data set,
so anomalies are the points that have a shorter
path in the tree.
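A simplified sketch of the path-length idea: it follows one query point through random axis-aligned splits and averages the depth at which the point ends up alone. It omits the subsampling and score normalization of the full algorithm, and the data is made up; a library implementation such as scikit-learn's IsolationForest would be the usual choice in practice.

```python
import random


def path_length(point, data, rng, depth=0, max_depth=12):
    """Depth at which `point` is isolated by random axis-aligned splits."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:
        return depth  # cannot split further on this feature
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that contains the query point.
    if point[dim] < split:
        subset = [row for row in data if row[dim] < split]
    else:
        subset = [row for row in data if row[dim] >= split]
    return path_length(point, subset, rng, depth + 1, max_depth)


def mean_path_length(point, data, n_trees=200, seed=0):
    """Average path length of `point` over many random partitionings."""
    rng = random.Random(seed)
    total = sum(path_length(point, data, rng) for _ in range(n_trees))
    return total / n_trees


# Made-up data: a dense 7x7 grid near the origin plus one distant point.
cluster = [(i / 10, j / 10) for i in range(7) for j in range(7)]
outlier = (10.0, 10.0)
data = cluster + [outlier]

outlier_depth = mean_path_length(outlier, data)
normal_depth = mean_path_length((0.3, 0.3), data)
print(outlier_depth, normal_depth)
```

The distant point is usually cut off by the first split, so its average path is much shorter than that of a point inside the grid.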
Anomaly Samples
Conclusion
Comparison: Isolation Forest vs. PCA
Number of Isolation Forest outliers: 124,091
Number of PCA outliers: 123,695
Outliers common to PCA and Isolation Forest: 20,512
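The agreement between the two models can be quantified directly from these three counts. The sketch below computes the Jaccard index and the share of each model's outliers that the other model also flags; the function name is illustrative.

```python
def overlap_stats(n_a, n_b, n_common):
    """Agreement measures for two outlier sets given their sizes and overlap."""
    union = n_a + n_b - n_common
    return {
        "jaccard": n_common / union,
        "share_of_a": n_common / n_a,
        "share_of_b": n_common / n_b,
    }


# Counts reported on the slide: Isolation Forest, PCA, common outliers.
stats = overlap_stats(124_091, 123_695, 20_512)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

With these counts, roughly 16.5% of each model's outliers are shared and the Jaccard index is about 0.09, so the two models mostly flag different requests.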
Let's Dig Deeper
Autoencoder
An autoencoder is a special type of neural
network that is trained to copy its input values
to its output values.
It does not require a target variable (the
conventional Y), so it is categorized as
unsupervised learning.
Sample of reconstruction error
Histogram of Autoencoder reconstruction error
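One common way to turn a distribution of reconstruction errors into anomaly flags is a percentile threshold. The slides do not say which threshold was used, so the 99th percentile and the error values below are assumptions for illustration only.

```python
import statistics


def flag_by_percentile(errors, percentile=99):
    """Flag errors above the given percentile (an integer from 1 to 99)."""
    cut_points = statistics.quantiles(errors, n=100, method="inclusive")
    threshold = cut_points[percentile - 1]
    return threshold, [error > threshold for error in errors]


# Made-up reconstruction errors: 99 small values and one large one.
errors = [i / 100 for i in range(99)] + [5.0]
threshold, flags = flag_by_percentile(errors, percentile=99)
print(threshold, sum(flags))
```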
Evaluation Models
Autoencoder PCA Isolation Forest
Follow the Rules!
- User agents
- Malicious IPs dataset
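The two rule sources could be combined into a simple reference label, as sketched below. The keyword list, function name, and example addresses are hypothetical; the slides only name user agents and a malicious IPs dataset as the inputs.

```python
# Hypothetical substrings that mark an automated client.
BOT_KEYWORDS = ("bot", "crawler", "spider")


def violates_rules(ip, user_agent, malicious_ips, bot_keywords=BOT_KEYWORDS):
    """Return True if the request matches either rule.

    Rule 1: the IP appears in the malicious IPs dataset.
    Rule 2: the user agent contains a bot-like keyword.
    """
    if ip in malicious_ips:
        return True
    agent = user_agent.lower()
    return any(keyword in agent for keyword in bot_keywords)


# Hypothetical data using documentation-reserved addresses.
malicious_ips = {"198.51.100.9"}
print(violates_rules("198.51.100.9", "Mozilla/5.0", malicious_ips))
print(violates_rules("203.0.113.7", "Googlebot/2.1", malicious_ips))
print(violates_rules("203.0.113.7", "Mozilla/5.0", malicious_ips))
```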
Model ROC Curve
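For reference, a ROC curve is built by sweeping a threshold over the anomaly scores and recording the false positive rate and true positive rate at each step. The sketch below does this in plain Python on made-up labels and scores; it assumes higher scores mean more anomalous and that both classes are present.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points of the ROC curve."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    previous_score = None
    for score, label in pairs:
        # Emit a point only when the threshold moves to a new score value.
        if previous_score is not None and score != previous_score:
            points.append((fp / negatives, tp / positives))
        if label:
            tp += 1
        else:
            fp += 1
        previous_score = score
    points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up example: 1 marks an anomalous request.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
curve = roc_points(labels, scores)
print(curve)
print(auc(curve))
```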
Web Crawler Detection Model
API Example
We also tried:
- Extracting time series features
- Labeling every other request from an
anomalous “IP” or “User Agent” as an
anomalous request
Thank You!
