Malware Detection using n-grams
and Evaluation using Machine
Learning Algorithms
11MSE0195-SHERIN JOSEPHIN B
Abstract
• Computer security is a major concern today. The term
malware denotes malicious software that compromises
computer security.
• Most anti-virus software fails to detect new viruses.
Using n-grams as file signatures can help to detect
unknown malware and reduce the false positive ratio.
• The dataset is further optimized using a feature
selection algorithm. The final Feature Vector Table
obtained from feature selection and dimension
reduction is compared and evaluated using
various machine learning algorithms.
Aim and Scope
• The aim of this project is to detect malware files using
n-gram analysis and to evaluate the detection using machine
learning algorithms.
• Because many antivirus products fail to detect new viruses,
n-grams are used as the model for detecting malware files
efficiently.
• The project focuses on developing a better tool for detecting
malware files, taking space complexity into consideration.
• The technique is currently used in industry, where securing
data is a primary concern. Anti-virus software such as
Kaspersky and K7 uses this technique to detect malware files.
LITERATURE SURVEY
8. “Static Malware Detection with Segmented Sandboxing”
   Abstract: This study takes the best parts of both the static and
   dynamic detection approaches. The combined approach, called
   “Segmented Sandboxing”, is applied to detect malware files.
   Techniques: segmented sandboxing
   Advantages: higher detection rate (compared with previous data)

9. “N grams based file signature for malware detection”
   Abstract: This study proposes the use of n-grams as file signatures
   in order to detect unknown malware.
   Techniques: n-grams
   Advantages: low false positive ratio

10. “A Hybrid Model to Detect Malicious Executables”
   Abstract: This paper proposes a feature set, called the hybrid
   feature set, which is given to a support vector machine that
   classifies files as malware or benign.
   Techniques: n-grams, SVM
   Advantages: high accuracy, low false positive rate

11. “Detection of New Malicious Code Using N-grams Signatures”
   Abstract: This paper describes n-gram analysis that classifies
   files as malware or benign.
   Techniques: n-grams
   Advantages: efficient, scalable, practical solutions
Architecture
Detailed Design
Module Description
MODULE 1: Dataset preparation
- Executable files (benign or malware) are disassembled using a
disassembler.
- The assembly code is parsed, and the opcode sequence is collected in the dataset.
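As an illustration of the parsing step in Module 1, the sketch below pulls the opcode mnemonics out of a disassembly listing. The slides do not name the disassembler or its output format, so the `objdump -d` style line layout, the function name `extract_opcodes` and the sample listing are all assumptions made for this example.

```python
import re

# Assumed line shape (objdump -d style): "  401000:\t55   \tpush   %rbp"
# i.e. address, colon, tab, raw bytes, tab, mnemonic and operands.
INSTRUCTION_LINE = re.compile(r"^\s*[0-9a-fA-F]+:\t[0-9a-fA-F ]+\t(\S+)")


def extract_opcodes(listing: str) -> list[str]:
    """Return the opcode mnemonics of a disassembly listing, in order."""
    opcodes = []
    for line in listing.splitlines():
        match = INSTRUCTION_LINE.match(line)
        if match:  # skip section headers, labels and blank lines
            opcodes.append(match.group(1).lower())
    return opcodes


# Hypothetical listing used only to exercise the parser.
sample_listing = (
    "0000000000401000 <_start>:\n"
    "  401000:\t55                   \tpush   %rbp\n"
    "  401001:\t48 89 e5             \tmov    %rsp,%rbp\n"
    "  401004:\t31 c0                \txor    %eax,%eax\n"
    "  401006:\tc3                   \tret\n"
)

print(extract_opcodes(sample_listing))
```

Only the mnemonic is kept and the operands are discarded, which matches the slides' description of collecting an opcode sequence. A different disassembler would need a different line pattern.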
MODULE 2: Create Feature Vector Table (FVT) by n-gram extraction
- The dataset is split into training data and testing data.
- The training data is used for n-gram extraction.
- The extracted n-grams are stored in a table called the Feature Vector Table
(FVT).
- The Feature Vector Table consists of the opcode n-gram, its frequency count
and its class.
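The n-gram extraction in Module 2 can be sketched as a sliding window over each opcode sequence, followed by a frequency count per file. The function names `opcode_ngrams` and `build_fvt`, the row layout (one row per file, one column per n-gram, class label last) and the two training samples are assumptions for illustration. The slides do not show the actual table layout.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Slide a window of length n over the opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def build_fvt(samples, n):
    """Build a feature vector table from (opcode_sequence, class_label) pairs.

    Returns the sorted n-gram vocabulary and one row per sample holding the
    frequency of every vocabulary n-gram followed by the class label.
    """
    counts = [Counter(opcode_ngrams(sequence, n)) for sequence, _ in samples]
    vocabulary = sorted(set().union(*counts))
    rows = [
        [count[gram] for gram in vocabulary] + [label]
        for count, (_, label) in zip(counts, samples)
    ]
    return vocabulary, rows


# Hypothetical training samples: opcode sequence plus class label.
training_samples = [
    (["push", "mov", "xor", "ret"], "benign"),
    (["push", "mov", "push", "mov"], "malware"),
]

vocabulary, rows = build_fvt(training_samples, n=2)
for gram in vocabulary:
    print(gram)
for row in rows:
    print(row)
```

Because the vocabulary is built from the training samples only, n-grams that appear solely in test files are ignored when the test files are vectorised with the same vocabulary.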
MODULE 3: Employing Feature Reduction Algorithm
- Principal Component Analysis (PCA)
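A minimal sketch of the PCA step in Module 3, written with NumPy's singular value decomposition. The function names `fit_pca` and `apply_pca`, the choice of two components and the small count matrix are assumptions for illustration. The slides do not state which PCA implementation or settings were used.

```python
import numpy as np


def fit_pca(X_train, n_components):
    """Learn a PCA projection from the training feature matrix."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    components = vt[:n_components]
    variance = singular_values ** 2
    explained_ratio = variance[:n_components] / variance.sum()
    return mean, components, explained_ratio


def apply_pca(X, mean, components):
    """Project feature vectors onto the learned principal components."""
    return (np.asarray(X, dtype=float) - mean) @ components.T


# Hypothetical n-gram frequency matrix: 6 files, 4 n-gram features.
train_counts = np.array([
    [4, 0, 1, 3],
    [5, 1, 0, 4],
    [3, 0, 1, 2],
    [0, 4, 3, 0],
    [1, 5, 4, 1],
    [0, 3, 2, 0],
])

mean, components, explained_ratio = fit_pca(train_counts, n_components=2)
reduced_train = apply_pca(train_counts, mean, components)
print(reduced_train.shape)
print(explained_ratio)
```

Fitting the projection on the training data and reusing the same `mean` and `components` for the test data keeps information from the test files out of the reduced features.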
MODULE 4: Classification using Machine Learning Algorithms
- J48, Support Vector Machine (SVM) and Random Forest
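The comparison in Module 4 could be reproduced along the following lines with scikit-learn. This is a sketch under stated assumptions. J48 is Weka's implementation of the C4.5 decision tree, and scikit-learn's `DecisionTreeClassifier` (a CART tree) is used here only as a stand-in. The helper `compare_classifiers`, the hyperparameters and the tiny data set are hypothetical and are not the settings behind the reported results.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X_train, y_train, X_test, y_test):
    """Fit each classifier on the training data and return test accuracy."""
    models = {
        # CART tree used as a stand-in for Weka's J48 (C4.5).
        "Decision tree": DecisionTreeClassifier(random_state=0),
        "SVM": SVC(kernel="linear"),
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return scores


# Hypothetical feature vectors; label 0 = benign, 1 = malware.
X_train = [
    [5, 0, 1], [6, 1, 1], [4, 0, 2], [5, 1, 2],
    [0, 5, 1], [1, 6, 1], [0, 4, 2], [1, 5, 2],
]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
X_test = [[5, 0, 1], [4, 1, 2], [0, 6, 1], [1, 4, 2]]
y_test = [0, 0, 1, 1]

scores = compare_classifiers(X_train, y_train, X_test, y_test)
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.2f}")
```

Using the same train and test split for all three models keeps the comparison fair. Accuracy alone hides the balance between missed malware and false alarms, so rates such as TPR and FPR are normally reported alongside it.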
UML Design
• USE CASE DIAGRAM
• CLASS DIAGRAM
• SEQUENCE DIAGRAM
• ACTIVITY DIAGRAM
• STATE CHART DIAGRAM
USE CASE DIAGRAM
CLASS DIAGRAM
Sequence Diagram
Activity Diagram
State Transition Diagram
Results and Discussion
           With PCA   Without PCA
2 grams    8          216
3 grams    9          256
4 grams    8          256
With Feature Selection Algorithm
Performance Table for 2grams
2-grams         Random Forest   SVM      J48
Classified      95%             82.50%   88%
Misclassified   12.30%          82.50%   36.40%
Precision       95.00%          68.10%   86.90%
Graphic view for 2grams (bar chart of TPR, FPR and Precision for Random Forest, SVM and J48; y-axis from 0% to 100%)
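The chart legend names TPR, FPR and Precision. Assuming the standard definitions with malware as the positive class, TPR = TP / (TP + FN), FPR = FP / (FP + TN) and Precision = TP / (TP + FP). The sketch below computes them from true and predicted labels. The function name `detection_metrics` and the example labels are hypothetical and are not taken from the experiments.

```python
def detection_metrics(y_true, y_pred, positive="malware"):
    """Compute TPR, FPR and precision, treating `positive` as the target class."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if truth == positive:
                tp += 1  # malware correctly flagged
            else:
                fp += 1  # benign file flagged as malware
        else:
            if truth == positive:
                fn += 1  # malware missed
            else:
                tn += 1  # benign file correctly passed
    return {
        "TPR": tp / (tp + fn) if tp + fn else 0.0,
        "FPR": fp / (fp + tn) if fp + tn else 0.0,
        "Precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical labels: 4 malware and 4 benign files.
y_true = ["malware"] * 4 + ["benign"] * 4
y_pred = ["malware", "malware", "malware", "benign",
          "malware", "benign", "benign", "benign"]

metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

In this example three of four malware files are detected and one of four benign files is wrongly flagged, so TPR is 0.75, FPR is 0.25 and Precision is 0.75.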
Performance Table for 3grams
3-grams         Random Forest   SVM      J48
Classified      92%             94.70%   84%
Misclassified   52.10%          34.70%   53.20%
Precision       92.80%          95.00%   84.20%
Graphic view for 3grams (bar chart of TPR, FPR and Precision for Random Forest, SVM and J48; y-axis from 0% to 100%)
Normal Code and Obfuscated Code
Disassembling the executables
Parser
N-grams extraction
Opcode and its frequency and class
Data set
Before PCA
After Feature Selection- PCA
Classification