Motivation Challenges & Alternatives Experiments Concluding Remarks
Malware Variants Identification in Practice
Marcus Botacin1, André Grégio1, Paulo Lício de Geus2
1Federal University of Paraná (UFPR) – {mfbotacin, gregio}@inf.ufpr.br
2University of Campinas (Unicamp) – paulo@lasca.ic.unicamp.br
SBSEG 2019
Malware Variants Identification in Practice 1 / 26 SBSeg’19
Agenda
1 Motivation
Motivation
2 Challenges & Alternatives
Challenge 1
Challenge 2
3 Experiments
Evaluation
4 Concluding Remarks
Limitations
Conclusions
Motivation
Malware at Scale!
Figure: Source: https://www.infosecurity-magazine.com/news/360k-new-malware-samples-every-day/
Figure: Source: https://money.cnn.com/2015/04/14/technology/security/cyber-attack-hacks-security/index.html
Current Approaches.
Figure: Function-based, Graph Modeling.
Challenge 1
Same-Behavior Function Replacement
Figure: Original sample's CG. Nodes: CreateProcess, CreateFile, RegCreateKey, NtSuspendThread.
Figure: Variant sample's CG. Nodes: CreateThread, CreateFileTransacted, RegLoadKey, RtlWow64SuspendThread.
Figure: Behavioral graph from both samples. Nodes: Modularity, File System Change, Registry Change, Application Interference.
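The idea in the three figures can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `BEHAVIOR_OF` table and the `to_behaviors` helper are hypothetical, and the pairing of each function with a behavior class is an assumption inferred from the order of the nodes in the figures.

```python
# Hypothetical mapping from API function names to behavior classes.
# The pairing is an assumption inferred from the node order in the figures.
BEHAVIOR_OF = {
    "CreateProcess": "Modularity",
    "CreateThread": "Modularity",
    "CreateFile": "File System Change",
    "CreateFileTransacted": "File System Change",
    "RegCreateKey": "Registry Change",
    "RegLoadKey": "Registry Change",
    "NtSuspendThread": "Application Interference",
    "RtlWow64SuspendThread": "Application Interference",
}


def to_behaviors(calls):
    """Translate a sequence of API calls into a sequence of behavior classes."""
    return [BEHAVIOR_OF[name] for name in calls]


original = ["CreateProcess", "CreateFile", "RegCreateKey", "NtSuspendThread"]
variant = ["CreateThread", "CreateFileTransacted", "RegLoadKey",
           "RtlWow64SuspendThread"]

# The function-level views share no node at all ...
shared_functions = set(original) & set(variant)
# ... while the behavior-level views are identical.
same_behaviors = to_behaviors(original) == to_behaviors(variant)

print(shared_functions, same_behaviors)
```

Under these assumptions, a function-based comparison sees two unrelated samples, whereas the behavior-based view treats them as the same.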
Behavioral Classification
Compression.
Cryptography.
Debug.
Delay.
Environment.
Escalation.
Exfiltration.
Fingerprint.
File System.
Interference.
Internet.
Modularity.
Monitoring.
Registry.
Evidence Removal.
Side Effects.
System Changes.
Target Information.
Timing Attacks.
Behavior-based Graph.
Figure: Behavior-based graph for a given sample.
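One possible way to build such a graph from a traced call sequence is sketched below. The slides do not state how vertices and edges are derived, so the rule used here (one vertex per behavior class, one directed edge between consecutive distinct behaviors) is an assumption, and `behavior_graph` and the small `behavior_of` table are hypothetical names.

```python
def behavior_graph(calls, behavior_of):
    """Return (vertices, edges) of a directed graph over behavior classes.

    Assumption: an edge links each behavior to the next distinct behavior
    observed in the trace. Calls without a known class are skipped.
    """
    behaviors = [behavior_of[c] for c in calls if c in behavior_of]
    vertices = set(behaviors)
    edges = {(a, b) for a, b in zip(behaviors, behaviors[1:]) if a != b}
    return vertices, edges


# Hypothetical classification table and trace, for illustration only.
behavior_of = {
    "CreateFile": "File System",
    "RegCreateKey": "Registry",
    "CreateProcess": "Modularity",
}
trace = ["CreateFile", "CreateFile", "RegCreateKey", "CreateProcess"]

vertices, edges = behavior_graph(trace, behavior_of)
print(sorted(vertices))
print(sorted(edges))
```

The resulting vertex and edge sets are the kind of sets that a set-based similarity metric can compare.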
Challenge 2
Malware Embedding
Figure: Sample 1. The original sample.
Figure: Sample 2. Variant sample embedding the original one.
Matching Metrics
Definition (usual metric)
The similarity of two malware samples, represented as sets A and B of
vertices or edges of two graphs, is defined as:
\[ \mathrm{Sim}(A, B) = \frac{|A \cap B|}{|A \cup B|} \qquad (1) \]
Definition (proposed metric, continence)
The similarity of two malware samples, represented as sets A and B of
vertices or edges of two graphs, is defined as:
\[ \mathrm{Sim}(A, B) = \max\left( \frac{|A \cap B|}{|B|}, \frac{|B \cap A|}{|A|} \right) \qquad (2) \]
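Both definitions translate directly into set operations. The sketch below is illustrative only: the function names `jaccard` and `continence`, the handling of empty sets, and the example node labels are assumptions, not taken from the slides.

```python
def jaccard(a, b):
    """Equation (1): intersection size over union size."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def continence(a, b):
    """Equation (2): the larger of the two containment ratios."""
    if not a or not b:
        return 0.0
    common = len(a & b)
    return max(common / len(b), common / len(a))


# Hypothetical node sets: the variant embeds the original sample
# and adds four unrelated nodes of its own.
original = {"n1", "n2", "n3", "n4"}
variant = original | {"x1", "x2", "x3", "x4"}

print(jaccard(original, variant))     # 4 / 8
print(continence(original, variant))  # max(4 / 8, 4 / 4)
```

In this embedding scenario the extra nodes halve the score of equation (1), while equation (2) still reports that one sample is fully contained in the other.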
Continence Results
Table: Continence of Sample 1 in Sample 2.
CG   A     B     C
I    0.66  0.52  0.64
J    0.75  0.49  0.50
K    0.42  0.80  0.44

Table: Continence of Sample 2 in Sample 1.
CG   A     B     C
I    0.57  0.56  0.43
J    0.33  0.51  0.44
K    0.76  0.65  0.44

Table: Maximum continence of Sample 1 and Sample 2.
CG   A     B     C
I    0.66  0.56  0.64
J    0.75  0.51  0.50
K    0.76  0.80  0.44
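The third table is the cell-by-cell maximum of the first two, which a short check can reproduce. The values are copied from the tables; the variable names are hypothetical.

```python
# Rows I, J, K; columns A, B, C. Values copied from the two continence tables.
s1_in_s2 = {"I": [0.66, 0.52, 0.64], "J": [0.75, 0.49, 0.50], "K": [0.42, 0.80, 0.44]}
s2_in_s1 = {"I": [0.57, 0.56, 0.43], "J": [0.33, 0.51, 0.44], "K": [0.76, 0.65, 0.44]}

# Maximum continence: take the larger direction for every cell.
maximum = {
    row: [max(x, y) for x, y in zip(s1_in_s2[row], s2_in_s1[row])]
    for row in s1_in_s2
}

for row, values in maximum.items():
    print(row, values)
```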
Evaluation
The Whitelisting Effect.
Figure: Evaluating whitelisting effect. Bar chart of similarity scores (y-axis, 0 to 0.8) for the pairs Mimail.a x Mimail.c, Mimail.c x Mimail.e, and Mimail.f x Mimail.g, without and with the whitelist. Similarity scores are higher when using the whitelist-based approach.
Advantages of the Behavioral model.
Figure: Function vs. Behavior-based approaches. Bar chart of similarity scores (y-axis, 0 to 0.5) for the pairs Mimail.a x Klez.a, Mimail.c x Klez.a, and Mimail.k x Klez.g, function-based versus behavior-based. Scores are higher when considering behavioral patterns.
Evaluating Metrics.
Figure: Proposed metric. Bar chart of similarity scores (y-axis, 0 to 1) for the pairs Mimail.a x Mimail.c, Mimail.c x Mimail.e, and Mimail.f x Mimail.g, usual metric versus proposed metric. Scores are higher when using it in comparison to the usual one.
Solutions Comparison.
Figure: Mimail's sample similarity. Bar chart of similarity (y-axis, 0.0% to 120.0%) for the pairs H x Q, I x Q, J x Q, L x Q, and M x Q, comparing Solution #1, Solution #2, and our solution. Our solution's scores are higher than those of the other solutions.
Domain Transformation and Similarity Measures.
Figure: Threshold evaluation. Plot with the similarity threshold on the x-axis (50% to 90%) and values from 0 to 100 on the y-axis, for the series F.Mimail, F.Klez, F.Cross, B.Mimail, B.Klez, and B.Cross. The threshold should be higher than 80% in order to properly label the cross dataset.
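A threshold sweep of this kind can be sketched as follows. The scores and sample names below are hypothetical and are not taken from the figure; `label_pairs` is an illustrative helper, not the authors' code.

```python
def label_pairs(scores, threshold):
    """Return the pairs whose similarity reaches the threshold."""
    return sorted(pair for pair, score in scores.items() if score >= threshold)


# Hypothetical similarity scores, for illustration only.
scores = {
    ("X.a", "X.b"): 0.92,  # same family
    ("Y.a", "Y.b"): 0.88,  # same family
    ("X.a", "Y.a"): 0.71,  # cross-family pair
}

for threshold in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(threshold, label_pairs(scores, threshold))
```

With these made-up scores, a threshold of 0.7 or lower labels the cross-family pair as a variant, while 0.8 keeps only the same-family pairs.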
Real World Experiments (1/2)
Table: Identified variants among unknown, wild-collected samples.
Family  Sample  Hash                              Label
1       A       c2ef1aabb15c979e932f5ea1d214cbeb  Generic vb.OBY
1       B       747b9fe5819a76529abc161bb449b8eb  Generic vb.OBO
1       C       39a04a11234d931bfa60d68ba8df9021  Generic vb.OBL
2       A       96d13246971e4368b9ed90c6f996a884  Atros4.CENI
2       B       e23588078ba6a5f5ca1c961a8336ec08  Atros4.CENI
2       C       31a2b6adc781328cb1d77e5debb318ff  Atros4.CENI
Real World Experiments (2/2)
Figure: Case study: variant identification. Chart of scores (y-axis, 10.0% to 100.0%) for the pairs 1.a x 1.b, 1.a x 1.c, 1.b x 1.c, 2.a x 2.b, 2.a x 2.c, and 2.b x 2.c, showing ssdeep, our solution, and coverage. Our approach outperforms others even in low-coverage scenarios.
Limitations
Matching Complex Behaviors is Challenging!
Figure: DLL injection functions among other function calls.
Figure: Proposed DLL injection class.
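The difficulty is that the calls forming one complex behavior are interleaved with unrelated calls. A minimal sketch of an ordered, gap-tolerant match is shown below. The `DLL_INJECTION` sequence is the commonly cited injection pattern and is an assumption here, since the slides do not list the functions; `contains_in_order` is a hypothetical helper.

```python
# Commonly cited DLL injection call sequence (an assumption, not from the slides).
DLL_INJECTION = [
    "OpenProcess",
    "VirtualAllocEx",
    "WriteProcessMemory",
    "CreateRemoteThread",
]


def contains_in_order(trace, pattern):
    """True if every pattern call occurs in the trace, in order, gaps allowed."""
    remaining = iter(trace)
    # 'call in remaining' consumes the iterator up to the first match,
    # so later pattern calls are only searched for after earlier ones.
    return all(call in remaining for call in pattern)


interleaved = ["CreateFile", "OpenProcess", "RegCreateKey", "VirtualAllocEx",
               "WriteProcessMemory", "Sleep", "CreateRemoteThread"]
reordered = ["CreateRemoteThread", "OpenProcess", "VirtualAllocEx",
             "WriteProcessMemory"]

print(contains_in_order(interleaved, DLL_INJECTION))
print(contains_in_order(reordered, DLL_INJECTION))
```

Once such a sequence is recognized, the matched calls could be collapsed into a single behavior node, which is the role a dedicated DLL injection class would play.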
Conclusions
Challenges & Lessons
Anti-disassembly breaks CG extraction.
  Transparent, dynamic tracing is a viable alternative.
Same-behavior function replacement breaks malware clustering.
  Behavior-based clustering is a viable alternative.
Dead code breaks similarity metrics.
  The continence metric is a viable alternative.
Questions & Comments.
Contact
mfbotacin@inf.ufpr.br
Additional Material
https://github.com/marcusbotacin/Malware.Variants