Introduction Academic Contributions Moving Forward Conclusions
How do we detect malware?
A step-by-step guide
Marcus Botacin
botacin@tamu.edu
marcusbotacin.github.io
How do we detect malware? 1 / 46 TAMU
Who Am I?
Assistant Professor (2022) - Texas A&M University (TAMU), USA
ACES Program Fellowship
PhD. in Computer Science (2021) - Federal University of Paraná (UFPR), Brazil
Thesis: “On the Malware Detection Problem: Challenges and new Approaches”
MSc. in Computer Science (2017) - University of Campinas (UNICAMP), Brazil
Dissertation: “Hardware-Assisted Malware Analysis”
Computer Engineer (2015) - University of Campinas (UNICAMP), Brazil
Final Project: “Malware detection via syscall patterns identification”
Malware
Topics
1 Introduction
Malware
Malware Detection
2 Academic Contributions
Examples
3 Moving Forward
Research Opportunities
4 Conclusions
Recap & Remarks
The Malware Problem
How have we been doing? (Overall)
The good side
Figure: https://www.paysafe.com/en/blog/do-consumers-trust-online-payments-more-now-than-before-covid-19/
The bad side
Figure: https://www.ncr.com/blogs/payments/credit-card-fraud-detection
How have we been doing? (Malware Specifics)
The good side
Figure: https://apnews.com/article/europe-malware-netherlands-coronavirus-pandemic-7de5f74120a968bd0a5bee3c57899fed
The bad side
Figure: https://thehackernews.com/2021/06/droidmorph-shows-popular-android.html
Malware Detection
How Do We Detect Malware?
The State-of-the-art in Malware Detection & Prevention
Steps
1 Collection
2 Triage
3 Sandbox Analysis
4 Threat Intelligence
5 Endpoint Protection
Distributed Processing: Collection
Cloud Processing: Analysis and Intelligence steps
Limited Processing: Endpoint
Collection
How to find new malware samples?
Searching “dark web” forums.
Crawling software repositories.
Leveraging honeypots.
Checking spam traps.
Downloading malware repositories.
Scraping blocklists.
The result
Figure: https://www.forbes.com/sites/thomasbrewster/2021/09/29/google-play-warning-200-android-apps-stole-millions-from-10-million-phones/
Triage
Why so many new malware samples?
Variations from the same source code.
Implications
Increased processing costs and response time.
How to solve this problem?
Identify and cluster similar samples.
The Statistics
Figure: https://www.kaspersky.com/about/press-releases/2020_the-number-of-new-malicious-files-detected-every-day-increases-by-52-to-360000-in-2020
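The triage idea on this slide, identifying and clustering similar samples, can be sketched in a few lines. This is a minimal illustration and not the method used in the talk: it swaps in Jaccard similarity over byte n-grams as a stand-in for a real similarity hash, and the function names, the 4-byte n-gram size, and the 0.5 threshold are all assumptions chosen for the example.

```python
from typing import Dict, List, Set


def byte_ngrams(data: bytes, n: int = 4) -> Set[bytes]:
    """Return the set of all n-byte substrings of a sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: Set[bytes], b: Set[bytes]) -> float:
    """Jaccard similarity |A & B| / |A | B| (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_samples(samples: Dict[str, bytes],
                    threshold: float = 0.5) -> List[List[str]]:
    """Greedy clustering: a sample joins the first cluster whose
    representative (its first member) is at least `threshold` similar;
    otherwise it starts a new cluster."""
    features = {name: byte_ngrams(data) for name, data in samples.items()}
    clusters: List[List[str]] = []
    for name in samples:
        for cluster in clusters:
            if jaccard(features[name], features[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


if __name__ == "__main__":
    # Synthetic byte strings, not real samples.
    base = bytes(range(256)) * 4
    demo = {
        "variant_a": base,
        "variant_b": base + b"extra-config",
        "unrelated": b"completely different content " * 30,
    }
    print(cluster_samples(demo))
```

Only one representative per cluster needs to go through the expensive analysis steps, which is where the savings in processing cost and response time would come from.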
Sandbox Analysis
Goals
Uncover hidden behaviors.
Method
Trace sample execution.
Challenge
Handle evasion attempts.
Solution 1
Figure: https://blog.virustotal.com/2019/05/virustotal-multisandbox-yoroi-yomi.html
Solution 2
Figure: https://blog.virustotal.com/2019/07/virustotal-multisandbox-sndbox.html
Threat Intelligence
Goal
Identify trends and predict attacks.
How?
Data analytics over analyzed samples.
Challenges
Look at a representative dataset.
We should look at:
Figure: https://www.computerweekly.com/news/252504676/Ransomware-attacks-increase-dramatically-during-2021
Endpoint Protection
Goal
Protect customers on their machines.
How?
Move the viable analyses to the endpoint.
Challenges
Performance and usability constraints.
Is there a “best”?
Figure: https://www.av-test.org/en/antivirus/home-windows/
Examples
Enhancing Malware Triage
The good side: Separating Code and Data
Figure: Binary Sections Accuracy. AV clustering accuracy (%) vs. similarity score (0 to 100); series: All, Text, Data.
Figure: Binary Sections Recall. AV clustering recall (%) vs. similarity score (0 to 100); series: All, Text, Data.
Source: https://www.sciencedirect.com/science/article/abs/pii/S2666281721001281
The bad side: Packed Samples
Figure: The impact of UPX packing. Share of samples (%) vs. similarity score (0 to 100); series: Packed, Unpacked, Identical.
Packing reduces samples’ similarity scores.
Figure: Average Packed Sample’s Similarity Scheme. Diagram nodes: Unpacked 1, Packed 1, Packed 2, Unpacked 2, linked by “UPX Packing” arrows and by edges labeled “Similar” or “Not Similar”. Cross-comparisons should be avoided.
Enhancing Malware Tracing
Software-based Sandbox
Figure: System Architecture.
Link: https://link.springer.com/article/10.1007/s11416-017-0292-8
Drawbacks: Anti-VM
Technique | Description | Detection
VM Fingerprint | Check for known strings, such as serial numbers | Check for known strings inside the binary
CPUID Check | Check CPU vendor | Check for known CPU vendor strings
Invalid Opcodes | Launch hypervisor-specific instructions | Check for specific instructions in the binary
System Table Checks | Compare IDT values | Look for checks involving IDT
HyperCall Detection | Platform-specific feature | Look for specific instructions
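The "Detection" column above amounts to static checks on the binary, which can be sketched as a byte scan. This is a rough illustration under stated assumptions: the string list and the function names are invented for the example, and the only opcode patterns used are the standard x86 encodings of CPUID (0F A2) and VMCALL (0F 01 C1). A raw byte match does not prove that the bytes are executed as an instruction, so hits are hints for an analyst, not verdicts.

```python
from typing import Dict, List

# Illustrative indicator lists (assumptions for this sketch, not from the talk).
VM_STRINGS = [b"VMware", b"VBOX", b"VirtualBox", b"QEMU"]
OPCODES = {
    "cpuid": bytes([0x0F, 0xA2]),         # x86 CPUID instruction
    "vmcall": bytes([0x0F, 0x01, 0xC1]),  # Intel VT-x hypercall instruction
}


def find_all(haystack: bytes, needle: bytes) -> List[int]:
    """Return every offset at which `needle` occurs in `haystack`."""
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


def scan_for_anti_vm(binary: bytes) -> Dict[str, List[int]]:
    """Map each indicator found in the binary to the offsets where it occurs."""
    hits: Dict[str, List[int]] = {}
    lowered = binary.lower()  # case-insensitive matching for strings only
    for text in VM_STRINGS:
        offsets = find_all(lowered, text.lower())
        if offsets:
            hits["string:" + text.decode("ascii")] = offsets
    for name, pattern in OPCODES.items():
        offsets = find_all(binary, pattern)  # opcodes use the original bytes
        if offsets:
            hits["opcode:" + name] = offsets
    return hits


if __name__ == "__main__":
    # Synthetic blob: two NOPs, a CPUID, then a VM vendor string.
    blob = b"\x90\x90\x0f\xa2\x00VMware Virtual\x00"
    print(scan_for_anti_vm(blob))
```

A production check would disassemble the code sections first, so that opcode patterns are only matched at real instruction boundaries.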
Hardware-based Sandbox
Monitoring Steps
1 Software executes a branch.
2 Processor stores the branch address in a memory page.
3 Processor raises an interrupt.
4 Kernel handles interrupt.
5 Kernel sends data to userland.
6 Userland introspects into this data.
Figure: System Architecture.
Key Insight: Branches define basic blocks
Figure: Identified branches and basic blocks.
Source: https://dl.acm.org/doi/10.1145/3152162
Figure: CFG Reconstruction.
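The key insight can be made concrete with a small sketch: if every taken branch is logged as a (source, target) pair, the code executed between one branch's target and the next branch's source forms one dynamic basic block, and consecutive blocks give the edges of a control-flow graph. The trace format, type aliases, and function names below are hypothetical, chosen for illustration; they are not the data layout of the system described in the talk.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Branch = Tuple[int, int]  # (address of the branch, address of its target)
Block = Tuple[int, int]   # (first address executed, address of the closing branch)


def blocks_from_trace(trace: Iterable[Branch]) -> List[Block]:
    """Each block starts at the previous branch's target and ends at the
    source address of the next branch taken."""
    records = list(trace)
    return [
        (prev_target, next_source)
        for (_, prev_target), (next_source, _) in zip(records, records[1:])
    ]


def cfg_edges(trace: Iterable[Branch]) -> Counter:
    """Count how often execution moved from one block to the next."""
    blocks = blocks_from_trace(trace)
    return Counter(zip(blocks, blocks[1:]))


if __name__ == "__main__":
    # Synthetic trace of a small loop bouncing between two blocks.
    trace = [
        (0x1008, 0x2000),
        (0x2010, 0x3000),
        (0x3004, 0x2000),
        (0x2010, 0x3000),
        (0x3004, 0x2000),
    ]
    for (src, dst), count in cfg_edges(trace).items():
        print(f"{src[0]:#x}-{src[1]:#x} -> {dst[0]:#x}-{dst[1]:#x} x{count}")
```

Because only taken branches appear in such a trace, the recovered blocks describe the paths that actually executed, not every path present in the binary.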
From Tracing to Threat Intelligence
Brazilian Financial Malware on Desktop
Figure: Passive Banker Malware for Santander bank waiting for user’s credential input.
Figure: Passive Banker Malware for Itaú bank waiting for user’s credential input.
Link: https://dl.acm.org/doi/10.1145/3429741
Brazilian Financial Malware on Mobile
Figure: BB’s WhatsApp chatbot.
Figure: Bradesco’s WhatsApp chatbot.
Link: https://dl.acm.org/doi/10.1145/3339252.3340103
Brazilian Financial Malware Filetypes.
Figure: Brazilian malware filetypes. Evolution of threats’ filetypes: share of samples (%) per year, 2012 to 2018; series: PE, CPL, .NET, DLL, JAR, JS, VBE.
Varied file formats are prevalent over the years.
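A chart like this one boils down to a per-year share computation over labeled samples. The sketch below shows that computation only; the record format and function name are assumptions, and the demo records are invented placeholders that do not reflect the dataset behind the figure.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def filetype_shares(records: Iterable[Tuple[int, str]]) -> Dict[int, Dict[str, float]]:
    """Map each year to {filetype: percentage of that year's samples}."""
    per_year: Dict[int, Counter] = defaultdict(Counter)
    for year, filetype in records:
        per_year[year][filetype] += 1
    return {
        year: {ft: 100.0 * n / sum(counts.values()) for ft, n in counts.items()}
        for year, counts in sorted(per_year.items())
    }


if __name__ == "__main__":
    # Invented placeholder records: (year first seen, filetype label).
    demo = [
        (2012, "PE"), (2012, "PE"), (2012, "CPL"), (2012, "JAR"),
        (2013, "CPL"), (2013, ".NET"),
    ]
    for year, shares in filetype_shares(demo).items():
        print(year, shares)
```

Normalizing per year, instead of plotting raw counts, keeps years with very different collection volumes comparable.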
More about Brazilian Malware
Figure: Source: https://www.usenix.org/conference/enigma2021/presentation/botacin
From Threat Intelligence to Endpoint Protection
Drawback: Real-time monitoring performance penalty
Figure: AV Monitoring Performance. Execution time (s) per benchmark (Perl, Xalanc, Gobmk, H264, Namd, Mcf); series: Filter AV, SSDT AV, No AV.
Figure: In-memory AV scans worst-case and best-case performance penalties. Execution time (s) per benchmark (perl, namd, bzip, milc, mcf); series: Scan, Baseline.
Hardware AV Architecture
2-level Architecture
Does not fully replace AVs, but adds efficient matching capabilities to them.
Performance Characterization
Figure: 2-Phase HEAVEN CPU Performance. AV monitoring overhead: CPU usage (%) over time (5 to 40 s); series: HEAVEN+AV, AV, No-AV.
The inspection phase causes occasional, quick bursts of CPU usage. The AV operating alone incurs a continuous 10% performance overhead.
A first idea: Hardware features as signatures
Figure: Two-level branch predictor. A sequence window of taken (1) and not-taken (0) branches is stored in the Global History Register (GHR).
Figure: Branch patterns coverage. Percentage of signature collisions in the k-bit space vs. branch pattern length (k = 8, 16, 24, 32, 40 bits).
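The idea of using the branch history as a signature can be sketched in software. The snippet below simulates a k-bit history register that shifts in one bit per branch (1 for taken, 0 for not taken) and reports where the register equals a given signature. It is a toy model under stated assumptions: the function name and the example pattern are invented here, and a real design would perform this comparison in hardware.

```python
from typing import Iterable, List


def ghr_matches(outcomes: Iterable[int], signature: int, k: int) -> List[int]:
    """Shift branch outcomes into a k-bit history register and return the
    indices of the branches after which the register equals `signature`."""
    mask = (1 << k) - 1
    ghr = 0
    matches = []
    for i, taken in enumerate(outcomes):
        ghr = ((ghr << 1) | (taken & 1)) & mask  # keep only the last k outcomes
        if i + 1 >= k and ghr == signature:      # wait until the window is full
            matches.append(i)
    return matches


if __name__ == "__main__":
    # Synthetic outcome stream; the pattern 1011 completes at indices 3 and 6.
    stream = [1, 0, 1, 1, 0, 1, 1, 0]
    print(ghr_matches(stream, signature=0b1011, k=4))
```

A k-bit window can distinguish at most 2^k patterns, so the choice of k trades signature storage against the chance that unrelated code produces the same pattern.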
Result: Performance penalty reduction
Figure: Performance evaluation when tracking all function calls. Comparison between execution without AV (BASE), execution with software AV (AVSW), and execution with the proposed coprocessor model (AVHW). Cycles (log scale, 1×10^8 to 1×10^14) per benchmark: blender, nab, roms, bwaves, djeng, perl, cam4, cactus, omnetpp, mcf, wrf, x264, xzr, leela, parest, lbm, namd, imagick, povray, xalanc, gcc, echg2.
Research Opportunities
Deep Learning: From Images to Binaries
Malware Binaries as Textures
Figure: Source: https://link.springer.com/chapter/10.1007/978-3-030-30215-3_19
Adversarial Machine Learning
Detection Bypasses
Adversarial Machine Learning
Figure: Source: https://github.com/marcusbotacin/Talks/tree/master/Waikato
Adversarial Malware
Figure: Dropper Strategy.
Figure: Data Appendix Result.
ML Evasion Contest
Figure: mlsec.io
Figure: https://cujo.com/machine-learning-security-evasion-competition-2020-results-and-behind-the-scenes/
Transition to Practice: Analysis Platforms
A Current Public Malware Analysis Platform
Figure: https://app.any.run
Recap & Remarks
Summary
Malware Detection
No definitive solution, but a pipeline of attempts.
The world is better off with some approximation of security.
Academic Contributions
Better triage with similarity hashing.
Better analyses with new sandboxes.
Better threat intelligence for Brazilian malware.
Better endpoint protection with hardware AVs.
Moving Forward
Open research positions. Get in touch!
Thanks!
Questions? Comments?
@MarcusBotacin
botacin@tamu.edu
marcusbotacin.github.io
  • 14. Introduction Academic Contributions Moving Forward Conclusions Malware Detection Endpoint Protection Goal Protect customers in their machines. How? Moving the viable analyses to the endpoint. Challenges Performance and usability constraints. Is there a “best”? Figure: https://www.av-test.org/en/ant ivirus/home-windows/ How do we detect malware? 14 / 46 TAMU
  • 15. Introduction Academic Contributions Moving Forward Conclusions Examples Topics 1 Introduction Malware Malware Detection 2 Academic Contributions Examples 3 Moving Forward Research Opportunities 4 Conclusions Recap & Remarks How do we detect malware? 15 / 46 TAMU
  • 16. Introduction Academic Contributions Moving Forward Conclusions Examples Enhancing Malware Triage How do we detect malware? 16 / 46 TAMU
  • 17. Introduction Academic Contributions Moving Forward Conclusions Examples The good side: Separating Code and Data 0 10 20 30 40 50 60 70 80 90 100 Similarity Score 0 10 20 30 40 50 60 70 80 90 100 Accuracy (%) AV Clustering Accuracy vs Similarity Score All Text Data Figure: Binary Sections Accuracy 0 10 20 30 40 50 60 70 80 90 100 Similarity Score 0 10 20 30 40 50 60 70 80 90 100 Recall (%) AV Clustering Recall vs Similarity Score All Text Data Figure: Binary Sections Recall Source: https://www.sciencedirect.com/science/article/abs/pii/S26662 81721001281 How do we detect malware? 17 / 46 TAMU
  • 18. Introduction Academic Contributions Moving Forward Conclusions Examples The bad side: Packed Samples 0 10 20 30 40 50 60 70 80 90 100 Similarity Score 0 10 20 30 40 50 60 70 80 90 100 Samples (%) The Impact of Packing on Sample's Similarity Packed Unpacked Identical Figure: The impact of UPX packing. Packing reduces sample’s similarity scores. UPX Packing UPX Packing Similar Not Similar Not Similar Not Similar Similar Unpacked 1 Packed 1 Packed 2 Unpacked 2 Figure: Average Packed Sample’s Similarity Scheme. Cross-comparisons should be avoided. How do we detect malware? 18 / 46 TAMU
  • 19. Introduction Academic Contributions Moving Forward Conclusions Examples Enhancing Malware Tracing How do we detect malware? 19 / 46 TAMU
  • 20. Introduction Academic Contributions Moving Forward Conclusions Examples Software-based Sandbox Figure: System Architecture. Link: https://link.springer.com/article/10.1007/s11416-017-0292-8 How do we detect malware? 20 / 46 TAMU
  • 21. Introduction Academic Contributions Moving Forward Conclusions Examples Drawbacks: Anti-VM Technique Description Detection VM Fingerprint Check for known strings, such as serial numbers Check for known strings inside the binary CPUID Check Check CPU vendor Check for known CPU vendor strings Invalid Opcodes Launch hypervisor-specific instructions Check for specific instrutions on the binary System Table Checks Compare IDT values Look for checks involving IDT HyperCall Detection Platform specific feature Look for specific instructions How do we detect malware? 21 / 46 TAMU
  • 22. Introduction Academic Contributions Moving Forward Conclusions Examples Hardware-based Sandbox Monitoring Steps 1 Software executes a branch. 2 Processor stores branch address in memory page. 3 Processor raises an interrupt. 4 Kernel handles interrupt. 5 Kernel sends data to userland. 6 Userland introspects into this data. Figure: System Architecture. How do we detect malware? 22 / 46 TAMU
  • 23. Introduction Academic Contributions Moving Forward Conclusions Examples Key Insight: Branches define basic blocks Figure: Identified branches and basic blocks.. Source: https://dl.acm.org/doi/10. 1145/3152162 Figure: CFG Reconstruction. How do we detect malware? 23 / 46 TAMU
  • 24. Introduction Academic Contributions Moving Forward Conclusions Examples From Tracing to Threat Intelligence How do we detect malware? 24 / 46 TAMU
  • 25. Introduction Academic Contributions Moving Forward Conclusions Examples Brazilian Financial Malware on Desktop Figure: Passive Banker Malware for Santander bank waiting for user’s credential input. Figure: Passive Banker Malware for Itaú bank waiting for user’s credential input. Link: https://dl.acm.org/doi/10.1145/3429741 How do we detect malware? 25 / 46 TAMU
  • 26. Brazilian Financial Malware on Mobile. Figure: BB’s WhatsApp chatbot. Figure: Bradesco’s WhatsApp chatbot. Link: https://dl.acm.org/doi/10.1145/3339252.3340103
  • 27. Brazilian Financial Malware Filetypes. Figure: Evolution of threat’s filetype (samples (%) per year, 2012 to 2018; series: PE, CPL, .NET, DLL, JAR, JS, VBE). Brazilian malware filetypes: varied file formats have been prevalent over the years.
  • 28. More about Brazilian Malware. Figure: Source: https://www.usenix.org/conference/enigma2021/presentation/botacin
  • 29. From Threat Intelligence to Endpoint Protection
  • 30. Drawback: Real-time monitoring performance penalty. Figure: AV Monitoring Performance (time in seconds per benchmark: Perl, Xalanc, Gobmk, H264, Namd, Mcf; series: Filter AV, SSDT AV, No AV). Figure: In-memory AV scans worst-case and best-case performance penalties (execution time in seconds per benchmark: perl, namd, Bzip, milc, mfc; series: Scan, Baseline).
  • 31. Hardware AV Architecture. 2-level Architecture: does not fully replace AVs, but adds efficient matching capabilities to them.
  • 32. Performance Characterization. Figure: AV Monitoring Overhead (CPU (%) over time in seconds; series: HEAVEN+AV, AV, No-AV). 2-Phase HEAVEN CPU Performance: the inspection phase causes occasional, quick bursts of CPU usage. The AV operating alone incurs a continuous 10% performance overhead.
  • 33. A first idea: Hardware features as signatures. Figure: Two-level branch predictor. A sequence window of taken (1) and not-taken (0) branches is stored in the Global History Register (GHR). Figure: Branch patterns coverage (percentage of signature collision in the k-bit space per branch-pattern length, for k from 8 to 40 bits).
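The GHR behaves like a k-bit shift register: each branch outcome is shifted in, and the oldest outcome falls off. A signature is then just a k-bit value to compare against the register. The sketch below illustrates that mechanism only; the class, the `first_match` helper, and the example signature are hypothetical and are not taken from the proposed hardware.

```python
class GlobalHistoryRegister:
    """k-bit window over the most recent taken (1) / not-taken (0) outcomes."""

    def __init__(self, k):
        self.k = k
        self.mask = (1 << k) - 1
        self.value = 0

    def update(self, taken):
        # Shift the newest outcome in; the mask drops the oldest one.
        self.value = ((self.value << 1) | int(bool(taken))) & self.mask
        return self.value


def first_match(outcomes, signatures, k):
    """Return the index of the first branch whose k-bit history is a signature."""
    ghr = GlobalHistoryRegister(k)
    for index, taken in enumerate(outcomes):
        pattern = ghr.update(taken)
        # Only compare once k outcomes have actually been observed.
        if index + 1 >= k and pattern in signatures:
            return index
    return None


# Hypothetical 4-bit signature: taken, taken, not-taken, not-taken.
signatures = {0b1100}
outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
hit = first_match(outcomes, signatures, k=4)
```

A k-bit register can only distinguish 2^k patterns, so short patterns are more likely to be shared by unrelated programs; this is the collision-versus-length trade-off that the coverage figure plots.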
  • 34. Result: Performance penalty reduction. Figure: Performance evaluation when tracking all function calls (AV’s Performance Overhead; cycles, log scale, per benchmark; series: AVSW, AVHW, BASE). Comparison between execution without AV (BASE), execution with software AV, and execution with the proposed coprocessor model.
  • 35. Topics: 1. Introduction (Malware, Malware Detection); 2. Academic Contributions (Examples); 3. Moving Forward (Research Opportunities); 4. Conclusions (Recap & Remarks)
  • 36. Deep Learning: From Images to Binaries
  • 37. Malware Binaries as Textures. Figure: Source: https://link.springer.com/chapter/10.1007/978-3-030-30215-3_19
  • 38. Adversarial Machine Learning: Detection Bypasses
  • 39. Adversarial Machine Learning. Figure: Source: https://github.com/marcusbotacin/Talks/tree/master/Waikato
  • 40. Adversarial Malware. Figure: Dropper Strategy. Figure: Data Appendix Result.
  • 41. ML Evasion Contest. Figure: mlsec.io Figure: https://cujo.com/machine-learning-security-evasion-competition-2020-results-and-behind-the-scenes/
  • 42. Transition to Practice: Analysis Platforms
  • 43. A Current Public Malware Analysis Platform. Figure: https://app.any.run
  • 44. Topics: 1. Introduction (Malware, Malware Detection); 2. Academic Contributions (Examples); 3. Moving Forward (Research Opportunities); 4. Conclusions (Recap & Remarks)
  • 45. Summary
      Malware Detection: No definitive solution, but a pipeline of attempts. The world is better with some approximation of security.
      Academic Contributions: Better triage with similarity hashing. Better analyses with new sandboxes. Better threat intelligence for Brazilian malware. Better endpoint protection with hardware AVs.
      Moving Forward: Open research positions. Get in touch!
  • 46. Thanks! Questions? Comments? @MarcusBotacin botacin@tamu.edu marcusbotacin.github.io