MaMaDroid: Detecting Android
Malware by Building Markov Chains of
Behavioral Models
Android & Malware
Android market share is growing…
In 2016, 85% of smartphone sales
…and so is the interest of cybercriminals
Bypassing two-factor authentication
Stealing sensitive information, etc.
Current Defenses
Can’t use complex on-device operations
Limited battery and memory resources
Google’s centralized analysis
Not perfect, after-the-fact
Many apps installed outside Play Store
Lots of research in the field! However…
Permission-based models prone to false positives
Relying on API calls frequently used by malware needs
constant, costly retraining
Our Idea
Rely on the sequence of abstracted calls
1. Sequence captures the behavioral model
2. Abstraction provides resilience to API changes
Intuition: malware uses calls for different actions
and in a different order than benign apps
E.g., android.media.MediaRecorder can be used by any app
with permission to record audio
Using it only after calls to getRunningTasks(), which
makes it possible to record conversations, may suggest
maliciousness
Overview
Call Graph Extraction
Based on static analysis
Given an APK, extract call graphs
Tools
Soot (Java optimization and analysis framework)
FlowDroid (ensures contexts and flows are preserved)
Call Graph
Sequence Extraction
Soot gives the sequence of functions that are
potentially called by the program, but…
Each execution could take a specific branch of the
graph and only execute a subset of the calls
When running the example multiple times…
Execute() may be followed by different calls, e.g.,
only getShell() in the try block, or getShell() + getMessage() in
the catch block
Sequence Extraction (cont’d)
We proceed as follows…
1. Identify set of entry nodes
2. Enumerate reachable paths
3. Output set of all paths as the sequences of API calls
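The three steps above can be sketched as a depth-first walk over the call graph. This is a minimal illustration, not the authors' implementation: the graph is a hypothetical hand-written dictionary (in the real pipeline it would come from Soot/FlowDroid), the function names are invented here, and "entry node" is assumed to mean a node that no other node calls. Cycles are cut by never revisiting a node already on the current path.

```python
from typing import Dict, List

# Hypothetical representation: caller -> list of callees.
CallGraph = Dict[str, List[str]]


def entry_nodes(graph: CallGraph) -> List[str]:
    """Step 1: nodes that are never the target of a call (assumed definition)."""
    called = {callee for callees in graph.values() for callee in callees}
    return sorted(node for node in graph if node not in called)


def enumerate_paths(graph: CallGraph, start: str) -> List[List[str]]:
    """Step 2: every path from `start` until no unvisited callee remains."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        path = path + [node]
        # Skip callees already on this path so recursion terminates on cycles.
        successors = [n for n in graph.get(node, []) if n not in path]
        if not successors:
            paths.append(path)
            return
        for nxt in successors:
            walk(nxt, path)

    walk(start, [])
    return paths


def extract_sequences(graph: CallGraph) -> List[List[str]]:
    """Step 3: the set of all paths, each one a sequence of calls."""
    return [p for entry in entry_nodes(graph) for p in enumerate_paths(graph, entry)]


# Toy graph loosely modelled on the Execute()/getShell()/getMessage() example.
TOY_GRAPH: CallGraph = {
    "onCreate": ["execute"],
    "execute": ["getShell", "getMessage"],
    "getShell": [],
    "getMessage": [],
}

sequences = extract_sequences(TOY_GRAPH)
for seq in sequences:
    print(" -> ".join(seq))
```

Path enumeration can grow exponentially with the number of branches, so a real implementation would likely need a bound or some deduplication.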
Abstraction
Packages
Using the list of 243 packages (as of API level 24) + 95
from the Google API
Packages defined by developers → “self-defined”
If we can’t tell what its class implements → “obfuscated”
Families
9 families: android, google, java, javax, xml, apache,
junit, json, dom
Plus self-defined and obfuscated
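A minimal sketch of the two abstraction levels follows. Several things here are assumptions rather than content from the slides: the mapping from name prefixes to families (the slide lists only the family names), the tiny `KNOWN_PACKAGES` subset standing in for the 243 + 95 package list, and the `is_obfuscated` flag, which stands in for whatever check decides that a class cannot be identified.

```python
from typing import Iterable

# Assumed prefix -> family mapping; the slide only names the nine families.
FAMILY_PREFIXES = [
    ("android", "android"),
    ("com.google", "google"),
    ("java", "java"),
    ("javax", "javax"),
    ("org.xml", "xml"),
    ("org.apache", "apache"),
    ("junit", "junit"),
    ("org.json", "json"),
    ("org.w3c.dom", "dom"),
]

# Hypothetical stand-in for the full list of Android and Google API packages.
KNOWN_PACKAGES = {"android.media", "android.app", "java.lang", "java.io"}


def abstract_to_family(class_name: str, is_obfuscated: bool = False) -> str:
    """Map a fully qualified class name to one of the 11 family states."""
    if is_obfuscated:
        return "obfuscated"
    for prefix, family in FAMILY_PREFIXES:
        # Match whole dotted components so "java" does not swallow "javax".
        if class_name == prefix or class_name.startswith(prefix + "."):
            return family
    return "self-defined"


def abstract_to_package(
    class_name: str,
    known: Iterable[str] = KNOWN_PACKAGES,
    is_obfuscated: bool = False,
) -> str:
    """Map a class name to the longest known package that prefixes it."""
    if is_obfuscated:
        return "obfuscated"
    known = set(known)
    parts = class_name.split(".")
    for end in range(len(parts) - 1, 0, -1):
        candidate = ".".join(parts[:end])
        if candidate in known:
            return candidate
    return "self-defined"


print(abstract_to_family("android.media.MediaRecorder"))
print(abstract_to_package("android.media.MediaRecorder"))
print(abstract_to_family("com.example.app.Main"))
```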
Example
Markov Chain
Memoryless models
Probability of transitioning from one state to another depends only on
the current state
Represented as a set of nodes, each corresponding to a
different state, and a set of edges labeled with the
probability of transition
Sum of the probabilities on all edges leaving
any node is exactly 1
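The two properties above can be written compactly. The notation is introduced here for illustration and does not appear in the slides: $X_n$ is the state at step $n$, $S$ is the set of states, and $p_{ij}$ is the probability on the edge from state $i$ to state $j$.

```latex
\[
\Pr(X_{n+1}=j \mid X_n=i,\, X_{n-1}=i_{n-1},\, \dots,\, X_0=i_0)
  \;=\; \Pr(X_{n+1}=j \mid X_n=i) \;=\; p_{ij},
\qquad
\sum_{j \in S} p_{ij} = 1 \quad \text{for every } i \in S .
\]
```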
Markov-chain based modeling
Building the Markov Chains
From the sequences of abstracted API calls: each
package/family is a state, and each transition is labeled
with the probability of moving from one state to another
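One way to build such a chain is to count how often each state is followed by each other state and then normalise every row. The sketch below assumes that transitions are tallied over consecutive pairs in each abstracted sequence; the function name and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_markov_chain(sequences: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Return {source_state: {destination_state: probability}}."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        # Each consecutive pair (src, dst) is one observed transition.
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1

    chain: Dict[str, Dict[str, float]] = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        # Normalise so the outgoing probabilities of each state sum to 1.
        chain[src] = {dst: n / total for dst, n in outgoing.items()}
    return chain


# Toy sequences, already abstracted to families.
toy_sequences = [
    ["java", "android", "android"],
    ["java", "self-defined"],
]

chain = build_markov_chain(toy_sequences)
print(chain)
```

States that never appear as a source simply have no row, which corresponds to all-zero outgoing transitions in the feature vector.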
Feature Extraction
For each app:
Feature vector = probabilities of transitioning from one
state to another in the Markov chain
With families, 11 possible states → 121 possible
transitions in each chain
With packages, 340 states → 115,600 transitions
Principal Component Analysis (PCA)
Standard way to reduce/refine features
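The transition counts follow from squaring the number of states: 11 × 11 = 121 and 340 × 340 = 115,600. A minimal sketch of the flattening step is shown below. The fixed state ordering and the toy chain are assumptions for illustration, and the PCA step is left out.

```python
from typing import Dict, List

# The 9 families plus the two special states, in an arbitrary but fixed order.
FAMILY_STATES = [
    "android", "google", "java", "javax", "xml", "apache",
    "junit", "json", "dom", "self-defined", "obfuscated",
]


def feature_vector(chain: Dict[str, Dict[str, float]], states: List[str]) -> List[float]:
    """Flatten the transition matrix row by row; missing transitions become 0.0."""
    return [chain.get(src, {}).get(dst, 0.0) for src in states for dst in states]


# Hypothetical chain for one app.
toy_chain = {
    "java": {"android": 0.5, "self-defined": 0.5},
    "android": {"android": 1.0},
}

vector = feature_vector(toy_chain, FAMILY_STATES)
print(len(vector))   # 121 features with families
print(340 * 340)     # 115600 features with packages
```

Using the same state ordering for every app keeps each position in the vector tied to the same transition, which is what makes the vectors comparable across apps.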
Classification
Build a classifier using the extracted features
Each app labeled as benign or malware
Can use a few standard algorithms for this task…
Random Forests
1-NN, 3-NN
SVM
Maybe deep learning?
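As an illustration of the simplest option listed above, here is a small k-nearest-neighbour classifier covering both 1-NN and 3-NN. It is a sketch under assumptions: Euclidean distance is an arbitrary choice, and the two-dimensional training vectors are made-up toy data rather than real Markov-chain features.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, label)


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 1) -> str:
    """Label `query` by majority vote among its k closest training samples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy data: each app labeled as benign or malware.
toy_train: List[Sample] = [
    ([0.9, 0.1], "benign"),
    ([0.8, 0.2], "benign"),
    ([0.2, 0.8], "malware"),
    ([0.1, 0.9], "malware"),
]

print(knn_predict(toy_train, [0.85, 0.15], k=1))
print(knn_predict(toy_train, [0.30, 0.70], k=3))
```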
Datasets
How many API calls?
Android/Google family calls?
Evaluation
(1) Accuracy of classification on benign and malicious
samples developed around the same time
(2) Robustness to the evolution of malware as well as
of the Android framework (using older datasets for
training and newer ones for testing and vice-versa)
Same Year
[Figure: results with families and with packages]
Training on older samples
[Figure: results with families and with packages]
Training on newer samples
[Figure: results with families and with packages]
MaMaDroid vs DroidAPIMiner
Case Studies (2016/newbenign)
False Positives (164 samples)
Most of them request “dangerous permissions”
E.g., SMS permissions requested for no clear reason
False Negatives (114 samples)
Not classified as malware by VirusTotal, so they might
actually be legitimate
Most of them adware
Evasion
Repackaging benign apps
Difficult to embed malicious code while keeping a similar
Markov chain; vice versa is also hard
Imitating Markov chains
Likely ineffective
Obfuscation/Mangling
Still captured by the [obfuscated] abstraction
More in the paper…
Limitations
Classification is memory hungry
Soot is buggy, we lose ~4% of the samples
Limits of static-analysis-only methods
Future Work
Further investigate resilience to evasion
Focus on repackaged malicious apps
Injection of API calls to mess with Markov chains
Enhancements
Fine-grained abstractions (e.g., class)
Seed with dynamic analysis
Thank you!
Paper to appear at NDSS 2017:
E. Mariconti, L. Onwuzurike, P. Andriotis,
E. De Cristofaro, G. Ross, G. Stringhini.
MaMaDroid: Detecting Android Malware by
Building Markov Chains of Behavioral Models