REDNAGA
ANDROID MALWARE AND MACHINE LEARNING
CALEB FENTON
08.24.2017
Dead Drop SF
WHO AM I
• Researcher @ SentinelOne
• Previously @ Lookout and @SourceClear
• Enjoy reading, cryptocurrency, economics
• Made Simplify and other Android tools
• @caleb_fenton
• github.com/CalebFenton
CALEB
WHO ARE WE
• rednaga.io
• Banded together by the love of 0days and hot sauces
• Collaborate and try to improve the community
• Disclosures / Code / Lessons on GitHub
• @RedNagaSec
• github.com/RedNaga
RED NAGA
TALK OVERVIEW
1. Machine learning overview
2. Using apkfile for feature extraction
3. Useful features for Android malware
4. Tips for building good models
MACHINE LEARNING OVERVIEW
STEP 1: UNDERSTAND THE FORMAT
• Android apps come as APK files
• APKs are just ZIPs
• APKs are rich with variety
  • Android manifest / binary XML
  • Dalvik executables
  • Signing certificates
  • Other resources (icons, maps, sounds, …)
• Offensive & Defensive Android Reverse Engineering
  github.com/rednaga/training/tree/master/DEFCON23
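Since an APK is a ZIP archive, the standard ZIP API is enough for a first look at what it contains. The following is a minimal sketch, not apkfile code: the class name `ApkZipLister` and the category names are made up for illustration, and deliberately malformed archives may need a more tolerant parser than `java.util.zip`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Illustrative helper (hypothetical, not part of apkfile): buckets APK entries by kind.
class ApkZipLister {
    // Classify one entry path into a coarse, made-up category.
    static String kindOf(String path) {
        if (path.equals("AndroidManifest.xml")) return "manifest";
        if (path.endsWith(".dex")) return "dex";
        if (path.startsWith("META-INF/")) return "signing";
        return "other";
    }

    // Count entries per category by reading the APK as a plain ZIP stream.
    static Map<String, Integer> countKinds(InputStream apk) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (ZipInputStream zip = new ZipInputStream(apk)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (!e.isDirectory()) {
                    counts.merge(kindOf(e.getName()), 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

A call such as `ApkZipLister.countKinds(new FileInputStream("someapp.apk"))` returns a map from category to entry count.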
STEP 2: COLLECT SAMPLES
• Need lots of good and bad samples
• Diversity of good and bad is important
• Sample sources:
  • VirusTotal, VirusShare, market crawlers, other researchers, friends
STEP 3: ENGINEER FEATURES
How do humans do it?
STEP 3: ENGINEER FEATURES
features = [has_beard]
STEP 3: ENGINEER FEATURES
Example 1
App label: MX Player Pro
Package: com.mxtech.videoplayer.pro
CN=Kim Jae hyun, O=MX Technologies, L=Seoul, ST=South Korea, C=KR
Example 2
App label: Google Service Updater
Package: it.googleandroid.updater
CN=GService inc, OU=G Service inc, O=G, L=New York, ST=New York, C=US
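One way to turn the comparison in these two examples into a feature is to check whether a brand named in the app label also appears in the signer's organization. This is a rough sketch under that assumption: `CertLabelCheck` and `brandMissingFromSigner` are hypothetical names, and the distinguished names are parsed with the JDK's `LdapName` rather than with apkfile.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

// Illustrative sketch (hypothetical, not apkfile code): compare an app label with its signer.
class CertLabelCheck {
    // Parse a distinguished name such as "CN=..., O=..., C=US" into type -> value.
    static Map<String, String> parseSubject(String dn) throws InvalidNameException {
        Map<String, String> fields = new HashMap<>();
        for (Rdn rdn : new LdapName(dn).getRdns()) {
            fields.put(rdn.getType().toUpperCase(Locale.ROOT), String.valueOf(rdn.getValue()));
        }
        return fields;
    }

    // Hypothetical feature: the label names a brand that the signer's O field lacks.
    static boolean brandMissingFromSigner(String appLabel, String brand, String dn)
            throws InvalidNameException {
        String org = parseSubject(dn).getOrDefault("O", "");
        String b = brand.toLowerCase(Locale.ROOT);
        return appLabel.toLowerCase(Locale.ROOT).contains(b)
                && !org.toLowerCase(Locale.ROOT).contains(b);
    }
}
```

With the data above, the "Google Service Updater" label signed by `O=G` is flagged, while "MX Player Pro" signed by `O=MX Technologies` is not. A single check like this is only a weak signal and would be one feature among many.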
STEP 3: ENGINEER FEATURES
• Certificate details - common name, country, …
• Suspicious strings - “pm uninstall”, “google”
• Permissions - which ones and how many
• API calls - send SMS, load DEX file
• Overall app quality - default icons, typos
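As a sketch of how the permission and string ideas in this list might become numeric features: the watch lists and feature names below are assumptions chosen for illustration, and the inputs are plain lists that a parser such as apkfile would supply in practice.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch: turn raw app details into named numeric features.
class SimpleFeatures {
    // Hypothetical watch lists; a real system would curate or learn these.
    static final List<String> WATCHED_PERMISSIONS = Arrays.asList(
            "android.permission.SEND_SMS", "android.permission.READ_CONTACTS");
    static final List<String> SUSPICIOUS_STRINGS = Arrays.asList("pm uninstall", "google");

    static Map<String, Double> extract(List<String> permissions, List<String> strings) {
        Map<String, Double> f = new LinkedHashMap<>();
        // How many permissions the app requests in total.
        f.put("permission_count", (double) permissions.size());
        // One 0/1 feature per watched permission.
        for (String p : WATCHED_PERMISSIONS) {
            f.put("perm:" + p, permissions.contains(p) ? 1.0 : 0.0);
        }
        // One count feature per suspicious substring (case-insensitive).
        for (String needle : SUSPICIOUS_STRINGS) {
            long hits = strings.stream()
                    .filter(s -> s.toLowerCase(Locale.ROOT).contains(needle))
                    .count();
            f.put("str:" + needle, (double) hits);
        }
        return f;
    }
}
```

Keeping feature names as strings makes it easy to inspect which signals a model ends up relying on.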
STEP 4: BUILD AND TUNE MODELS
• Collect and prepare data
• Drop low value features
• Try many algorithms
• Train and blend multiple models
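Blending can be as simple as averaging the scores of several models. The slides do not say which blending method was used, so plain averaging here is an assumption, and `Blender` is a hypothetical name.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Illustrative sketch: blend several models by averaging their scores.
class Blender {
    // Each model maps a feature vector to a malware probability in [0, 1].
    static double blend(List<ToDoubleFunction<double[]>> models, double[] features) {
        double sum = 0.0;
        for (ToDoubleFunction<double[]> model : models) {
            sum += model.applyAsDouble(features);
        }
        return sum / models.size();
    }

    // Flag a sample when the blended score reaches the chosen threshold.
    static boolean isMalware(List<ToDoubleFunction<double[]>> models,
                             double[] features, double threshold) {
        return blend(models, features) >= threshold;
    }
}
```

Weighted averages or a second-stage model trained on the individual scores are common alternatives.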
REVIEW
1. Collect samples
2. Understand the format
3. Engineer features (apkfile!)
4. Build and tune model
USING APKFILE
WHAT IS APKFILE?
• APK feature extraction library (Java)
• github.com/CalebFenton/apkfile
• Parses DEX files (dexlib2)
• Parses APK certificates
• Parses Android manifest (based on ArscBlamer)
• Hardened for use against obfuscation
• Everything is an object for easy inspection
EXAMPLE: ANDROID MANIFEST
ApkFile apkFile = new ApkFile("someapp.apk");
AndroidManifest androidManifest = apkFile.getAndroidManifest();

// Get some manifest properties
String packageName = androidManifest.getPackageName();
String appLabel = androidManifest.getApplication().getLabel();

// Print permission names
for (Permission permission : androidManifest.getPermissions()) {
    System.out.println("permission: " + permission.getName());
}

// Print exported services
for (Service service : androidManifest.getApplication().getServices()) {
    if (service.isExported()) {
        System.out.println("exported: " + service.getName());
    }
}
EXAMPLE: APK CERTIFICATE
ApkFile apkFile = new ApkFile("example-malware.apk");
Certificate certificate = apkFile.getCertificate();
Collection<Certificate.SubjectAndIssuerRdns> allRdns = certificate.getAllRdns();

// APK may be signed by multiple certificates
for (Certificate.SubjectAndIssuerRdns rdns : allRdns) {
    Map<String, String> subjectRdns = rdns.getSubjectRdns();

    // Get certificate subject CN and O properties
    System.out.println("Subject common name: " + subjectRdns.get("CN"));
    System.out.println("Subject organization: " + subjectRdns.get("O"));

    // Print all issuer properties
    System.out.println("Issuer RDNS: " + rdns.getIssuerRdns());
}
EXAMPLE: DALVIK EXECUTABLES
ApkFile apkFile = new ApkFile("example-malware.apk");
Map<String, DexFile> pathToDexFile = apkFile.getDexFiles();
for (Map.Entry<String, DexFile> e : pathToDexFile.entrySet()) {
    String path = e.getKey();
    DexFile dexFile = e.getValue();
    System.out.println("Analyzing " + path);
    dexFile.analyze();

    // Average cyclomatic complexity, also available for each method
    System.out.println("Cyclomatic complexity: " + dexFile.getCyclomaticComplexity());

    // Get API call counts over all methods
    // Trove maps avoid boxing, which makes incrementing counts faster
    TObjectIntIterator<MethodReference> iterator = dexFile.getApiCounts().iterator();
    while (iterator.hasNext()) {
        iterator.advance();
        MethodReference methodRef = iterator.key();
        int count = iterator.value();
        // E.g. Ljava/lang/StringBuilder;->toString called 18 times
        System.out.println(methodRef + " called " + count + " times");
    }

    // Print op code histograms for each method
    for (Map.Entry<String, DexMethod> me :
            dexFile.getMethodDescriptorToMethod().entrySet()) {
        String methodDescriptor = me.getKey();
        // E.g. Lit/googleandroid/updater/a;->a(Ljava/lang/String;)Ljava/lang/String; op counts
        System.out.println(methodDescriptor + " op counts");
        DexMethod dexMethod = me.getValue();
        TObjectIntIterator<Opcode> opIter = dexMethod.getOpCounts().iterator();
        while (opIter.hasNext()) {
            opIter.advance();
            // E.g. MOVE_RESULT_OBJECT: 46
            System.out.println(" " + opIter.key() + ": " + opIter.value());
        }
    }
}
USEFUL FEATURES
ANDROID MANIFEST
• Has main launcher activity
  • No launcher implies no user interaction
• Number of activity package paths
  • Malicious activities injected?
• Permissions / number of permissions
  • Good clue what app may do
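The "number of activity package paths" feature can be sketched as counting the distinct package prefixes among activity class names. The input is assumed to be a plain list of fully qualified names taken from the manifest, and `ActivityPackages` is a hypothetical helper, not part of apkfile.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch: count distinct package paths among activity class names.
class ActivityPackages {
    // "com.example.ui.MainActivity" -> "com.example.ui"; a name without a dot -> "".
    static String packageOf(String className) {
        int lastDot = className.lastIndexOf('.');
        return lastDot < 0 ? "" : className.substring(0, lastDot);
    }

    // Collect the set of package paths so its size can be used as a feature.
    static Set<String> distinctPackages(List<String> activityClassNames) {
        Set<String> packages = new TreeSet<>();
        for (String name : activityClassNames) {
            packages.add(packageOf(name));
        }
        return packages;
    }
}
```

An app whose activities mostly share one package path but include a few from an unrelated path may have had code injected, which is the intuition behind this feature.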
APKID FEATURES
• “PEiD for Android” - detects compilers, packers, …
• Compiler - dx (native) / dexlib (modified)
• Anti-VM strings - avoiding VM analysis
  • Build.MANUFACTURER, SIM operator, device ID, subscriber ID
• Detecting Pirated and Malicious Android Apps with APKiD
  rednaga.io/2016/07/31/detecting_pirated_and_malicious_android_apps_with_apkid/
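To make the anti-VM idea concrete, here is a toy check that counts how many emulator-detection indicators appear among an app's field and method references. It is not how APKiD works internally; the indicator list is an assumption built from the identifiers named above, and `AntiVmHints` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch (not APKiD's implementation): flag references that
// often appear in emulator checks.
class AntiVmHints {
    // Hypothetical indicator list based on the identifiers named on the slide.
    static final List<String> INDICATORS = Arrays.asList(
            "Landroid/os/Build;->MANUFACTURER",
            "getSimOperator",
            "getDeviceId",
            "getSubscriberId");

    // Number of distinct indicators that occur in at least one reference.
    static int countIndicators(List<String> references) {
        int found = 0;
        for (String indicator : INDICATORS) {
            for (String ref : references) {
                if (ref.contains(indicator)) {
                    found++;
                    break; // count each indicator once
                }
            }
        }
        return found;
    }
}
```

Legitimate apps also read these values, so the count is better treated as one input feature than as a verdict.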
STRINGS
• Number of gibberish strings
  • Find weird certificate details
  • Find unusual obfuscation
• Using Markov Chains for Android Malware Detection
  calebfenton.github.io/2017/08/23/using-markov-chains-for-android-malware-detection/
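A character-level Markov chain is one way to score how "gibberish" a string looks. The sketch below is a generic bigram model with add-one smoothing, written for illustration; it is not taken from the linked post, and `GibberishScorer` and its 27-symbol alphabet are assumptions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch: character-bigram Markov model for spotting gibberish.
class GibberishScorer {
    private static final int ALPHABET = 27; // a-z plus space, used for smoothing
    private final Map<String, Integer> pairCounts = new HashMap<>();
    private final Map<Character, Integer> firstCounts = new HashMap<>();

    // Keep lowercase letters and spaces so the alphabet stays small.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z ]", "");
    }

    // Count how often each character is followed by each other character.
    void train(String text) {
        String t = normalize(text);
        for (int i = 0; i + 1 < t.length(); i++) {
            pairCounts.merge(t.substring(i, i + 2), 1, Integer::sum);
            firstCounts.merge(t.charAt(i), 1, Integer::sum);
        }
    }

    // Average log-probability per transition; lower means less like the training text.
    double score(String s) {
        String t = normalize(s);
        if (t.length() < 2) return 0.0;
        double total = 0.0;
        for (int i = 0; i + 1 < t.length(); i++) {
            int pair = pairCounts.getOrDefault(t.substring(i, i + 2), 0);
            int first = firstCounts.getOrDefault(t.charAt(i), 0);
            total += Math.log((pair + 1.0) / (first + ALPHABET)); // add-one smoothing
        }
        return total / (t.length() - 1);
    }
}
```

After training on ordinary text, strings that score below a threshold chosen on known-good samples can be counted to produce the "number of gibberish strings" feature.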
TIPS FOR BUILDING GOOD MODELS
TIPS
• Most guides are for toy data sets
• No one talks about large data set problems
• Everyone assumes you have a dense matrix
• Assuming sklearn, but applies to other libs
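The dense-matrix point matters because most features are zero for most apps. As a minimal sketch of the alternative, a row can store only its non-zero entries; `SparseRow` is a hypothetical class written for illustration, not a replacement for a real sparse-matrix library.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: store only non-zero features instead of a dense row.
class SparseRow {
    private final Map<Integer, Double> values = new TreeMap<>();

    // Setting a column to zero removes it, so storage tracks non-zeros only.
    void set(int column, double value) {
        if (value == 0.0) {
            values.remove(column);
        } else {
            values.put(column, value);
        }
    }

    // Columns that were never set read as zero.
    double get(int column) {
        return values.getOrDefault(column, 0.0);
    }

    int nonZeroCount() {
        return values.size();
    }

    // Dot product with a dense weight vector touches only stored entries.
    double dot(double[] weights) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : values.entrySet()) {
            sum += e.getValue() * weights[e.getKey()];
        }
        return sum;
    }
}
```

With hundreds of thousands of possible API-call or string features, memory then scales with the number of non-zero values rather than with rows times columns.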
PREPARING DATA
• Normalization is important
  • Scale with MaxAbs or MinMax if many 0s
  • Needed for some algorithms (not decision trees)
  • Needed for dropping invariant features
• Drop invariant features
  • Reduces chance of overfitting
  • Example: file hash, app label, rare API calls
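The two steps in this list can be sketched by hand: scale each column by its largest absolute value (which leaves zeros as zeros), then drop columns whose variance falls below a threshold. Scaling first puts every column on the same range, so a single threshold is meaningful across features. The code follows the same idea as sklearn's `MaxAbsScaler` and `VarianceThreshold` but is a standalone illustration with hypothetical names, not a port.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: max-abs scaling followed by a variance filter.
class Preprocess {
    // Divide each column by its largest absolute value; zeros stay zero.
    static double[][] maxAbsScale(double[][] x) {
        int rows = x.length;
        int cols = x[0].length;
        double[][] out = new double[rows][cols];
        for (int c = 0; c < cols; c++) {
            double maxAbs = 0.0;
            for (double[] row : x) maxAbs = Math.max(maxAbs, Math.abs(row[c]));
            for (int r = 0; r < rows; r++) {
                out[r][c] = maxAbs == 0.0 ? 0.0 : x[r][c] / maxAbs;
            }
        }
        return out;
    }

    // Indices of columns whose population variance exceeds the threshold.
    static List<Integer> columnsToKeep(double[][] x, double minVariance) {
        List<Integer> keep = new ArrayList<>();
        int rows = x.length;
        for (int c = 0; c < x[0].length; c++) {
            double mean = 0.0;
            for (double[] row : x) mean += row[c] / rows;
            double variance = 0.0;
            for (double[] row : x) variance += (row[c] - mean) * (row[c] - mean) / rows;
            if (variance > minVariance) keep.add(c);
        }
        return keep;
    }
}
```

A one-hot file hash or a rarely seen API call produces a column that is almost always zero, so its variance is tiny and it is filtered out by a small positive threshold.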
SELECTING FEATURES
• Score features and plot scores to build intuition
  • Usually long tail of useless features
  • Gives ideas for new features
  • Top 100 features almost as good as top 1000
• Run experiments with subsets of features
  • Improves speed
  • Only interested in relative differences
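Scoring and keeping the top features might look like the sketch below. The score used here, the absolute difference between the malware and benign column means, is a deliberately simple stand-in chosen for illustration; the slides do not say which scoring function was used, and `FeatureRanker` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: score each feature, then keep the k best.
class FeatureRanker {
    // Stand-in score: |mean over malware - mean over benign| per column.
    // Assumes both classes are present in the data.
    static double[] scores(double[][] x, boolean[] isMalware) {
        int cols = x[0].length;
        double[] malSum = new double[cols];
        double[] benSum = new double[cols];
        int mal = 0;
        int ben = 0;
        for (int r = 0; r < x.length; r++) {
            if (isMalware[r]) mal++; else ben++;
            for (int c = 0; c < cols; c++) {
                if (isMalware[r]) malSum[c] += x[r][c]; else benSum[c] += x[r][c];
            }
        }
        double[] out = new double[cols];
        for (int c = 0; c < cols; c++) {
            out[c] = Math.abs(malSum[c] / mal - benSum[c] / ben);
        }
        return out;
    }

    // Column indices sorted by descending score, truncated to k.
    static List<Integer> topK(double[] scores, int k) {
        List<Integer> order = new ArrayList<>();
        for (int c = 0; c < scores.length; c++) order.add(c);
        order.sort(Comparator.comparingDouble((Integer c) -> -scores[c]));
        return new ArrayList<>(order.subList(0, Math.min(k, order.size())));
    }
}
```

Printing or plotting the sorted scores usually shows the long tail directly, and running experiments on only the top-k columns keeps iteration fast.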
BUILDING MODELS
• Grid search to find best algorithms and parameters
  • Iterate on several, smaller searches
• Decision tree ensembles aren’t hip, but work well
  sentinelone.com/blog/detecting-malware-pre-execution-static-analysis-machine-learning/
• Build and blend multiple models
  sentinelone.com/blog/measuring-the-usefulness-of-multiple-models/
• Feature Selection and Grid Searching Hyper-parameters
  gist.github.com/CalebFenton/66aa04af7b4a4d98efca059cb8c2e7aa
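Grid search itself is just an exhaustive loop over parameter combinations. The sketch below is not the code from the linked gist: the parameter names `depth` and `rate` are placeholders, and `evaluate` stands in for training a model and measuring its validation score.

```java
import java.util.function.ToDoubleBiFunction;

// Illustrative sketch: exhaustive grid search over two hyper-parameters.
class GridSearch {
    // Best combination found, with its score.
    static final class Result {
        final int depth;
        final double rate;
        final double score;

        Result(int depth, double rate, double score) {
            this.depth = depth;
            this.rate = rate;
            this.score = score;
        }
    }

    // Evaluate every (depth, rate) pair and return the best-scoring one.
    // "evaluate" is a placeholder for train-then-validate; higher is better.
    static Result search(int[] depths, double[] rates,
                         ToDoubleBiFunction<Integer, Double> evaluate) {
        Result best = null;
        for (int d : depths) {
            for (double r : rates) {
                double s = evaluate.applyAsDouble(d, r);
                if (best == null || s > best.score) {
                    best = new Result(d, r, s);
                }
            }
        }
        return best;
    }
}
```

To iterate on several smaller searches, run a coarse grid first, then call `search` again with values clustered around the winning combination.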
EXTENDED READING
https://github.com/rednaga/training/tree/master/DEFCON23
http://blog.datadive.net/selecting-good-features-part-i-univariate-selection/
https://rednaga.io/
https://calebfenton.github.io/
http://androidcracking.blogspot.com/
THANKS! QUESTIONS?
CALEB FENTON
@CALEB_FENTON
