Mining apps for anomalies
Perspectives on Data Science for Software Engineering
Agenda
• Specifications
• APP MINING
• DETECTING ABNORMAL BEHAVIOR
• CHABADA
• TREASURE OF DATA
• OBSTACLES
Specifications
• Does the program do what it is
supposed to do?
• Will it continue to do so in the future?
• How to define what it's supposed to do?
Formal Methods
Flappy Bird
• Your aim is to move a little bird up and down
such that it does not hit an obstacle.
• As a developer, you can list undesired properties (no
crashes, no spying).
• But how do you specify the gameplay itself to a computer?
• Can we teach a computer how to check a
program against expectations?
• Learn what program behavior is normal in a
given context?
APP MINING
App mining leverages common knowledge in thousands of
apps to automatically learn what is “normal” behavior—
and in contrast, automatically identify “abnormal” behavior.
APP MINING
• Leverage the knowledge encoded into the hundreds
of thousands of apps available in app stores
• Determine what would be normal behavior, to
detect what would be abnormal behavior
• Guide programmers and users toward better security
and usability
Apps in app stores have three features:
1. Apps come with all sorts of metadata, such as names, categories,
and user interfaces. All of these can be associated with program
features, so you can, for instance, associate program behavior with
descriptions.
2. Apps are pretty much uniform. They use the same libraries, which in
turn follow fairly recent designs. All this makes apps easy to analyze,
execute, and test—and consequently, easy to compare.
3. Apps are redundant. There are plenty of apps that all address
similar problems. This is in sharp contrast to open source programs.
This redundancy allows us to learn common patterns of how
problems are addressed—and, in turn, detect anomalies.
DETECTING ABNORMAL BEHAVIOR
The problem with “normal” behavior is that it varies according to the
app’s purpose:
• If an app sends out text messages, that would normally be a sign of
malicious behavior—unless it is a messaging application, where
sending text messages is one of the advertised features.
• If an app continuously monitors your position, this might be
malicious behavior—unless it is a tracking app that again advertises
this as a feature.
• Simply checking for a set of predefined “undesired” features is not
enough—if the features are clearly advertised, then it is reasonable
to assume the user tolerates, or even wants, these features, because
otherwise she would not have chosen the app.
Introducing CHABADA
• To determine what is normal, we thus must assess program behavior together with its description. If the
behavior is advertised then it’s fine; if not, it may come as a surprise to the user, and thus should be flagged.
• This is the idea we followed in our first app mining work, the CHABADA tool.
• A general tool to detect mismatches between the behavior of an app and its description
• Applied to a set of 22,500 apps, CHABADA can detect 74% of novel malware, with a false positive rate
below 10%.
• Our recent MUDFLOW prototype, which learns normal data flows from apps, can even detect more than
90% of novel malware leaking sensitive data.
“Checking App Behavior Against App Descriptions”
CHABADA
• CHABADA starts with a (large) set of apps to be analyzed.
• It first applies tried-and-proven natural language
processing techniques (stemming, LDA (Latent Dirichlet
Allocation), topic analysis) to abstract the app descriptions
into topics.
• It then builds clusters of those apps whose topics have the
most in common. Thus, all apps whose descriptions refer
to messaging end up in a “Messaging” cluster (see the sketch below).
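A minimal sketch of this description-to-topic-to-cluster step, using off-the-shelf scikit-learn components. It is illustrative only: the three descriptions, the topic and cluster counts, and the choice of KMeans are assumptions for the example, not CHABADA's actual pipeline or parameters.

# Sketch only: abstract app descriptions into topics with LDA,
# then group apps with similar topic distributions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

descriptions = [  # made-up, already preprocessed descriptions
    "send free text messages to your friends",
    "chat and share photos with your contacts",
    "track your running route on a map",
]

# Bag-of-words model of the descriptions (English stop words removed).
counts = CountVectorizer(stop_words="english").fit_transform(descriptions)

# LDA assigns each description a distribution over latent topics.
topic_mix = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)

# Cluster apps whose topic distributions are most similar; the intent is
# that the messaging-like apps land in the same cluster.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(topic_mix)
print(list(zip(descriptions, clusters)))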
CHABADA
• Within each cluster, CHABADA will now search for outliers
regarding app behavior.
• As a behavior abstraction, it simply uses the set of API calls contained in each app; these
are easy to extract using simple static analysis tools.
• CHABADA uses tried-and-proven outlier analysis techniques,
which provide a ranking of the apps in a cluster, depending
on how far away their API usage is from the norm. Those
apps that are ranked highest are the most likely outliers (see the sketch below).
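A hedged sketch of such an outlier ranking over binary API-usage vectors. The API names, the tiny usage matrix, and the use of a One-Class SVM are illustrative assumptions for this example, not the exact CHABADA feature set or model.

# Sketch only: rank the apps of one cluster by how unusual their API usage is.
import numpy as np
from sklearn.svm import OneClassSVM

apis = ["SmsManager.sendTextMessage",           # hypothetical feature columns
        "LocationManager.getLastKnownLocation",
        "HttpURLConnection.connect"]

# Rows: apps in a "Messaging" cluster; columns: does the app call the API?
usage = np.array([
    [1, 0, 1],   # typical messaging app
    [1, 0, 1],
    [1, 1, 1],   # also reads the location -- a candidate outlier
    [1, 0, 0],
])

# One-class model of "normal" API usage within the cluster.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(usage)

# Lower decision scores mean farther from the norm; sort ascending so the
# most likely outliers come first in the ranking.
scores = model.decision_function(usage)
for idx in np.argsort(scores):
    print(f"app {idx}: score {scores[idx]:.3f}")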
A TREASURE OF DATA …
1. Future techniques will tie program analysis to user interface analysis.
2. Mining user interaction may reveal behavior patterns we could reuse in various contexts.
3. Violating behavior patterns may also imply usability issues. If a button named “Login” does nothing, for
instance, it would be very different from the other “Login” buttons used in other apps—and hopefully be
flagged as an anomaly.
4. Given good test generators, one can systematically explore the dynamic behavior and gain information on
the concrete text and resources accessed.
A number of ideas that app stores all make possible
OBSTACLES
1. Getting apps is not hard, but not easy either. Besides the official stores, there is no publicly available repository
of apps where you could simply download thousands of apps, because doing so would violate copyright.
2. For apps, there’s no easily accessible source code, version, or bug information. If you monitor a store for a
sufficient time, you may be able to access and compare releases, but that’s it. Vendors are not going to help you, and
the supply of open source apps is limited. Fortunately, app byte code is not too hard to get at and analyze (see the sketch after this list).
3. Metadata is only a very weak indicator of program quality. Lots of one-star reviews may refer to a recent price
increase or to political reasons, but reviews that talk about crashes or malicious behavior can give clearer signals.
4. Never underestimate developers. Vendors typically have a pretty clear picture of what their users do. If you think
you can mine metadata to predict release dates, reviews, or sentiments, talk to vendors first and check your
proposal against the realities of app development.
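As a hedged illustration of how accessible app byte code and metadata are, here is a sketch using the third-party androguard library. The APK path is a placeholder, and the module path and method names follow the androguard 3.x style, so they may differ in other versions.

# Sketch only: pull permissions and framework API usage out of an APK.
from androguard.misc import AnalyzeAPK

apk, dex, analysis = AnalyzeAPK("some_app.apk")   # placeholder path

# Declared permissions come straight from the manifest.
print(apk.get_permissions())

# Methods resolved outside the app (external methods) approximate its
# Android API usage -- the kind of feature set CHABADA-style analysis needs.
external = [m for m in analysis.get_methods() if m.is_external()]
print(len(external), "framework methods referenced")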
Any Questions?
Thank You.