International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 05 | May 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 8053
Android Malware Detection using Deep Learning
Devi K.R1
Student, Dept. of CSE, College of Engineering Trivandrum, Kerala, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Mobile devices are prone to malware attacks.
Many systems have been implemented to prevent these
attacks, but none has been fully effective. The implemented
system is a machine learning based malware detection
framework that protects Android devices from major
security threats. A large dataset of apps is used for
training, and the permissions they request are extracted.
Based on these extracted permissions, a model is developed
and then tested on unknown malware and benign app
samples.
Key Words: Machine Learning, Malware detection,
Permissions
1. INTRODUCTION
Malware is software that can pose a threat to a computer,
server, client, or computer network. Malware causes damage
after it is implanted or introduced into a target's
computer, and it takes the form of executable code, script
files, and other software. Such code is known as viruses,
ransomware, spyware, adware, worms, Trojans, scareware, and
many other forms. Common methods of protecting against
malware, which aim to prevent the software from gaining
access to the target computer, include antivirus software,
firewalls, and many other techniques. Their main uses
include preventing malware from accessing target computers,
checking for suspicious activity, and recovering from
malware attacks. Another strategy for differentiating
malware apps from genuine Android apps uses sophisticated
dynamic and static analysis tools to detect and classify
malicious apps automatically. Encryption techniques,
however, reduce the chances of malware being detected. To
avoid this problem, we can study Android apps to extract
the sensitive permissions that are widely used in Android
malware. An automated malware detection system is used to
fight malware and to assist Android app marketplaces in
detecting and removing unknown malicious apps.
Static analysis tools are used to extract source code or
bytecode, often traversing program paths to check for
unique and hidden resources. Static analysis approaches are
used for different tasks, including behaviour assessment of
Android apps, detection of application clones, automatic
test case generation, and uncovering non-functional issues
related to performance. The important point is that the
analysed code is not executed; only the tool itself is
executed. The source code is the input to the tool and the
mined features are the output. An example is Drebin.
Dynamic program analysis is the analysis of Android
applications by executing the programs in a virtual
environment, such as the Android emulator bundled with
Android Studio. The target programs must be executed with
test inputs to produce the behaviour. System calls are
analysed to monitor the behaviour of Android applications.
An example is TaintDroid.
Malware classification is an open problem that is commonly
addressed with machine learning techniques. Permissions and
API calls are extracted, which makes it possible to detect
sensitive behaviours in Android applications. Most
detection approaches are based on the difference between
the permissions requested by benign apps and those
requested by malware apps. Analysing the requested
permissions and API call usage of benign and malware app
samples can effectively expose abnormal behaviours and
finally distinguish malware from genuine applications.
Considering the drawbacks of the above techniques, we
propose a new model that is based on the permissions
extracted from the APKs and uses deep learning techniques
to build the model.
2. SYSTEM DESIGN
Most malware detection tools rely on manually compiled
lists of features based on permissions, API calls,
sensitive resources, intents, etc., which are difficult to
come by. To address this problem, we study different real
Android applications to mine hidden patterns of malware and
are able to extract highly sensitive permissions that are
widely used in Android malware.
Benign apps are downloaded from apkpure.com, a free site
that hosts APKs of apps from the Google Play Store.
Malicious apps are downloaded and extracted from
virusshare.com and Contagio Mini Dump. Features such as
API-related permissions are considered in developing the
system.
Permission Distribution
Permissions[1] requested by malware and benign apps are
identified. Analysis shows that ACCESS_WIFI_STATE,
SEND_SMS, etc. are commonly requested by malware. The
requested permissions of an Android application are
declared in the Android manifest file
(AndroidManifest.xml) of the respective APK. The
permissions are extracted from the manifest files and
converted to a CSV file. A large number of permissions are
identified in this step, of which a few must be selected
for further
processing. For that, the Mann-Whitney test[2] is employed.
For each permission, if a particular app uses that
permission, the corresponding value is set to 1; otherwise
it is set to 0. These values are indicated by p values.[2]
Two sets of samples are therefore generated: one
representing the usage of a specific permission by
malicious apps and the other representing its usage by
benign apps. A comparison test is applied to the previously
created input file. For each permission, the average value
is computed over each of the two sets of samples, and the
permissions with higher average values are selected as the
feature vector for training.
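The encoding and averaging steps described above can be
sketched as follows. This is a minimal illustration, not the
paper's code: the app names and permission sets are
hypothetical toy data, and the selection rule (keep a
permission when its average usage in malware exceeds its
average usage in benign apps) is one possible reading of
"higher average values".

```python
from typing import Dict, List, Set


def to_binary_rows(apps: Dict[str, Set[str]], vocabulary: List[str]) -> List[List[int]]:
    """One row per app: 1 if the app requests the permission, else 0."""
    return [[1 if perm in perms else 0 for perm in vocabulary]
            for perms in apps.values()]


def mean_usage(rows: List[List[int]]) -> List[float]:
    """Column-wise average: the fraction of apps requesting each permission."""
    return [sum(column) / len(rows) for column in zip(*rows)]


# Hypothetical toy samples, not taken from the paper's dataset.
malware_apps = {
    "m1": {"SEND_SMS", "INTERNET"},
    "m2": {"SEND_SMS", "INTERNET", "ACCESS_WIFI_STATE"},
}
benign_apps = {
    "b1": {"INTERNET"},
    "b2": {"INTERNET", "ACCESS_WIFI_STATE"},
}

# The vocabulary is every permission seen in either set, in a fixed order.
vocabulary = sorted(set().union(*malware_apps.values(), *benign_apps.values()))

malware_mean = mean_usage(to_binary_rows(malware_apps, vocabulary))
benign_mean = mean_usage(to_binary_rows(benign_apps, vocabulary))

# Assumed selection rule: keep permissions used more often by malware.
selected = [perm for perm, m, b in zip(vocabulary, malware_mean, benign_mean) if m > b]
print(selected)
```

In this toy data only SEND_SMS separates the two sets, since
INTERNET and ACCESS_WIFI_STATE are requested equally often
by both.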
2.1 Malware Detection
This feature vector is divided into two parts. One is used
for training the model and the other for determining the
model's evaluation metrics. The first part is fed to the
classifiers. The classifiers employed here are a neural
network and the K-Means clustering algorithm, so two
trained models are created. The second part is given as
input to the models to determine metrics such as accuracy,
precision, and recall. Unknown APKs are then given as input
so that the model predicts each APK as benign or malicious.
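The division of the feature vectors into a training part and
an evaluation part can be sketched as below. The split ratio
is not stated in the paper, so the 80/20 split, the fixed
seed, and the function name stratified_split are assumptions
made for illustration. The class sizes (327 malicious, 135
benign) are the ones reported in Section 3, while the rows
are placeholders rather than real feature vectors.

```python
import random
from typing import List, Sequence, Tuple

Split = Tuple[List[list], List[int]]


def stratified_split(rows: Sequence[list], labels: Sequence[int],
                     test_fraction: float = 0.2, seed: int = 0) -> Tuple[Split, Split]:
    """Split rows into train and test parts, keeping the class ratio in both."""
    rng = random.Random(seed)
    train_ids: List[int] = []
    test_ids: List[int] = []
    for cls in sorted(set(labels)):
        ids = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(ids)
        n_test = max(1, round(len(ids) * test_fraction))
        test_ids.extend(ids[:n_test])
        train_ids.extend(ids[n_test:])

    def pick(ids: List[int]) -> Split:
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return pick(train_ids), pick(test_ids)


# 1 = malicious, 0 = benign; placeholder rows stand in for permission vectors.
labels = [1] * 327 + [0] * 135
rows = [[i] for i in range(len(labels))]

(train_rows, train_labels), (test_rows, test_labels) = stratified_split(
    rows, labels, test_fraction=0.2)
print(len(train_rows), len(test_rows))
```

Splitting per class keeps the malware-to-benign ratio roughly
the same in both parts, which matters here because the two
classes are unbalanced.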
3. IMPLEMENTATION
Here, we take a closer look at how the system was
implemented. The whole system was developed in Python.
Benign apps are downloaded from apkpure.com. Malicious
apps are downloaded and extracted from virusshare.com
and Contagio Mini Dump. A total of 135 benign apps and 327
malicious apps were collected. The features, namely
permissions, are extracted using Python 3.7 in Spyder. A
package called Androguard[5] is used to extract manifest
files from the APKs. The extracted permissions are
displayed on the screen. Feature selection is done using
the Extra Trees classifier, which is available as a Python
package (for example, ExtraTreesClassifier in
scikit-learn). Feature selection is performed successfully
using the dataset.
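The permission extraction step can be illustrated with the
standard library alone. The sketch below assumes the
manifest has already been decoded from the APK's binary XML
into plain text (the paper uses Androguard for this); the
sample manifest and the function name requested_permissions
are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# XML namespace used for android:* attributes in AndroidManifest.xml.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def requested_permissions(manifest_xml: str) -> List[str]:
    """Return the sorted, de-duplicated permissions declared in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    names = set()
    for node in root.iter("uses-permission"):
        name = node.get("{%s}name" % ANDROID_NS)
        if name:
            names.add(name)
    return sorted(names)


# Hypothetical decoded manifest, for illustration only.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.sample">
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <uses-permission android:name="android.permission.ACCESS_WIFI_STATE"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <application/>
</manifest>
"""

permissions = requested_permissions(SAMPLE_MANIFEST)
print(permissions)
```

Duplicate declarations are collapsed so that each permission
contributes a single 0/1 column to the feature vector.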
Feature vectors are generated using the Mann-Whitney
test[3]. It is implemented with the scipy.stats module of
the SciPy package in Python 3.7. The weights and their
corresponding feature names are written to a CSV file.
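A possible shape for the Mann-Whitney step is sketched
below using scipy.stats.mannwhitneyu. The toy rows, the 0.05
significance threshold, and the helper name
permission_p_values are assumptions for illustration; the
paper does not state its threshold or how the weights are
derived from the test.

```python
from typing import Dict, List

from scipy.stats import mannwhitneyu


def permission_p_values(malware_rows: List[List[int]],
                        benign_rows: List[List[int]],
                        names: List[str]) -> Dict[str, float]:
    """Two-sided Mann-Whitney U test per permission column (0/1 usage values)."""
    p_values: Dict[str, float] = {}
    for j, name in enumerate(names):
        x = [row[j] for row in malware_rows]
        y = [row[j] for row in benign_rows]
        if len(set(x + y)) == 1:
            # Every app has the same value: the column cannot separate the
            # classes, and older SciPy versions raise an error on such input.
            p_values[name] = 1.0
            continue
        p_values[name] = float(mannwhitneyu(x, y, alternative="two-sided").pvalue)
    return p_values


# Hypothetical toy data: 8 malware and 8 benign apps.
names = ["SEND_SMS", "INTERNET", "ACCESS_WIFI_STATE"]
malware_rows = [[1, 1, 1]] * 4 + [[1, 1, 0]] * 4
benign_rows = [[0, 1, 1]] * 4 + [[0, 1, 0]] * 4

p_values = permission_p_values(malware_rows, benign_rows, names)

ALPHA = 0.05  # assumed significance level
significant = [name for name in names if p_values[name] < ALPHA]
print(significant)
```

A small p-value indicates that the usage of a permission
differs between the two sets of samples, which is the
property a useful feature needs.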
The training phase receives a training dataset in the form
of a CSV file. The model is trained using a neural network
and the k-means clustering algorithm. The output of this
phase is a confusion matrix and graphs showing which
samples of the dataset are classified correctly and
incorrectly. The model is tested using different samples of
both malware and benign apps. The feature map and the
prediction are printed on the
screen.
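The paper does not describe the architecture of its neural
network, so the following is only a minimal sketch of the
training idea: a single sigmoid neuron (equivalent to
logistic regression) trained by batch gradient descent on
binary permission vectors. The toy data, learning rate, and
epoch count are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(X: List[List[int]], y: List[int],
                 epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit one sigmoid unit with batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample (predicted probability - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, x: List[int]) -> int:
    """Return 1 (malicious) when the predicted probability exceeds 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > 0.5 else 0


# Hypothetical toy data. Columns: [SEND_SMS, INTERNET]; label 1 = malicious.
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = [1, 1, 0, 0]

weights, bias = train_neuron(X, y)
predictions = [predict(weights, bias, x) for x in X]
print(predictions)
```

In this toy set the label depends only on SEND_SMS, so the
learned weight for that column becomes large and positive
while the bias becomes negative.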
The feature map generated for the training data sample is
given in the figure below:-
Figure 1: Feature Map of the training dataset
The permissions extracted from the testing data sample are
shown in the figure below:-
Figure 2: Extracted Features from the test dataset
The feature map of the test data sample and the prediction
are shown in the figure below:-
Figure 3: Feature Map of test data
4. PERFORMANCE ANALYSIS
Performance analysis deals with the measurement of
response time, correctness of output, and throughput of the
proposed software.
A confusion matrix[7] is a table used to describe the
performance of a classification model when a set of test
data is given as input to the trained system. The accuracy
gives an overview of how often the system classifies the
test samples correctly. The accuracy of the model is 88%.
The accuracy is limited because malware that uses
encryption techniques[4] and Java reflection to hide its
code can bypass this system. The system is also vulnerable
to pollution attacks[5], in which malware requests
permissions normally requested by benign apps in order to
avoid detection. The accuracy of the system is shown in the
figure below.
Figure 4: Confusion Matrix and accuracy
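For reference, the confusion matrix and the metrics derived
from it can be computed as in the sketch below. The labels
and predictions are hypothetical and do not reproduce the
paper's results; malicious is treated as the positive class
(1), which is an assumption.

```python
from typing import Dict, List


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Dict[str, int]:
    """Count true/false positives and negatives, with 1 = malicious."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 1 and pred == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = c["tp"] + c["fp"] + c["fn"] + c["tn"]
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


# Hypothetical labels and predictions for eight test apps.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
scores = metrics(counts)
print(counts, scores)
```

Accuracy alone can be misleading when the classes are
unbalanced, which is why precision and recall are reported
alongside it.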
Compared with other techniques, this accuracy is high. The
model produces good results relative to Drebin[2],
TaintDroid[3] and other machine learning techniques.
5. CONCLUSIONS
The implemented system collects data in the form of
Android APKs from various Internet sources. Features,
which are essentially permissions, are extracted from the
APKs. A feature vector is created based on the permissions
and the given APKs. This is the input to the ML algorithms
that build a trained model. Unknown applications are then
used as input. An overall accuracy of 88 percent is
achieved.
The main limitations of the model include:-
• A large dataset must be collected to avoid the
overfitting problem.
• The extracted permissions are limited because the
number of malicious applications available on the
internet is very small.
• This system considers the differences between malware
and benign apps, but it does not consider the
categories of benign apps, which can be useful for
malware detection.
• This system is open to mimicry and pollution
attacks.
• Malware can bypass this system using Java reflection
and bytecode encryption.
REFERENCES
[1] G. Tao, Z. Zheng, Z. Guo and M. R. Lyu, "MalPat: Mining
Patterns of Malicious and Benign Android Apps via
Permission-Related APIs," in IEEE Transactions on
Reliability, vol. 67, no. 1, pp. 355-369, March 2018.
[2] K. Xu, Y. Li and R. H. Deng, "ICCDetector: ICC-Based
Malware Detection on Android," in IEEE Transactions on
Information Forensics and Security, vol. 11, no. 6, pp.
1252-1264, June 2016.
[3] L. Cen, C. S. Gates, L. Si and N. Li, "A Probabilistic
Discriminative Model for Android Malware Detection
with Decompiled Source Code," in IEEE Transactions on
Dependable and Secure Computing, vol. 12, no. 4, pp.
400-412, 1 July-Aug. 2015.
[4] B. Rashidi, C. Fung and E. Bertino, "Android malicious
application detection using support vector machine and
active learning," 13th International Conference on
Network and Service Management (CNSM), Tokyo, 2017,
pp. 1-9.
[5] N. Peiravian and X. Zhu, "Machine Learning for Android
Malware Detection Using Permission and API Calls," IEEE
25th International Conference on Tools with Artificial
[6] https://www.python.org/downloads/ [Accessed on
16/4/2019].
[7] https://www.geeksforgeeks.org/ [Accessed on
21/3/2019].
