SHREYA GOPAL SUNDARI
PHISHING WEBSITE DETECTION
by MACHINE LEARNING
TECHNIQUES
INTRODUCTION
• Phishing is one of the most commonly used social engineering and cyber attacks.
• Through such attacks, the phisher targets naïve online users by tricking them into
revealing confidential information, with the purpose of using it fraudulently.
• To avoid getting phished, three approaches are available:
• Users should be aware of phishing websites.
• Maintain a blacklist of phishing websites, which requires that each website has already
been identified as phishing.
• Detect phishing websites early in their appearance, using machine learning and deep neural
network algorithms.
• Of these three, the machine learning based method has proven more effective than
the other two.
• Even then, online users are still being trapped into revealing sensitive information on
phishing websites.
OBJECTIVES
A phishing website is a common social engineering method that mimics
trusted uniform resource locators (URLs) and webpages. The objective of this
project is to train machine learning models and deep neural networks on a
purpose-built dataset to predict phishing websites. Both phishing and benign URLs
are gathered to form the dataset, and the required URL-based and website
content-based features are extracted from them. The performance of each model is
measured and compared.
APPROACH
The steps involved in completing this project are:
• Collect a dataset containing phishing and legitimate websites from open source platforms.
• Write code to extract the required features from the URL database.
• Analyze and preprocess the dataset using exploratory data analysis (EDA) techniques.
• Divide the dataset into training and testing sets.
• Run selected machine learning and deep neural network algorithms, such as SVM, Random Forest,
and Autoencoder, on the dataset.
• Write code to display the evaluation results, using accuracy as the metric.
• Compare the results obtained for the trained models and identify which performs better.
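The "divide the dataset into training and testing sets" step can be sketched with the standard library alone. The function name `train_test_split_rows`, the 80/20 ratio, and the seed are assumptions made for illustration; the slides do not state which split was used.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split_rows(
    rows: Sequence, test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List, List]:
    """Shuffle rows reproducibly and split them into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A private Random instance keeps the split repeatable for a given seed.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled rows become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Toy rows: (feature_vector, label) pairs, 1 = phishing, 0 = legitimate.
    rows = [([i % 2, i % 3], i % 2) for i in range(10)]
    train, test = train_test_split_rows(rows, test_fraction=0.2, seed=42)
    print(len(train), len(test))
```

Fixing the seed makes the comparison between models fair, since every model then sees the same training and test rows.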
DATA COLLECTION
• Legitimate URLs are collected from the dataset provided by the University of
New Brunswick, https://www.unb.ca/cic/datasets/url-2016.html.
• From this collection, 5000 URLs are randomly picked.
• Phishing URLs are collected from an open source service called PhishTank. This
service provides a set of phishing URLs in multiple formats, such as CSV and JSON,
that is updated hourly.
• From the obtained collection, 5000 URLs are randomly picked.
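A minimal sketch of the sampling described above, assuming the two URL collections have already been loaded into Python lists. The function name `build_labelled_sample`, the de-duplication step, and the seed are assumptions and not taken from the slides; the labels follow the convention used later in the deck (phishing = 1, legitimate = 0).

```python
import random
from typing import Iterable, List, Tuple


def build_labelled_sample(
    legitimate_urls: Iterable[str],
    phishing_urls: Iterable[str],
    per_class: int = 5000,
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Randomly pick `per_class` URLs from each source and label them."""
    rng = random.Random(seed)
    # De-duplicate while keeping a stable order so the seed stays reproducible.
    legit = list(dict.fromkeys(legitimate_urls))
    phish = list(dict.fromkeys(phishing_urls))
    if len(legit) < per_class or len(phish) < per_class:
        raise ValueError("not enough URLs to sample per_class from each source")
    # Label 0 marks legitimate URLs, label 1 marks phishing URLs.
    sample = [(url, 0) for url in rng.sample(legit, per_class)]
    sample += [(url, 1) for url in rng.sample(phish, per_class)]
    rng.shuffle(sample)
    return sample
```

Sampling the same number from each source gives a balanced dataset, which keeps accuracy a meaningful metric for the later comparison.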
FEATURE SELECTION
• The following categories of features are selected:
• Address Bar based Features
• Domain based Features
• HTML & JavaScript based Features
• Address Bar based Features considered are:
• Domain of URL
• IP Address in URL
• ‘@’ Symbol in URL
• Length of URL
• Depth of URL
• Redirection ‘//’ in URL
• ‘http/https’ in Domain name
• Using URL Shortening Service
• Prefix or Suffix "-" in Domain
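The address bar features listed above could be computed along the following lines. This is a sketch under stated assumptions, not the project's actual extraction code: the function name, the feature keys, the short list of shortener hosts, and the length threshold of 54 characters are all illustrative choices.

```python
import ipaddress
from urllib.parse import urlparse

# Illustrative, deliberately incomplete list of shortener hosts (an assumption).
SHORTENER_HOSTS = {"bit.ly", "goo.gl", "tinyurl.com", "t.co", "ow.ly"}


def _is_ip(host: str) -> bool:
    """Return True when the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def address_bar_features(url: str, long_url_threshold: int = 54) -> dict:
    """Compute simple address bar features; 1 marks a suspicious trait."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Depth: number of non-empty path segments.
    depth = sum(1 for part in parsed.path.split("/") if part)
    # A '//' after the scheme separator suggests an embedded redirect.
    after_scheme = url.split("://", 1)[1] if "://" in url else url
    return {
        "domain": host,
        "have_ip": int(_is_ip(host)),
        "have_at": int("@" in url),
        "long_url": int(len(url) >= long_url_threshold),  # threshold is assumed
        "depth": depth,
        "redirection": int("//" in after_scheme),
        "http_in_domain": int("http" in host),
        "shortener": int(host in SHORTENER_HOSTS),
        "prefix_suffix": int("-" in host),
    }
```

For example, `address_bar_features("http://192.168.0.1/a/b//c.html")` flags both the IP-address host and the extra `//` in the path.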
FEATURE SELECTION (CONT.)
• Domain based Features considered are:
• DNS Record
• Website Traffic
• Age of Domain
• End Period of Domain
• HTML and JavaScript based Features considered are:
• Iframe Redirection
• Status Bar Customization
• Disabling Right Click
• Website Forwarding
• Altogether, 17 features are extracted from the dataset.
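The HTML and JavaScript based group can be approximated with pattern checks on a page's source. The sketch below is hypothetical: the function name, the regular expressions, and the redirect threshold are assumptions, and the slides do not specify how each feature is detected. The domain based features need DNS and WHOIS lookups and are not covered here.

```python
import re


def html_js_features(html: str, redirect_count: int = 0, max_redirects: int = 2) -> dict:
    """Flag suspicious HTML/JavaScript traits in a page's source (1 = suspicious)."""
    return {
        # Any <iframe> or <frame> tag counts as iframe use.
        "iframe": int(bool(re.search(r"<\s*i?frame\b", html, re.IGNORECASE))),
        # An onmouseover handler that touches window.status can fake the status bar.
        "status_bar": int(bool(re.search(
            r"onmouseover\s*=\s*[\"'][^\"']*window\.status", html, re.IGNORECASE))),
        # An event.button == 2 check or a contextmenu handler can block right click.
        "right_click_disabled": int(bool(re.search(
            r"event\.button\s*==\s*2|oncontextmenu\s*=", html, re.IGNORECASE))),
        # More redirects than the (assumed) threshold is treated as forwarding.
        "forwarding": int(redirect_count > max_redirects),
    }
```

The caller is expected to fetch the page and count redirects itself, which keeps the function free of network access and easy to test.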
FEATURES DISTRIBUTION
MACHINE LEARNING MODELS
• This is a supervised machine learning task. There are two major types of supervised
machine learning problems: classification and regression.
• This dataset poses a classification problem, as each input URL is classified as
phishing (1) or legitimate (0). The classification models trained on the dataset
in this notebook are:
• Decision Tree
• Random Forest
• Multilayer Perceptrons
• XGBoost
• Autoencoder Neural Network
• Support Vector Machines
MODEL EVALUATION
• The models are evaluated using accuracy as the metric.
• The figure on this slide shows the training and test accuracy of each model.
• From these results, it is clear that the XGBoost model gives the best performance. The model is
saved for further use.
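Accuracy and the model comparison can be expressed in a few lines. This is a sketch, assuming predictions are available as plain sequences of 0/1 labels; the function names and the `(train_accuracy, test_accuracy)` layout are illustrative and not taken from the project.

```python
from typing import Dict, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def best_model(results: Dict[str, Tuple[float, float]]) -> str:
    """Return the model name with the highest test accuracy.

    `results` maps a model name to (train_accuracy, test_accuracy).
    """
    return max(results, key=lambda name: results[name][1])
```

Ranking by test accuracy rather than training accuracy avoids rewarding a model that merely memorizes the training set.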
NEXT STEPS
• Working on this project was very informative and worth the effort.
• Through this project, one can learn a lot about phishing websites and how
they are differentiated from legitimate ones.
• This project can be taken further by creating a browser extension or
developing a GUI.
• Either would classify an input URL as legitimate or phishing using
the saved model.
Thank You…..
