Use of HOG Descriptors in
Phishing Detection
Ahmet Selman Bozkir, Ebru Akcapinar Sezer
Hacettepe University Computer Engineering Department, TURKEY
ISDFS 2016
Topics
• What is phishing?
• Facts and the rise of phishing attacks
• Existing approaches
• Why vision based scheme?
• HOG descriptors
• Demonstration of developed method
• Experiments and Results
• Conclusion
What is phishing?
• Phishing is a scam that creates a visual illusion for
computer users by serving fake web pages that mimic
their legitimate targets, in order to steal valuable digital
data such as credit card information or e-mail
passwords.
Phone phreaking + fishing -> «phishing»
Facts and figures
* Source: PhishLabs 2016 Phishing Trends & Intelligence Report
Facts and figures
• In 2012-2013, 37.3 million
users were affected by
phishing attacks.*
* Source: 2013 Verizon Data Breach Investigation Report
Facts and figures
• 1 million confirmed
malicious phishing sites
on over 130,000 unique
domains. (as of 2013)
* Source: PhishLabs 2016 Phishing Trends & Intelligence Report
Facts and figures
• The average lifetime of a phishing
page is 32 hours.
• The risk of zero-day attacks is
rising because short-lived pages
may not be discovered by
blacklists in time.
* Source: APWG, Phishing activity trends paper. [Online].
Available at http://www.antiphishing.org/resources/apwg-papers/
Facts and figures
90% of consumer-oriented phishing
attacks targeted:
• financial institutions
• cloud storage/file hosting sites
• webmail and online services
• ecommerce sites
• payment services
* Source: PhishLabs 2016 Phishing Trends & Intelligence Report
Facts and figures
• financial institutions
• payment services
• cloud storage/file hosting sites
* Source: PhishLabs 2016 Phishing Trends & Intelligence Report
Existing Anti-Phishing Approaches
• Content & Blacklist: CANTINA [1], SpoofGuard [2], Netcraft [3]
• DOM based: Medvet et al. [4], Zhang et al. [5], Fu et al. [6]
• Vision based: Maurer et al. [7], Verilogo [8]
• Other: Chen et al. [9]
Why vision based scheme?
• Attackers substitute textual HTML elements with <IMG> or applet-like
content
• Zero-day attacks need proactive solutions
• Dynamic / AJAX-type content loading
• DOM organization differs between legitimate and fake web
pages
• More robust to complex backgrounds or page layouts
• Most importantly, vision-based solutions are in
concordance with human perception
* Source: PhishLabs 2016 Phishing Trends & Intelligence Report
Methodology: HOG Features and
Descriptors
• Histogram of Oriented Gradients (HOG)
• Dalal & Triggs, 2005
• A good way to characterize and capture
local object appearance or shape by
using the distribution of intensity
gradients or edge directions.
• Preferred because:
(i) HOG descriptors are able to capture visual
cues of the overall page layout;
(ii) they provide a certain degree
of rotation and translation invariance.
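The idea behind the descriptor can be sketched in a few lines. The following is a minimal, standard-library-only illustration of per-cell orientation histograms, not the authors' implementation: the function names, the 8-pixel default cell size, and the 9 unsigned orientation bins are assumptions chosen for the example, and the block normalization step of the full Dalal & Triggs pipeline is left out. In practice a library routine such as scikit-image's `skimage.feature.hog` would normally be used instead.

```python
import math


def cell_orientation_histograms(image, cell_size=8, bins=9):
    """Return one unsigned-gradient orientation histogram per cell.

    image: 2D list of grayscale intensities (rows of equal length).
    cell_size and bins are illustrative defaults, not values from the slides.
    """
    height, width = len(image), len(image[0])
    rows, cols = height // cell_size, width // cell_size
    hists = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]
    bin_width = 180.0 / bins
    for y in range(rows * cell_size):
        for x in range(cols * cell_size):
            # Central differences, clamped at the image border.
            gx = image[y][min(x + 1, width - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, height - 1)][x] - image[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            # Unsigned orientation folded into [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / bin_width), bins - 1)
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hists[y // cell_size][x // cell_size][b] += magnitude
    return hists


def flatten_and_normalize(hists):
    """Concatenate all cell histograms and scale them to sum to 1."""
    flat = [v for row in hists for cell in row for v in cell]
    total = sum(flat)
    return [v / total for v in flat] if total > 0 else flat
```

For a page screenshot, `image` would be the grayscale pixel grid, and `cell_size` would be set to 64 or 128 to match the cell widths used in the experiments.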
Developed approach in detail
$$\mathrm{Sim}(H_M, H_N) = \sum_{i=1}^{T} \min\bigl(H_M(i),\, H_N(i)\bigr)$$
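The similarity measure above is a histogram intersection over the T bins of the two descriptors. A minimal sketch follows; it assumes both histograms are first normalized to sum to 1, so that the score lies between 0 and 1 (the slides report scores as percentages, but the exact normalization is not stated there). The function names are hypothetical.

```python
def normalize(hist):
    """Scale a non-negative histogram so that its bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [v / total for v in hist]


def histogram_intersection(h_m, h_n):
    """Sum of bin-wise minima of two normalized histograms of equal length."""
    if len(h_m) != len(h_n):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(normalize(h_m), normalize(h_n)))
```

With this normalization, identical histograms score 1.0 and histograms with no overlapping mass score 0.0; multiplying by 100 gives a percentage.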
Experiments
• For the first dataset (phishing web pages), 50 unique phishing
pages reported to PhishTank between 14 December 2015 and
5 January 2016 were collected.
• For the legitimate web page pairs, 18 legitimate home pages
were collected from the Alexa top 500 web site directory.
The page URLs were then shuffled to obtain 100 distinct
legitimate home page pairs.
• Cells 64 pixels wide and 128 pixels wide were employed.
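Eighteen pages give 18 × 17 / 2 = 153 possible unordered pairs, so 100 distinct pairs can be drawn without repetition. The slides do not describe the exact shuffling procedure; the sketch below shows one way to do it, with the function name, the fixed seed, and the placeholder page names all being assumptions for illustration.

```python
import itertools
import random


def sample_distinct_pairs(items, n_pairs, seed=0):
    """Draw n_pairs distinct unordered pairs from items, without repetition."""
    all_pairs = list(itertools.combinations(items, 2))
    if n_pairs > len(all_pairs):
        raise ValueError("not enough distinct pairs available")
    # A seeded generator keeps the selection reproducible.
    return random.Random(seed).sample(all_pairs, n_pairs)


# Placeholder identifiers standing in for the 18 legitimate home pages.
pages = [f"page_{i:02d}" for i in range(18)]
pairs = sample_distinct_pairs(pages, 100)
```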
Results - 1
Statistics of phishing pages and their target page pairs in HOG-64 and HOG-128
Similarity of pairs of phishing pages (50 pages)

Statistic            HOG-64 px cells   HOG-128 px cells
min                  51.873 %          49.910 %
max                  98.861 %          98.390 %
mean                 78.868 %          78.637 %
standard deviation   12.147 %          10.963 %

Statistics of unique legitimate page pairs in HOG-64 and HOG-128
Similarity of pairs of legitimate pages (100 unique pairs)

Statistic            HOG-64 px cells   HOG-128 px cells
min                  38.420 %          45.683 %
max                  74.459 %          77.092 %
mean                 60.739 %          66.012 %
standard deviation   11.026 %          9.492 %
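Summary rows like these can be derived from a list of per-pair similarity scores. The sketch below shows the arithmetic only: the scores are made-up placeholders, not data from the study, and the use of the sample standard deviation is an assumption, since the slides do not say which form was reported.

```python
import statistics


def summarize(scores):
    """Return min, max, mean and sample standard deviation of the scores."""
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),  # assumption: sample (n - 1) form
    }


# Placeholder percentages for illustration only; NOT results from the study.
example_scores = [50.0, 70.0, 90.0]
summary = summarize(example_scores)
```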
Results - 2
Similarity scores of unique legitimate page pairs
Results - 3
Similarity scores of phishing pages and their legitimate targets
Discussion and Conclusion
• This work is the first study that employs HOG in phishing detection.
• It offers a robust method for phishing detection, as it is purely vision based and
able to capture local visual cues on the web page surface.
• However, we identified some shortcomings.
• Image contents in phishing web pages generally differ from those of the legitimate
pages, so invariance to image content must be provided in order to achieve better and
more robust phishing detection.
• The method must also be verified with a more comprehensive dataset.
References
1. Y. Zhang, J. Hong and L. Cranor, CANTINA: A Content-Based Approach to Detecting Phishing Web Sites, WWW 2007.
2. N. Chou, R. Ledesma, Y. Teraguchi, D. Boneh and J.C. Mitchell, Client-Side Defense against Web-Based Identity Theft, In Proceedings of the 11th Annual Network and Distributed System Security Symposium (NDSS '04).
3. Netcraft, Netcraft Anti-Phishing Toolbar. Visited: April 20, 2016. http://toolbar.netcraft.com/
4. E. Medvet, E. Kirda and C. Kruegel, Visual-Similarity-Based Phishing Detection, SecureComm '08 International Conference on Security and Privacy in Communication Networks, 2008.
5. W. Zhang, H. Lu, B. Xu and H. Yang, Web Phishing Detection Based on Page Spatial Layout Similarity, Informatica, vol. 37, pp. 231-244, 2013.
6. A.Y. Fu, L. Wenyin and X. Deng, Detecting Phishing Web Pages with Visual Similarity Assessment Based on Earth Mover's Distance (EMD), IEEE Transactions on Dependable and Secure Computing, pp. 301-311, 2006.
7. M.E. Maurer and D. Herzner, Using Visual Website Similarity for Phishing Detection and Reporting, In CHI '12 Extended Abstracts on Human Factors in Computing Systems, 2012.
8. G. Wang, H. Liu, S. Becerra and K. Wang, Verilogo: Proactive Phishing Detection via Logo Recognition, Technical Report CS2011-0669, UC San Diego, 2011.
9. T. Chen, S. Dick and J. Miller, Detecting Visually Similar Web Pages: Application to Phishing Detection, ACM Transactions on Internet Technology, 10(2), 2010.