SlideShare a Scribd company logo
Scientific Journal Impact Factor (SJIF): 1.711
International Journal of Modern Trends in Engineering
and Research
www.ijmter.com
@IJMTER-2014, All rights Reserved 404
e-ISSN: 2349-9745
p-ISSN: 2393-8161
Analyzing the effectualness of Phishing Algorithms in Web
Applications
Inquest of LinkGuard and CANTINA Approach
Mr. Vinod Singh Kharsan1
, Bernadine Rekha Soman2
, Sonali Sharma3
Assistant Professor1
, UG Scholar2,3
Department of Computer Science, Chouksey Engineering College, Bilaspur (C.G.)1,2,3
Abstract— The initial and proficient loss of deception is belief. A wolf in sheep’s clothing is tough
to recognize, similar is the schema of a phishing website. Phishing is the emulsion of social
engineering and technical exploits designed to persuade a victim to provide personal information, for
the fiscal gain of the attacker. It is a new kind of network assault where the attacker creates a spitting
image of an already existing Web Page to delude users. In this paper, we will study two anti-phishing
algorithms, one an end-host based algorithm known as the LinkGuard Algorithm, while the other a
content based approach known as the CANTINA.
Keywords- Phishing, toolbars, Subroutine, LinkGuard, CANTINA, heuristics, visual.
I. INTRODUCTION
Progression of internet has played a eloquent role in trade and custom activities. Unfortunately,
indigent security on the internet and large monetary gains furnish a strong motivation for attackers to
commit such seemingly low risk, yet high- return online scams. Web –based casualties are usually
caused by Phishing attacks, malware, spoofing and other threats, costs organizations millions of
dollars every year. Web fraud also strikes user’s insights of e-businesses. It is estimated that between
May 2004 and 2005, approx. 1.2 million computer users in U.S suffered losses amounting up to
US$929 million. As a matter of fact United States businesses lose an estimated US$2 billion per year
as their clients become victims.[1]
Recently, there has been a dramatic increment in phishing, a relative attack in which victims are
deluded by spoofed emails and fraudulent web sites into giving up patented information. Phishing is a
quick growing challemge, with 9,255 unique phishing sites reported in June 2006 alone[2].
It is not
familiar exactly how much phishing costs each year since impacted industries are reluctant to release
figures; estimates range from $1 billion[3]
to 2.8 billion[4]
per year. To respond to this peril, software
brokers and organizations have released a variety of anti phishing toolbars. As of September 2006, the
free software download site download.com, listed 84 anti –phishing toolbars. However a strong need
for superior automated detection algorithms was required. Phishers employ e-mail as their major
method to carry out phishing attacks. However, this algorithm can be applied to attacks that use other
means such as instant messaging. In general phishing attacks are performed with the following steps:
1) Phishers set up a forged web site which looks exactly like the legal Web site. Including setting
up a server, applying the dns server name, and creating the web pages similar to the destination
Website etc.
International Journal of Modern Trends in Engineering and Research (IJMTER)
Volume 01, Issue 05, [November - 2014] e-ISSN: 2349-9745, p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved 405
2) Then they send large amounts of mimic emails to target users under the name of those legitimate
companies and troupes, in order to convince the potential victims to visit their Web sites.
3) Users then click on the hyperlink given, and input the required information.
4) Next the phishers steal the information and prosecute their fraud such as transferring money
from the victims account.
Despite efforts such as blocking phishing Web sites, enhancing their security, using spam filters etc. it
is impossible to avoid phishing. As a last defense, users can install anti-phishing tools in their
computers. Anti-phishing tools used these days can be divided into three categories: blacklist/whitelist
based, rule based and content based.
 Category 1: When a user sojourns a Web site, the anti phishing tool searches the address of that
sites in a blacklist stored in the database. If the visited site is on the list, then the anti-phishing tool
then warns the users. Tools such as the Scam Blocker from the Earthlink Company, PhishGuard
and Net craft, etc. Nevertheless newly emerged Web sites cannot be prevented with this.
 Category 2: This category of tools uses a certain set of rules for their software. Softwares such as
the SpoofGuard and TrustWatch use this method, i.e. it checks the domain name, URL of the Web
site and also checks whether the browser is directed to the current URL via the links in the
contents of emails. If it finds that the domain name of the visited Website is similar to the well-
known website or domain name, then it warns the users.
 Category 3: This category of tools is a recent discovery. A content based approach for detecting
phishing web sites known as CANTINA. It examines the content of the webpage such as the
common characteristics like the URL and the domain name and determines whether the site is
legitimate or not.
II. ANTI-PHISHING TOOLS AND ALGORITHMS
Here we interrogate the adopted methodologies, working principles and concluding results of both the
algorithms and try to infer which out of the two is better. Shifting onto the methodologies of the
algorithms we begin with the procedure adopted in the LinkGuard Algorithm and then we will move
on to CANTINA.
2.1 LINKGUARD ALGORITHM
The LinkGuard algorithm is one such algorithm which defines the characteristics of URL and
Domain names. The elementary work of this algorithm is to dissect the actual links and the visual
links. Some of the important systems contained in the architecture of this algorithm can be given as
follows:
Communication: All the input course is gathered and facts are sent to the analyzer.
Database: User’s inputs such as the domain name, Url, Blacklists etc. are stored in here.
Analyzer: Data which is provided by the communication and the database is sent to the analyzer for
modification purpose. Hence the analyzer can also be called as the Link Guard of the system.
Alerter: Every time something is received by the analyzer which has to be displayed to the user the
alerter generates a warning message.
Logger: Information related to this particular session is stored.
International Journal of Modern Trends in Engineering and Research (IJMTER)
Volume 01, Issue 05, [November - 2014] e-ISSN: 2349-9745, p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved 406
Fig.1. Architecture of LinkGuard Algorithm
Now we shall discuss procedure of the LinkGuard algorithm:
2.1.1 Class of the hyperlinks in the phishing e-mails:
To gather useful information from the user’s phishers persuade the victims to click in the hyperlink
which is embedded in the counterfeited site. A hyperlink has the following structure:
<aherf=”http://www.antiphishing.org/phishingarchive.html”>Phishing Archive</a>
Where ‘http://www.antiphishing.org/phishingarchive.html’ is the “URI(universal resource
identifiers)” which provides information to the user about the networked resource, and ‘Phishing
Archive’ is the “Anchor text” which is the text displayed on the user’s web browser.
2.1.2 The LinkGuard Algorithm:
Linkguard algorithm effectuates by analyizing the differences between the actual and the hyperlink. It
also computes the resemblances of the URI with a legitimate site. The common terms used in the
algorithms are:
v_link: visual_link;
a_link: actual_link;
v_dns: visual DNS name;
a_dns: actual DNS name;
sender_dns: sender’s DNS name;
int LinkGuard (v_link, a_link} {
International Journal of Modern Trends in Engineering and Research (IJMTER)
Volume 01, Issue 05, [November - 2014] e-ISSN: 2349-9745, p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved 407
1 v_dns = GetDNSName (v_link);
2 a_dns = GetDNSName (a_link);
3 if ((v_dns and a_dns are not
4 empty) and (v_dns! = a_dns))
5 return PHISHING;
6 if (a_dns is dotted decimal)
7 return POSSIBLE_PHISHING;
8 if (a_link or v_link is encoded)
9 {
10 v_link2 = decode (v_link);
11 a_link2 = decode (a_link);
12 return LinkGuard (v_link2, a_link2);
13}
14 /* analyze the domain name for
15 possible phishing */
16 if (v_dns is NULL)
17 return AnalyzeDNS (a_link);
}
int AnalyzeDNS (actual_link) {
/* Anatomize the actual DNS name according to the blacklist and whitelist*/
18 if (actual_dns in blacklist)
19 return PHISHING;
20 if (actual_dns in whitelist)
21 return NOTPHISHING;
22 return PatternMatching(actual_link);
}
int PatternMatching (actual_link){
23 if (sender_dns and actual_dns are different)
24 return POSSIBLE_PHISHING;
25 for (each item prev_dns in seed_set)
26 {
27 bv = Similarity(prev_dns, actual_link);
28 if (bv == true)
29 return POSSIBLE_PHISHING;
30 }
31 return NO_PHISHING;
}
float Similarity (str, actual_link) {
32 if (str is part of actual_link)
33 return true;
34 int maxlen = the maximum string
35 lengths of str and actual_dns;
36 int minchange = the minimum number of
37 changes needed to transform str
38 to actual_dns (or vice verse);
39 if (thresh<(maxlen-minchange)/maxlen<1)
40 return true
International Journal of Modern Trends in Engineering and Research (IJMTER)
Volume 01, Issue 05, [November - 2014] e-ISSN: 2349-9745, p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved 408
41 return false;
} [5]
The Linkguard algorithm contains a main routine LinkGuard in which it extracts the DNS names from
the actual and visual links. When a comparision of th names is links is made and it is found that these
names are not same, or dotted decimal IP address is directly used in actual dns then phishing attack is
recognized. On the other hand when there is no destination information(DNS or dotted IP address) in
the visual link the subroutine AnalyzeDNS is called to handle phishing attack. While the pattern
matching subroutine is used to identify other unkown attacks. In this method either the sender’s email
address is extracted from the e-mail or a collection of all the DNS names is done in a “seed set”. After
this the PatternMatching subroutine checks if the actual DNS name of a hyperlink is different from
the DNS name in the sender’s address.
Fig. 2. Flow Chart of the LinkGuard Algorithm
Hence, in the above figure we have seen the workings and procedures involved in LinkGuard
algorithm. Now we shall study the CANTINA algorithm.
2.2 CANTINA
The CANTINA is a unprecedented content-based passage of detecting phishing websites. What makes
the cantina intriguing is that other methods look at the shell peculiarity of the web pages whereas the
CANTINA studies the substance or content of that webpage. Not only this, the approach also analyzes
International Journal of Modern Trends in Engineering and Research (IJMTER)
Volume 01, Issue 05, [November - 2014] e-ISSN: 2349-9745, p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved 409
the text-based content of the page itself. This approach works on the principle of a heuristic based
algorithm known as the TF-IDF algorithm.
Working of the TF-IDF:
The TF-IDF algorithm is used for the purpose of information retrieval and text-mining. It
succumbs importance to the occurrence of a word in a document. As the number of word appears in
the document the importance increases.
The term ‘TF’ stands for “term frequency” meaning the number of times a given term appears in a
specific document. It usually helps in preventing a bias towards the longer documents and also
measures the importance of that term within a particular document. On the other hand the term ‘IDF’
stands for “inverse document frequency” and is meant to define the general importance of the term i.e.
its measures how common a particular word is in the entire document. Accordingly we can say that
any term which has high TF-IDF weight means that it has high TF and low document frequency.
Alongside the TF-IDF algorithm Robust Hyperlinks are also required in the CANTINA
approach. Phelps and Wilensky evolved the idea of Robust Hyperlink to overcome the problem of
broken links[6].
They proposed on adding a small number of some chosen terms which they called as
“lexical signature” to the URL’s. Now here comes the role of the TF-IDF i.e. to generate a lexical
signature.
In order to generate the signatures the TF-IDF value of each word of a document is calculated
and words which have highest values are then selected. The term frequency defines wholesomeness
and IDF provides rarity in documents.
So the work of the CANTINA can be supervened as:
 Calculate the TF-IDF scores of each term on a given web page.
 Clasp any five terms having the highest TF-IDF weights and generate a lexical signature.
 Provide this signature to any search engine of your choice.
 If the DNS name of the current webpage matches with the N top search results then we consider
that there is no phishing or else it is a phish website.
Fig. 3. Fake website which is uploaded to the internet.
International Journal of Modern Trends in Engineering and Research (IJMTER)
Volume 01, Issue 05, [November - 2014] e-ISSN: 2349-9745, p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved 410
Fig. 4. The search engine showing results whether the site is legitimate with the help of the lexical signature.
III. COMPARING BENEFITS AND DRAWBACKS
Commencing with the Pros and cons of both the approaches discussed above would be given as-
Benefits of the Linkguard Algorithm can be given as:
 One great asset is that the LinkGuard algorithm recognizes known as well as unknown phishing
sites.
 It is light-weighted and based on derived characteristics.
Drawbacks of the LinkGuard algorithm can be given as:
 Due its false negative differentiating nature the LinkGuard algorithm sometimes treats some
phishing sites as normal websites( specifically those sites which have not been visted by the user
at all).
 When the comparability index of the site is near close for example say 3/4 it often generates a
false positive.
Now coming to the benefits of the CANTINA APPROACH:
 Fast in nature and is based on the content-based characteristics.
 No maintenance required by the user as most functionalities are performed by the search engines.
Drawbacks of the CANTINA approach can be given as:
 Language can be a drawback here as phishing occurs in all kinds of websites irrespective of their
languages.
 In order to overcome some of the quering problems of the search engines the algorithms will
have to be implemented as server based email filter rather than a browser tool.
International Journal of Modern Trends in Engineering and Research (IJMTER)
Volume 01, Issue 05, [November - 2014] e-ISSN: 2349-9745, p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved 411
IV. CONCLUSION
The frogery which has occurred due to phishing has taken a rigorous plunge in the recent years. In this
paper, we have analogized the working principles of two approaches which are used for the detection
of phishing Websites namely the LinkGuard and CANTINA. During this we implicited that the
LinkGuard is an algorithm which differentiates on the basis of URL’s and DNS and is effective upto
96% detection including the unknown websites. On the other hand the CANTINA approach is a novel
designed method involving heuristics and content based methods thereby increasing 97% detection of
phishing with 6% false positives. Therefore, in the future work both these approaches can act as great
barriers for keeping away all kinds of phishing such that user’s might not become victims to forgery
or theft.
REFERENCES
[1] H.Kerstein, Paul (July 19,2005)
[2] Anti-Phishing Working Group, Phishing Activity Trends Report. 2006. http://www.antiphishing.org/reports/.
[3] Keizer, G., Phishing Costs Neraly $1 Billion, TechWeb TechnologyNews.
http://www.techweb.com/wire/security/164902671.
[4] McMillian, R.,Gartner: Consumers to lose $2.8 billion to phishers in 2006, Networkworld,2006.
[5] U.Vidyasagar “Intelligent phishing website detection and prevention system by using LinkGuard algorithm”
[6] Phelps, T.A. and R. Wilensky, Robust Hyperlinks and Locations, D-Lib Magazine, vol. 6(7/8),2000.
http://www.dlib.org/dlib/july00/wilensky/07wilensky.html.
Analyzing the effectualness of Phishing Algorithms in Web Applications Inquest of LinkGuard and CANTINA Approach
Analyzing the effectualness of Phishing Algorithms in Web Applications Inquest of LinkGuard and CANTINA Approach

More Related Content

PDF
Intelligent Phishing Website Detection and Prevention System by Using Link Gu...
PDF
IRJET - Chrome Extension for Detecting Phishing Websites
PDF
Iy2515891593
PDF
[IJET V2I5P15] Authors: V.Preethi, G.Velmayil
PDF
IRJET- Detecting the Phishing Websites using Enhance Secure Algorithm
PDF
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
PDF
Learning to detect phishing ur ls
PDF
PDMLP: PHISHING DETECTION USING MULTILAYER PERCEPTRON
Intelligent Phishing Website Detection and Prevention System by Using Link Gu...
IRJET - Chrome Extension for Detecting Phishing Websites
Iy2515891593
[IJET V2I5P15] Authors: V.Preethi, G.Velmayil
IRJET- Detecting the Phishing Websites using Enhance Secure Algorithm
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
Learning to detect phishing ur ls
PDMLP: PHISHING DETECTION USING MULTILAYER PERCEPTRON

What's hot (19)

PDF
IJSRED-V2I4P0
PDF
Knowledge base compound approach against phishing attacks using some parsing ...
PDF
KNOWLEDGE BASE COMPOUND APPROACH AGAINST PHISHING ATTACKS USING SOME PARSING ...
PDF
A survey on detection of website phishing using mcac technique
PDF
IRJET- Phishing Website Detection System
PDF
IRJET - Phishing Attack Detection and Prevention using Linkguard Algorithm
PDF
Detecting Phishing using Machine Learning
PDF
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
PDF
Detecting phishing websites using associative classification (2)
PDF
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
PDF
IRJET- Phishing Website Detection based on Machine Learning
PPT
Phishing detection & protection scheme
PDF
IRJET- Identification of Clone Attacks in Social Networking Sites
PPTX
Detection of phishing websites
PPTX
Phishing
PDF
Detecting malicious URLs using binary classification through ada boost algori...
PPTX
PHISHING DETECTION
PDF
Study on Phishing Attacks and Antiphishing Tools
PDF
Paper id 71201915
IJSRED-V2I4P0
Knowledge base compound approach against phishing attacks using some parsing ...
KNOWLEDGE BASE COMPOUND APPROACH AGAINST PHISHING ATTACKS USING SOME PARSING ...
A survey on detection of website phishing using mcac technique
IRJET- Phishing Website Detection System
IRJET - Phishing Attack Detection and Prevention using Linkguard Algorithm
Detecting Phishing using Machine Learning
MALICIOUS URL DETECTION USING CONVOLUTIONAL NEURAL NETWORK
Detecting phishing websites using associative classification (2)
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
IRJET- Phishing Website Detection based on Machine Learning
Phishing detection & protection scheme
IRJET- Identification of Clone Attacks in Social Networking Sites
Detection of phishing websites
Phishing
Detecting malicious URLs using binary classification through ada boost algori...
PHISHING DETECTION
Study on Phishing Attacks and Antiphishing Tools
Paper id 71201915
Ad

Similar to Analyzing the effectualness of Phishing Algorithms in Web Applications Inquest of LinkGuard and CANTINA Approach (20)

PDF
IRJET - PHISCAN : Phishing Detector Plugin using Machine Learning
PDF
IRJET-Content based approach for Detection of Phishing Sites
PDF
PHISHING URL DETECTION USING MACHINE LEARNING
PDF
A Review on Antiphishing Framework
PDF
IRJET- Survey on Phishing Attack Detection and Mitigation
PDF
Prevention of Phishing Attacks Based on Discriminative Key Point Features of ...
PDF
Clustering Categorical Data for Internet Security Applications
PPTX
website phishing by NR
PPT
Intro phishing
PDF
Phishing Website Detection Using Machine Learning
PPSX
Shiv seminar final
PDF
Web phish detection (an evolutionary approach)
PDF
Web phish detection (an evolutionary approach)
PDF
A LITERATURE REVIEW ON PHISHING EMAIL DETECTION USING DATA MINING
PPTX
Phishing Attack : A big Threat
PDF
Review of the machine learning methods in the classification of phishing attack
PDF
Anti phishing web browser
PDF
Detection of Phishing Websites using machine Learning Algorithm
PPTX
Classification with R
PDF
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
IRJET - PHISCAN : Phishing Detector Plugin using Machine Learning
IRJET-Content based approach for Detection of Phishing Sites
PHISHING URL DETECTION USING MACHINE LEARNING
A Review on Antiphishing Framework
IRJET- Survey on Phishing Attack Detection and Mitigation
Prevention of Phishing Attacks Based on Discriminative Key Point Features of ...
Clustering Categorical Data for Internet Security Applications
website phishing by NR
Intro phishing
Phishing Website Detection Using Machine Learning
Shiv seminar final
Web phish detection (an evolutionary approach)
Web phish detection (an evolutionary approach)
A LITERATURE REVIEW ON PHISHING EMAIL DETECTION USING DATA MINING
Phishing Attack : A big Threat
Review of the machine learning methods in the classification of phishing attack
Anti phishing web browser
Detection of Phishing Websites using machine Learning Algorithm
Classification with R
A Deep Learning Technique for Web Phishing Detection Combined URL Features an...
Ad

More from Editor IJMTER (20)

PDF
A NEW DATA ENCODER AND DECODER SCHEME FOR NETWORK ON CHIP
PDF
A RESEARCH - DEVELOP AN EFFICIENT ALGORITHM TO RECOGNIZE, SEPARATE AND COUNT ...
PDF
Analysis of VoIP Traffic in WiMAX Environment
PDF
A Hybrid Cloud Approach for Secure Authorized De-Duplication
PDF
Aging protocols that could incapacitate the Internet
PDF
A Cloud Computing design with Wireless Sensor Networks For Agricultural Appli...
PDF
A CAR POOLING MODEL WITH CMGV AND CMGNV STOCHASTIC VEHICLE TRAVEL TIMES
PDF
Sustainable Construction With Foam Concrete As A Green Green Building Material
PDF
USE OF ICT IN EDUCATION ONLINE COMPUTER BASED TEST
PDF
Textual Data Partitioning with Relationship and Discriminative Analysis
PDF
Testing of Matrices Multiplication Methods on Different Processors
PDF
Survey on Malware Detection Techniques
PDF
SURVEY OF TRUST BASED BLUETOOTH AUTHENTICATION FOR MOBILE DEVICE
PDF
SURVEY OF GLAUCOMA DETECTION METHODS
PDF
Survey: Multipath routing for Wireless Sensor Network
PDF
Step up DC-DC Impedance source network based PMDC Motor Drive
PDF
SPIRITUAL PERSPECTIVE OF AUROBINDO GHOSH’S PHILOSOPHY IN TODAY’S EDUCATION
PDF
Software Quality Analysis Using Mutation Testing Scheme
PDF
Software Defect Prediction Using Local and Global Analysis
PDF
Software Cost Estimation Using Clustering and Ranking Scheme
A NEW DATA ENCODER AND DECODER SCHEME FOR NETWORK ON CHIP
A RESEARCH - DEVELOP AN EFFICIENT ALGORITHM TO RECOGNIZE, SEPARATE AND COUNT ...
Analysis of VoIP Traffic in WiMAX Environment
A Hybrid Cloud Approach for Secure Authorized De-Duplication
Aging protocols that could incapacitate the Internet
A Cloud Computing design with Wireless Sensor Networks For Agricultural Appli...
A CAR POOLING MODEL WITH CMGV AND CMGNV STOCHASTIC VEHICLE TRAVEL TIMES
Sustainable Construction With Foam Concrete As A Green Green Building Material
USE OF ICT IN EDUCATION ONLINE COMPUTER BASED TEST
Textual Data Partitioning with Relationship and Discriminative Analysis
Testing of Matrices Multiplication Methods on Different Processors
Survey on Malware Detection Techniques
SURVEY OF TRUST BASED BLUETOOTH AUTHENTICATION FOR MOBILE DEVICE
SURVEY OF GLAUCOMA DETECTION METHODS
Survey: Multipath routing for Wireless Sensor Network
Step up DC-DC Impedance source network based PMDC Motor Drive
SPIRITUAL PERSPECTIVE OF AUROBINDO GHOSH’S PHILOSOPHY IN TODAY’S EDUCATION
Software Quality Analysis Using Mutation Testing Scheme
Software Defect Prediction Using Local and Global Analysis
Software Cost Estimation Using Clustering and Ranking Scheme

Recently uploaded (20)

PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PPTX
Internet of Things (IOT) - A guide to understanding
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PDF
Digital Logic Computer Design lecture notes
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PPT
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
web development for engineering and engineering
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PDF
Well-logging-methods_new................
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
Geodesy 1.pptx...............................................
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
Foundation to blockchain - A guide to Blockchain Tech
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
Internet of Things (IOT) - A guide to understanding
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
Digital Logic Computer Design lecture notes
UNIT 4 Total Quality Management .pptx
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
web development for engineering and engineering
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Well-logging-methods_new................
CYBER-CRIMES AND SECURITY A guide to understanding
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
Operating System & Kernel Study Guide-1 - converted.pdf
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
Embodied AI: Ushering in the Next Era of Intelligent Systems
Geodesy 1.pptx...............................................
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd

Analyzing the effectualness of Phishing Algorithms in Web Applications Inquest of LinkGuard and CANTINA Approach

  • 1. Scientific Journal Impact Factor (SJIF): 1.711 International Journal of Modern Trends in Engineering and Research www.ijmter.com @IJMTER-2014, All rights Reserved 404 e-ISSN: 2349-9745 p-ISSN: 2393-8161 Analyzing the effectualness of Phishing Algorithms in Web Applications Inquest of LinkGuard and CANTINA Approach Mr. Vinod Singh Kharsan1 , Bernadine Rekha Soman2 , Sonali Sharma3 Assistant Professor1 , UG Scholar2,3 Department of Computer Science, Chouksey Engineering College, Bilaspur (C.G.)1,2,3 Abstract— The initial and proficient loss of deception is belief. A wolf in sheep’s clothing is tough to recognize, similar is the schema of a phishing website. Phishing is the emulsion of social engineering and technical exploits designed to persuade a victim to provide personal information, for the fiscal gain of the attacker. It is a new kind of network assault where the attacker creates a spitting image of an already existing Web Page to delude users. In this paper, we will study two anti-phishing algorithms, one an end-host based algorithm known as the LinkGuard Algorithm, while the other a content based approach known as the CANTINA. Keywords- Phishing, toolbars, Subroutine, LinkGuard, CANTINA, heuristics, visual. I. INTRODUCTION Progression of internet has played a eloquent role in trade and custom activities. Unfortunately, indigent security on the internet and large monetary gains furnish a strong motivation for attackers to commit such seemingly low risk, yet high- return online scams. Web –based casualties are usually caused by Phishing attacks, malware, spoofing and other threats, costs organizations millions of dollars every year. Web fraud also strikes user’s insights of e-businesses. It is estimated that between May 2004 and 2005, approx. 1.2 million computer users in U.S suffered losses amounting up to US$929 million. As a matter of fact United States businesses lose an estimated US$2 billion per year as their clients become victims.[1] Recently, there has been a dramatic increment in phishing, a relative attack in which victims are deluded by spoofed emails and fraudulent web sites into giving up patented information. Phishing is a quick growing challemge, with 9,255 unique phishing sites reported in June 2006 alone[2]. It is not familiar exactly how much phishing costs each year since impacted industries are reluctant to release figures; estimates range from $1 billion[3] to 2.8 billion[4] per year. To respond to this peril, software brokers and organizations have released a variety of anti phishing toolbars. As of September 2006, the free software download site download.com, listed 84 anti –phishing toolbars. However a strong need for superior automated detection algorithms was required. Phishers employ e-mail as their major method to carry out phishing attacks. However, this algorithm can be applied to attacks that use other means such as instant messaging. In general phishing attacks are performed with the following steps: 1) Phishers set up a forged web site which looks exactly like the legal Web site. Including setting up a server, applying the dns server name, and creating the web pages similar to the destination Website etc.
  • 2. International Journal of Modern Trends in Engineering and Research (IJMTER) Volume 01, Issue 05, [November - 2014] e-ISSN: 2349-9745, p-ISSN: 2393-8161 @IJMTER-2014, All rights Reserved 405 2) Then they send large amounts of mimic emails to target users under the name of those legitimate companies and troupes, in order to convince the potential victims to visit their Web sites. 3) Users then click on the hyperlink given, and input the required information. 4) Next the phishers steal the information and prosecute their fraud such as transferring money from the victims account. Despite efforts such as blocking phishing Web sites, enhancing their security, using spam filters etc. it is impossible to avoid phishing. As a last defense, users can install anti-phishing tools in their computers. Anti-phishing tools used these days can be divided into three categories: blacklist/whitelist based, rule based and content based.  Category 1: When a user sojourns a Web site, the anti phishing tool searches the address of that sites in a blacklist stored in the database. If the visited site is on the list, then the anti-phishing tool then warns the users. Tools such as the Scam Blocker from the Earthlink Company, PhishGuard and Net craft, etc. Nevertheless newly emerged Web sites cannot be prevented with this.  Category 2: This category of tools uses a certain set of rules for their software. Softwares such as the SpoofGuard and TrustWatch use this method, i.e. it checks the domain name, URL of the Web site and also checks whether the browser is directed to the current URL via the links in the contents of emails. If it finds that the domain name of the visited Website is similar to the well- known website or domain name, then it warns the users.  Category 3: This category of tools is a recent discovery. A content based approach for detecting phishing web sites known as CANTINA. It examines the content of the webpage such as the common characteristics like the URL and the domain name and determines whether the site is legitimate or not. II. ANTI-PHISHING TOOLS AND ALGORITHMS Here we interrogate the adopted methodologies, working principles and concluding results of both the algorithms and try to infer which out of the two is better. Shifting onto the methodologies of the algorithms we begin with the procedure adopted in the LinkGuard Algorithm and then we will move on to CANTINA. 2.1 LINKGUARD ALGORITHM The LinkGuard algorithm is one such algorithm which defines the characteristics of URL and Domain names. The elementary work of this algorithm is to dissect the actual links and the visual links. Some of the important systems contained in the architecture of this algorithm can be given as follows: Communication: All the input course is gathered and facts are sent to the analyzer. Database: User’s inputs such as the domain name, Url, Blacklists etc. are stored in here. Analyzer: Data which is provided by the communication and the database is sent to the analyzer for modification purpose. Hence the analyzer can also be called as the Link Guard of the system. Alerter: Every time something is received by the analyzer which has to be displayed to the user the alerter generates a warning message. Logger: Information related to this particular session is stored.
  • 3. International Journal of Modern Trends in Engineering and Research (IJMTER) Volume 01, Issue 05, [November - 2014] e-ISSN: 2349-9745, p-ISSN: 2393-8161 @IJMTER-2014, All rights Reserved 406 Fig.1. Architecture of LinkGuard Algorithm Now we shall discuss procedure of the LinkGuard algorithm: 2.1.1 Class of the hyperlinks in the phishing e-mails: To gather useful information from the user’s phishers persuade the victims to click in the hyperlink which is embedded in the counterfeited site. A hyperlink has the following structure: <aherf=”http://www.antiphishing.org/phishingarchive.html”>Phishing Archive</a> Where ‘http://www.antiphishing.org/phishingarchive.html’ is the “URI(universal resource identifiers)” which provides information to the user about the networked resource, and ‘Phishing Archive’ is the “Anchor text” which is the text displayed on the user’s web browser. 2.1.2 The LinkGuard Algorithm: Linkguard algorithm effectuates by analyizing the differences between the actual and the hyperlink. It also computes the resemblances of the URI with a legitimate site. The common terms used in the algorithms are: v_link: visual_link; a_link: actual_link; v_dns: visual DNS name; a_dns: actual DNS name; sender_dns: sender’s DNS name; int LinkGuard (v_link, a_link} {
  • 4. International Journal of Modern Trends in Engineering and Research (IJMTER) Volume 01, Issue 05, [November - 2014] e-ISSN: 2349-9745, p-ISSN: 2393-8161 @IJMTER-2014, All rights Reserved 407 1 v_dns = GetDNSName (v_link); 2 a_dns = GetDNSName (a_link); 3 if ((v_dns and a_dns are not 4 empty) and (v_dns! = a_dns)) 5 return PHISHING; 6 if (a_dns is dotted decimal) 7 return POSSIBLE_PHISHING; 8 if (a_link or v_link is encoded) 9 { 10 v_link2 = decode (v_link); 11 a_link2 = decode (a_link); 12 return LinkGuard (v_link2, a_link2); 13} 14 /* analyze the domain name for 15 possible phishing */ 16 if (v_dns is NULL) 17 return AnalyzeDNS (a_link); } int AnalyzeDNS (actual_link) { /* Anatomize the actual DNS name according to the blacklist and whitelist*/ 18 if (actual_dns in blacklist) 19 return PHISHING; 20 if (actual_dns in whitelist) 21 return NOTPHISHING; 22 return PatternMatching(actual_link); } int PatternMatching (actual_link){ 23 if (sender_dns and actual_dns are different) 24 return POSSIBLE_PHISHING; 25 for (each item prev_dns in seed_set) 26 { 27 bv = Similarity(prev_dns, actual_link); 28 if (bv == true) 29 return POSSIBLE_PHISHING; 30 } 31 return NO_PHISHING; } float Similarity (str, actual_link) { 32 if (str is part of actual_link) 33 return true; 34 int maxlen = the maximum string 35 lengths of str and actual_dns; 36 int minchange = the minimum number of 37 changes needed to transform str 38 to actual_dns (or vice verse); 39 if (thresh<(maxlen-minchange)/maxlen<1) 40 return true
  • 5. International Journal of Modern Trends in Engineering and Research (IJMTER) Volume 01, Issue 05, [November - 2014] e-ISSN: 2349-9745, p-ISSN: 2393-8161 @IJMTER-2014, All rights Reserved 408 41 return false; } [5] The Linkguard algorithm contains a main routine LinkGuard in which it extracts the DNS names from the actual and visual links. When a comparision of th names is links is made and it is found that these names are not same, or dotted decimal IP address is directly used in actual dns then phishing attack is recognized. On the other hand when there is no destination information(DNS or dotted IP address) in the visual link the subroutine AnalyzeDNS is called to handle phishing attack. While the pattern matching subroutine is used to identify other unkown attacks. In this method either the sender’s email address is extracted from the e-mail or a collection of all the DNS names is done in a “seed set”. After this the PatternMatching subroutine checks if the actual DNS name of a hyperlink is different from the DNS name in the sender’s address. Fig. 2. Flow Chart of the LinkGuard Algorithm Hence, in the above figure we have seen the workings and procedures involved in LinkGuard algorithm. Now we shall study the CANTINA algorithm. 2.2 CANTINA The CANTINA is a unprecedented content-based passage of detecting phishing websites. What makes the cantina intriguing is that other methods look at the shell peculiarity of the web pages whereas the CANTINA studies the substance or content of that webpage. Not only this, the approach also analyzes
  • 6. International Journal of Modern Trends in Engineering and Research (IJMTER) Volume 01, Issue 05, [November - 2014] e-ISSN: 2349-9745, p-ISSN: 2393-8161 @IJMTER-2014, All rights Reserved 409 the text-based content of the page itself. This approach works on the principle of a heuristic based algorithm known as the TF-IDF algorithm. Working of the TF-IDF: The TF-IDF algorithm is used for the purpose of information retrieval and text-mining. It succumbs importance to the occurrence of a word in a document. As the number of word appears in the document the importance increases. The term ‘TF’ stands for “term frequency” meaning the number of times a given term appears in a specific document. It usually helps in preventing a bias towards the longer documents and also measures the importance of that term within a particular document. On the other hand the term ‘IDF’ stands for “inverse document frequency” and is meant to define the general importance of the term i.e. its measures how common a particular word is in the entire document. Accordingly we can say that any term which has high TF-IDF weight means that it has high TF and low document frequency. Alongside the TF-IDF algorithm Robust Hyperlinks are also required in the CANTINA approach. Phelps and Wilensky evolved the idea of Robust Hyperlink to overcome the problem of broken links[6]. They proposed on adding a small number of some chosen terms which they called as “lexical signature” to the URL’s. Now here comes the role of the TF-IDF i.e. to generate a lexical signature. In order to generate the signatures the TF-IDF value of each word of a document is calculated and words which have highest values are then selected. The term frequency defines wholesomeness and IDF provides rarity in documents. So the work of the CANTINA can be supervened as:  Calculate the TF-IDF scores of each term on a given web page.  Clasp any five terms having the highest TF-IDF weights and generate a lexical signature.  Provide this signature to any search engine of your choice.  If the DNS name of the current webpage matches with the N top search results then we consider that there is no phishing or else it is a phish website. Fig. 3. Fake website which is uploaded to the internet.
  • 7. International Journal of Modern Trends in Engineering and Research (IJMTER) Volume 01, Issue 05, [November - 2014] e-ISSN: 2349-9745, p-ISSN: 2393-8161 @IJMTER-2014, All rights Reserved 410 Fig. 4. The search engine showing results whether the site is legitimate with the help of the lexical signature. III. COMPARING BENEFITS AND DRAWBACKS Commencing with the Pros and cons of both the approaches discussed above would be given as- Benefits of the Linkguard Algorithm can be given as:  One great asset is that the LinkGuard algorithm recognizes known as well as unknown phishing sites.  It is light-weighted and based on derived characteristics. Drawbacks of the LinkGuard algorithm can be given as:  Due its false negative differentiating nature the LinkGuard algorithm sometimes treats some phishing sites as normal websites( specifically those sites which have not been visted by the user at all).  When the comparability index of the site is near close for example say 3/4 it often generates a false positive. Now coming to the benefits of the CANTINA APPROACH:  Fast in nature and is based on the content-based characteristics.  No maintenance required by the user as most functionalities are performed by the search engines. Drawbacks of the CANTINA approach can be given as:  Language can be a drawback here as phishing occurs in all kinds of websites irrespective of their languages.  In order to overcome some of the quering problems of the search engines the algorithms will have to be implemented as server based email filter rather than a browser tool.
  • 8. International Journal of Modern Trends in Engineering and Research (IJMTER) Volume 01, Issue 05, [November - 2014] e-ISSN: 2349-9745, p-ISSN: 2393-8161 @IJMTER-2014, All rights Reserved 411 IV. CONCLUSION The frogery which has occurred due to phishing has taken a rigorous plunge in the recent years. In this paper, we have analogized the working principles of two approaches which are used for the detection of phishing Websites namely the LinkGuard and CANTINA. During this we implicited that the LinkGuard is an algorithm which differentiates on the basis of URL’s and DNS and is effective upto 96% detection including the unknown websites. On the other hand the CANTINA approach is a novel designed method involving heuristics and content based methods thereby increasing 97% detection of phishing with 6% false positives. Therefore, in the future work both these approaches can act as great barriers for keeping away all kinds of phishing such that user’s might not become victims to forgery or theft. REFERENCES [1] H.Kerstein, Paul (July 19,2005) [2] Anti-Phishing Working Group, Phishing Activity Trends Report. 2006. http://www.antiphishing.org/reports/. [3] Keizer, G., Phishing Costs Neraly $1 Billion, TechWeb TechnologyNews. http://www.techweb.com/wire/security/164902671. [4] McMillian, R.,Gartner: Consumers to lose $2.8 billion to phishers in 2006, Networkworld,2006. [5] U.Vidyasagar “Intelligent phishing website detection and prevention system by using LinkGuard algorithm” [6] Phelps, T.A. and R. Wilensky, Robust Hyperlinks and Locations, D-Lib Magazine, vol. 6(7/8),2000. http://www.dlib.org/dlib/july00/wilensky/07wilensky.html.