Avtar's ppt

Major Seminar
On
Knowledge Discovery from Web Logs

Guided By:
Saurabh Anand
Lecturer
Department Of IT

Presented By:
Avtar Kishore Gaur
(IT/09/53)
VIII Sem, IT

Poornima College Of Engineering
Sitapura, Jaipur

Introduction
• A vast amount of Web site traversal information is available
in the form of Web logs.
• By analyzing these logs, it is possible to discover various
kinds of knowledge, which can be applied to improve the
performance of Web services.
• It is possible to learn the behavior of the Web users by
analyzing these logs.

Introduction
• A particular kind of knowledge, one which can be immediately
applied to the operation of the Web site, is called
actionable knowledge.
• Mining of such knowledge is known as Knowledge
Discovery from Web Logs.

How big is the Web
• More than 4 billion websites are on the Internet (according to
alexa.com).

• At least 7.92 billion pages (Thursday, 23 February 2012;
according to worldwidewebsize.com).

History
• Previous approaches aimed only to mine Web-log
knowledge for human consumption.
• These days, mining actionable knowledge from Web logs is
being used to improve the performance of Web services.

Fields in Web Log File
• Reference website: www.hdwally.com. Web server: Apache.
1. 66.249.71.6 - - [23/Feb/2012:06:23:46 -0600] "GET /robots.txt HTTP/1.1" 500 7370 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
2. 180.76.5.92 - - [23/Feb/2012:06:11:04 -0600] "GET / HTTP/1.1" 500 7370 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
• IP address: 66.249.71.6 and 180.76.5.92
• User name: - - and - - (a hyphen means the value is not available)
• Timestamp: [23/Feb/2012:06:23:46 -0600] and
[23/Feb/2012:06:11:04 -0600] (time of the request as recorded by the web server)

Fields in Web Log File
• Access request: "GET /robots.txt HTTP/1.1" and "GET / HTTP/1.1"
• Result status code: 500 and 500 (Internal Server Error)
• Bytes transferred: 7370 and 7370
• Referrer URL: "-" and "-" (no referrer was sent)
• User agent: Mozilla/5.0 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html) and Mozilla/5.0 (compatible;
Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
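The fields listed above can be pulled out of a log line programmatically. The following is a minimal sketch, assuming the log uses the Apache "combined" format (host, identity, user, timestamp, request, status, bytes, referrer, user agent, in that order). The regular expression, its group names and the function name parse_line are illustrative choices, not part of the original slides.

```python
import re

# Pattern for one line in the Apache "combined" log format.
# The group names are our own labels for the fields.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

sample = (
    '66.249.71.6 - - [23/Feb/2012:06:23:46 -0600] '
    '"GET /robots.txt HTTP/1.1" 500 7370 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
entry = parse_line(sample)
print(entry["host"], entry["status"], entry["referrer"])
```

All values come back as strings; a real pipeline would probably convert the status, size and timestamp to proper types before mining.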

Example Of a Web Log File
• fcrawler.looksmart.com - - [26/Apr/2000:00:00:12 -0400] "GET /contacts.html HTTP/1.0" 200 4595 "-" "FAST-WebCrawler/2.1-pre2 (ashen@looksmart.net)"
• fcrawler.looksmart.com - - [26/Apr/2000:00:17:19 -0400] "GET /news/news.html HTTP/1.0" 200 16716 "-" "FAST-WebCrawler/2.1-pre2 (ashen@looksmart.net)"
• 123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" 200 6248 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"

Mining Web Logs for Path Profiles
• Data Cleaning on Web Log Data
• Mining Web Logs for Path Profiles
• Web Object Prediction
• Learning to Prefetch Web Documents

Data Cleaning on Web Log Data
• Break apart each user's long sequence of visits into user
sessions.
• Identify each user by an individual IP address.
• Thus, data cleaning means separating the visiting
sequence of pages into visiting sessions.
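The session-splitting step can be sketched as follows. This is a minimal sketch under two assumptions that the slides do not state: the log has already been reduced to (ip, timestamp, url) records, and a new session starts when the same IP address is idle for more than 30 minutes. The function name sessionize and the timeout value are illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def sessionize(records, timeout=timedelta(minutes=30)):
    """Split (ip, timestamp, url) records into per-IP visiting sessions.

    The 30-minute idle timeout is an assumed heuristic, not from the slides.
    Returns a list of (ip, [url, ...]) tuples.
    """
    by_ip = defaultdict(list)
    for ip, ts, url in records:
        by_ip[ip].append((ts, url))

    sessions = []
    for ip, visits in by_ip.items():
        visits.sort(key=lambda v: v[0])  # order each user's visits by time
        current, last_ts = [], None
        for ts, url in visits:
            # A long gap closes the current session and opens a new one.
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append((ip, current))
                current = []
            current.append(url)
            last_ts = ts
        sessions.append((ip, current))
    return sessions

t0 = datetime(2000, 4, 26, 0, 0, 12)
records = [
    ("192.0.2.1", t0, "/contacts.html"),
    ("192.0.2.1", t0 + timedelta(minutes=5), "/news/news.html"),
    ("192.0.2.1", t0 + timedelta(hours=2), "/pics/wpaper.gif"),
    ("192.0.2.2", t0, "/contacts.html"),
]
sessions = sessionize(records)
# sessions[0] == ("192.0.2.1", ["/contacts.html", "/news/news.html"])
```

Identifying users by IP address alone is a simplification: several users behind one proxy share an address, so their visits end up merged into the same sessions.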

Web Log Mining for Prefetching
• We now have separate visiting sessions.
• We can develop path profiles from these sessions, because a
user visiting a sequence of Web pages often leaves a trail of
the pages’ URLs in a Web log.
• A path profile consists of frequent subsequences from the
frequently occurring paths.
• A path profile helps us to predict the next pages that are
most likely to occur.
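One simple way to build such a profile is to count contiguous URL subsequences across all sessions and keep the ones that occur often enough. The sketch below assumes that approach. The function name path_profile and the parameters max_len and min_support are hypothetical, and the slides do not say which mining method was actually used.

```python
from collections import Counter

def path_profile(sessions, max_len=3, min_support=2):
    """Count contiguous URL subsequences of length 2..max_len and keep
    those seen at least min_support times (both thresholds are assumed)."""
    counts = Counter()
    for urls in sessions:
        for n in range(2, max_len + 1):
            for i in range(len(urls) - n + 1):
                counts[tuple(urls[i:i + n])] += 1
    return {path: c for path, c in counts.items() if c >= min_support}

example_sessions = [
    ["/", "/news", "/sports"],
    ["/", "/news", "/weather"],
    ["/about"],
]
profile = path_profile(example_sessions)
print(profile)  # {('/', '/news'): 2}
```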

Web Object Prediction
• It is possible to train a path-based model for predicting
future URLs based on a sequence of current URL accesses.
• This can be done on a per-user basis or on a per-server
basis.
• The former requires that user sessions be recognized
and cleanly separated by a filtering system; the
latter takes the simplistic view that the accesses on a server
form a single long thread.
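A path-based predictor of this kind can be sketched as a simple n-gram model: count which URL follows each recent context, then predict the most frequent follower. This is an illustrative sketch only. The names train and predict and the order parameter are assumptions, and the model in the referenced papers may differ.

```python
from collections import Counter, defaultdict

def train(sessions, order=1):
    """Map each context of `order` consecutive URLs to a count of next URLs."""
    model = defaultdict(Counter)
    for urls in sessions:
        for i in range(len(urls) - order):
            context = tuple(urls[i:i + order])
            model[context][urls[i + order]] += 1
    return model

def predict(model, recent, order=1):
    """Return the most frequent follower of the last `order` URLs, or None."""
    followers = model.get(tuple(recent[-order:]))
    if not followers:
        return None
    return followers.most_common(1)[0][0]

training_sessions = [
    ["/", "/news", "/sports"],
    ["/", "/news", "/weather"],
    ["/", "/news", "/sports"],
]
model = train(training_sessions)
print(predict(model, ["/", "/news"]))  # /sports
print(predict(model, ["/unknown"]))    # None
```

For the per-user view, train receives the separated sessions as above. For the per-server view, the whole access sequence of the server can be passed as one long session.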

Learning to Prefetch Web Documents
• The original cache memory is partitioned into two parts: a
cache-buffer and a prefetching-buffer.
• A prefetching agent (script) keeps pre-loading the
prefetching-buffer with documents predicted to be accessed
next.
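The two-buffer arrangement can be sketched as below. This is a minimal sketch, assuming least-recently-used eviction in both buffers and a caller-supplied fetch function. The class name PrefetchingCache, the buffer sizes and the eviction policy are all illustrative, since the slides do not specify them.

```python
from collections import OrderedDict

class PrefetchingCache:
    """Cache split into a cache-buffer and a prefetching-buffer.

    `fetch` is a caller-supplied function url -> document (hypothetical).
    Both buffers evict their oldest entry when full (assumed policy).
    """

    def __init__(self, fetch, cache_size=2, prefetch_size=1):
        self.fetch = fetch
        self.cache = OrderedDict()       # documents the user has requested
        self.prefetched = OrderedDict()  # documents predicted to come next
        self.cache_size = cache_size
        self.prefetch_size = prefetch_size

    @staticmethod
    def _put(buffer, limit, url, doc):
        buffer[url] = doc
        buffer.move_to_end(url)
        while len(buffer) > limit:
            buffer.popitem(last=False)   # drop the oldest entry

    def prefetch(self, url):
        """Called by the prefetching agent for a predicted URL."""
        if url not in self.cache and url not in self.prefetched:
            self._put(self.prefetched, self.prefetch_size, url, self.fetch(url))

    def get(self, url):
        """Serve a request; returns (document, where it came from)."""
        if url in self.cache:
            self.cache.move_to_end(url)
            return self.cache[url], "cache hit"
        if url in self.prefetched:
            doc, source = self.prefetched.pop(url), "prefetch hit"
        else:
            doc, source = self.fetch(url), "miss"
        self._put(self.cache, self.cache_size, url, doc)
        return doc, source

fetched = []
def fake_fetch(url):
    fetched.append(url)          # record each real fetch for illustration
    return "body of " + url

cache = PrefetchingCache(fake_fetch)
first = cache.get("/")           # miss: fetched on demand
cache.prefetch("/news")          # agent pre-loads the predicted next page
second = cache.get("/news")      # served from the prefetching-buffer
third = cache.get("/news")       # now served from the cache-buffer
```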

Web Page Clustering for Intelligent
User Interfaces
• Web logs can be used to build server-side customization
and transformation that make a website more convenient for
users to visit and to find what they are looking for.
• These rely on path prediction algorithms that guess where the
user wants to go next in a browsing session, such as WebWatcher
and the PageGather algorithm.
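As a rough illustration of clustering pages from logs, the sketch below groups pages that are often visited in the same session. It is a simplified co-occurrence clustering written for this example, not an implementation of WebWatcher or PageGather. The function name cluster_pages and the min_cooccurrence threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

def cluster_pages(sessions, min_cooccurrence=2):
    """Group pages that appear together in at least min_cooccurrence sessions.

    Pages are linked when they co-occur often enough; each connected group
    of linked pages forms one cluster.
    """
    pair_counts = Counter()
    for urls in sessions:
        for a, b in combinations(sorted(set(urls)), 2):
            pair_counts[(a, b)] += 1

    neighbours = {}
    for (a, b), count in pair_counts.items():
        if count >= min_cooccurrence:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)

    clusters, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over linked pages
            page = stack.pop()
            if page in component:
                continue
            component.add(page)
            stack.extend(neighbours[page] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters

visit_sessions = [
    ["/cars", "/bikes"],
    ["/bikes", "/cars", "/jobs"],
    ["/jobs", "/resume"],
    ["/resume", "/jobs"],
]
clusters = cluster_pages(visit_sessions)
print(clusters)  # [['/bikes', '/cars'], ['/jobs', '/resume']]
```

Each cluster could then back a generated index page that links related pages together.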

Applications
• Search Engines
• Similarity Measures
• Ontology
• Information aggregation
• Recognition technology
• Summarization
• E-commerce
• Content management

Advantages
• It is easy to implement.
• Companies can establish better customer relationships
by giving customers exactly what they need.
• Create personalized search engines, which can
understand a person’s search queries in a personal way by
analyzing and profiling the user’s search behavior.
• Improve caching and prefetching of Web objects.
• Use the mined knowledge to build better, adaptive user
interfaces.
• Apply Web query log knowledge to improve Web
search in a search engine application.

Reference
• Web logs from www.hdwally.com and
www.hdwallpaper4u.com.
• www.jafsoft.com/searchengines/log_sample.html
• Research paper on Knowledge Discovery From Weblogs by
S Chandra and Dr B Kalpana.
• Research paper on Mining Web Logs for Actionable
Knowledge by Qiang Yang, Charles X. Ling and Jianfeng Gao.
• http://www.galeas.de/webmining.html
