Web Mining
Presented by:
Sarthak Kumar Sahoo
Computer Science & Engineering
Section: B
Regdno: 1501209160
Contents
• Introduction
• Information Discovered by Web Mining
• Steps in Web Mining
• Different types of Web Mining
• Web Usage Mining
• Web Structure Mining
• Web Structure Mining Terminology
• Web Content Mining
• Different Methods Used in Web Mining
• Web Mining Applications
• Difference Between Web mining & Data Mining
Introduction
• Web mining is the application of data mining techniques and algorithms
to extract information directly from the Web: from web documents and
services, web content, hyperlinks and server logs.
• The main goal of web mining is to find patterns in web data by
collecting and analyzing information, in order to gain insight into
trends, the industry and users in general.
• The primary data source is the World Wide Web.
• There are 3 general classes of information that can be discovered by
web mining.
Information Discovered by Web Mining
• Web Activity: server logs and web browser activity tracking
• Web Graph: links between pages, people and other data
• Web Content: data found on web pages and inside documents
Steps in Content Web Mining
Web data → Collect → Parse → Analyze → Produce → Report, search index, etc.
• Collect: fetch the content from the web
• Parse: extract usable data from formatted data
• Analyze: tokenize, rate, classify, cluster, filter, sort, etc.
• Produce: turn the result of analysis into something useful
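The four steps can be sketched as a small pipeline. This is only an illustration, not the presenter's implementation: the sample page, the function names and the choice of word counting as the "analyze" step are all assumptions, and the collect step returns a fixed string instead of fetching over the network.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical sample page; a real collector would fetch this over HTTP.
SAMPLE_HTML = """
<html><head><title>Web mining notes</title>
<style>p { color: black; }</style></head>
<body><h1>Web mining</h1>
<p>Web mining applies data mining to web data.</p>
<p>Web data includes content, links and logs.</p>
</body></html>
"""


class TextExtractor(HTMLParser):
    """Parse step: keep visible text, skip <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def collect():
    """Collect: stand-in for fetching content from the web."""
    return SAMPLE_HTML


def parse(html):
    """Parse: extract usable text from the formatted (HTML) data."""
    extractor = TextExtractor()
    extractor.feed(html)
    return " ".join(extractor.chunks)


def analyze(text):
    """Analyze: tokenize the text and count each lowercase word."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def produce(counts, top_n=3):
    """Produce: turn the counts into a small report of frequent words."""
    return counts.most_common(top_n)


report = produce(analyze(parse(collect())))
print(report)
```

Each stage takes the previous stage's output, so any one of them can be swapped (for example, a classifier in place of the word counter) without touching the others.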
Different types of Web Mining
Web mining is divided into three types:
• Web Usage Mining
• Web Content Mining
• Web Structure Mining
Web Usage Mining
• Web usage mining discovers interesting usage patterns from web data
in order to understand and better serve the needs of web-based applications.
• Usage data captures the identity and origin of web users along with their
browsing behaviour at a website.
Web Usage Mining classification according to usage data
• Web Server Data: data collected by the web server, such as IP address,
page reference and access time.
• Application Server Data: the ability to track various kinds of business
events and log them in application server logs.
• Application Level Data: new kinds of events can be defined in an
application, and logging can be turned on for them, generating histories
of these specially defined events.
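As an illustration of mining web server data, the sketch below pulls the IP address, page reference and access time out of server log lines and counts page hits. It assumes the logs follow the Common Log Format; the log lines, the regular expression and all names are hypothetical examples, not part of the original slides.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical log lines in the Common Log Format (an assumption).
LOG_LINES = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.1 - - [10/Oct/2023:13:56:01 +0000] "GET /products.html HTTP/1.1" 200 1048',
    '198.51.100.7 - - [10/Oct/2023:14:02:12 +0000] "GET /index.html HTTP/1.1" 200 2326',
    'not a log line',
]

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return IP address, page reference and access time, or None if malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "ip": match.group("ip"),
        "page": match.group("page"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
    }


records = [r for r in map(parse_line, LOG_LINES) if r is not None]

# Usage pattern 1: how often each page was requested.
page_hits = Counter(r["page"] for r in records)

# Usage pattern 2: the sequence of pages requested from each IP address.
pages_by_ip = {}
for r in records:
    pages_by_ip.setdefault(r["ip"], []).append(r["page"])

print(page_hits.most_common())
print(pages_by_ip)
```

Grouping by IP address is only a rough stand-in for identifying a user; real usage mining typically needs session reconstruction on top of this.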
Web Structure Mining
• Web structure mining is the process of discovering structure information from the
web.
• Web structure mining uses graph theory to analyze the node and connection structure
of a website.
• Web structure mining can be divided into 2 types:
◦ Extracting patterns from hyperlinks
◦ Mining the document structure: analysis of the tree-like structure of a page.
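Both kinds of structure can be read from a single page. The sketch below collects the hyperlinks and measures how deeply the page's element tree is nested. The sample page and class name are hypothetical, and the depth count assumes well-formed HTML in which every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not deepen the tree.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class StructureParser(HTMLParser):
    """Collect hyperlinks and track the nesting depth of the element tree."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        if tag not in VOID_TAGS:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


# Hypothetical page used only for illustration.
PAGE = (
    '<html><body><div><p>See <a href="/about.html">about</a><br>'
    'and <a href="https://example.org/">example</a>.</p></div></body></html>'
)

parser = StructureParser()
parser.feed(PAGE)
print(parser.links)      # hyperlink structure
print(parser.max_depth)  # depth of the tree-like document structure
```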
Web Structure Mining Terminology
• Web graph: directed graph representing the web
• Node: a web page in the graph
• Edge: a hyperlink between two pages
• In-degree: number of links pointing to a particular node
• Out-degree: number of links leaving a particular node
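These terms map directly onto a small data structure. The sketch below represents a web graph as a list of directed edges and computes the in-degree and out-degree of every node; the page names and links are made up for the example.

```python
from collections import defaultdict

# Hypothetical hyperlinks as (source page, target page) edges.
LINKS = [
    ("A", "B"),
    ("A", "C"),
    ("B", "C"),
    ("C", "A"),
    ("D", "C"),
]


def degrees(links):
    """Return {node: (in_degree, out_degree)} for a directed web graph."""
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    nodes = set()
    for source, target in links:
        out_degree[source] += 1   # one more link leaving the source page
        in_degree[target] += 1    # one more link pointing to the target page
        nodes.update((source, target))
    return {node: (in_degree[node], out_degree[node]) for node in sorted(nodes)}


result = degrees(LINKS)
for node, (indeg, outdeg) in result.items():
    print(f"{node}: in-degree={indeg}, out-degree={outdeg}")
```

Here page C has the highest in-degree (three pages link to it), which is the kind of signal link-analysis methods build on.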
Web Content Mining
• Web content mining is the mining, extraction and integration of useful data,
information and knowledge from web page content.
• The contents of web pages are mostly text, images, video and audio files.
• For information retrieval purposes, Natural Language Processing techniques and
intelligent web agents are used.
• The agent-based approach to web mining leads to the development of sophisticated
AI systems.
• Web content mining can be viewed from 2 points of view: the information retrieval
view and the database view.
• In the information retrieval view, research works with unstructured data and
semi-structured data (HTML structure & hyperlink structure).
Web Content Mining (contd.)
• In the database view, the goal is better information management
and querying on the web: mining tries to infer the structure of a
website so that the website can be treated as a database.
• Feature selection can be carried out with a multi-scanning
approach.
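One simple reading of the database view is turning semi-structured markup into queryable records. The sketch below converts an HTML table into a list of dictionaries and runs a query over them. The page, column names and query are hypothetical, and the parser assumes a plain, non-nested table whose first row is the header.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of each <tr> row in a simple, non-nested table."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical product listing used only for illustration.
PAGE = """
<table>
<tr><th>product</th><th>price</th></tr>
<tr><td>Laptop</td><td>900</td></tr>
<tr><td>Phone</td><td>500</td></tr>
</table>
"""

parser = TableParser()
parser.feed(PAGE)

# Treat the first row as the schema and the remaining rows as records.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]

# A database-style query over the extracted records.
cheap = [r["product"] for r in records if int(r["price"]) < 600]
print(records)
print(cheap)
```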
Different Methods used in Web Mining
• Pattern analysis
• Classification accuracy
• Information Score
• Information gain
• Cross entropy
• Mutual information
• Odds Ratio
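As a worked example of one measure from this list, the sketch below computes the information gain of a term with respect to document classes, using the standard definition (class entropy minus the weighted entropy after splitting on whether the term is present). The slides only name the measures, so the use for scoring terms, the tiny document set and the labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(docs, term):
    """Entropy reduction from splitting docs on presence of `term`."""
    labels = [label for _, label in docs]
    with_term = [label for words, label in docs if term in words]
    without_term = [label for words, label in docs if term not in words]
    remainder = 0.0
    for subset in (with_term, without_term):
        if subset:
            remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical labelled documents: (set of words, class label).
DOCS = [
    ({"cheap", "offer", "buy"}, "spam"),
    ({"offer", "now"}, "spam"),
    ({"meeting", "agenda"}, "ham"),
    ({"project", "meeting", "now"}, "ham"),
]

gain_offer = information_gain(DOCS, "offer")
gain_now = information_gain(DOCS, "now")
print(gain_offer, gain_now)
```

In this toy set "offer" separates the two classes perfectly while "now" appears equally in both, so the first term carries all the class information and the second carries none.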
Web Mining Applications
• E-Commerce
• Information Filtering
• Fraud Detection
• Education & Research
Difference between Web Mining & Data Mining
• Scale
  Data mining: in the traditional approach, processing 1 million records
  from a database is a large job.
  Web mining: even 10 million pages would not be a big number.
• Access
  Data mining: corporate data is private and often requires access rights
  to read.
  Web mining: the data is public and rarely requires access rights.
• Structure
  Data mining: a traditional task gets information from a database, which
  provides some level of explicit structure.
  Web mining: a typical task processes unstructured or semi-structured data
  from web pages. Even when the underlying information for web pages comes
  from a database, this is often obscured by HTML markup.
THANK YOU.

