Annotating Search Results from Web Databases

• Introduction
• Existing System
• Proposed System
• Phases of the System
• System Architecture
• System Workflow
• Modules
• Advantages of the Proposed System
• Algorithm Used in the System
• User Classes
• Activity Diagram
• Applications
• Software & Hardware Requirements
• References

• A large number of web databases (WDBs) are accessible through HTML forms, and their search result pages may encode data using different formatting in HTML tags.
• Data unit level annotation.
• Automatically assign labels to the data units of the search result records (SRRs) returned from WDBs.
• Applications: deep web data collection and Internet comparison shopping.

• In the existing system, a data unit is a piece of text that semantically represents one concept of an entity.
• It describes the relationship between text nodes and data units.
• Early applications required tremendous human effort to annotate data units manually, which severely limits their scalability.
• There is a high demand for collecting data of interest from multiple WDBs.
• In the proposed system, we consider how to automatically assign labels to the data units within the SRRs returned from WDBs.

OUR APPROACH
• Align the data units on a result page into different groups such that data units in the same group have the same semantics.
• Annotate each group from different aspects.

• Our solution consists of three phases:
a) Alignment phase.
b) Annotation phase.
c) Annotation wrapper generation phase.

A) ALIGNMENT PHASE
• Identify all data units in SRRs.
• Organize them into different groups, each group corresponding to a different concept.

B) ANNOTATION PHASE
• Introduce multiple basic annotators, each exploiting one type of feature.

C) ANNOTATION WRAPPER GENERATION PHASE
• Generate the annotation rules.
• Each rule describes how to extract, from the result page, the data units of a concept identified in the annotation phase.
• It also describes what the appropriate semantic label should be.
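The hand-off between the three phases can be illustrated with a toy Java sketch. Everything inside each phase is a hypothetical placeholder: `ThreePhasePipeline` is an invented name, alignment simply groups units by position, and annotation uses a made-up "starts with $" rule. Only the data flow (SRRs -> groups -> labels -> reusable wrapper) is meant to mirror the phases listed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of the data flow between the three phases. The rule used
// inside each phase is a hypothetical placeholder, not the system's algorithm.
final class ThreePhasePipeline {

    // Phase 1 (alignment): put the i-th data unit of every SRR into group i.
    // A real aligner groups units by feature similarity, not by position.
    static List<List<String>> align(List<List<String>> srrs) {
        List<List<String>> groups = new ArrayList<>();
        for (List<String> record : srrs) {
            for (int i = 0; i < record.size(); i++) {
                while (groups.size() <= i) {
                    groups.add(new ArrayList<String>());
                }
                groups.get(i).add(record.get(i));
            }
        }
        return groups;
    }

    // Phase 2 (annotation): give each group one label. Placeholder rule:
    // "Price" if every unit starts with '$', otherwise "Text".
    static List<String> annotate(List<List<String>> groups) {
        List<String> labels = new ArrayList<>();
        for (List<String> group : groups) {
            boolean allPrices = !group.isEmpty();
            for (String unit : group) {
                if (!unit.startsWith("$")) {
                    allPrices = false;
                    break;
                }
            }
            labels.add(allPrices ? "Price" : "Text");
        }
        return labels;
    }

    // Phase 3 (wrapper generation): remember the label of each group so that
    // result pages for new queries can be labeled without re-running phases 1-2.
    static Map<Integer, String> buildWrapper(List<String> labels) {
        Map<Integer, String> wrapper = new LinkedHashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            wrapper.put(i, labels.get(i));
        }
        return wrapper;
    }
}
```

A positional wrapper like this would break as soon as a record omits a field, which is why the real phases rely on feature-based alignment and extraction rules instead.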

[Figure: System architecture. Data alignment: data unit and text node features (content, presentation style, data type, tag path, adjacency) -> data unit similarity -> alignment algorithm. Assigning labels: local schema and integrated interface schema -> basic annotators (table, query-based, schema value, frequency-based, in-text prefix/suffix, common knowledge) -> combining annotators -> build wrapper.]
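The architecture combines the outputs of several basic annotators before building the wrapper. The combination rule is not spelled out here, so the following is only one plausible sketch. It assumes each annotator reports a label with a confidence in [0, 1] and that annotators err independently, and it merges the votes for each label with a noisy-OR. `Vote` and `AnnotatorCombiner` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A label proposed by one basic annotator, with an assumed confidence in [0, 1].
final class Vote {
    final String label;
    final double confidence;

    Vote(String label, double confidence) {
        this.label = label;
        this.confidence = confidence;
    }
}

final class AnnotatorCombiner {
    // Noisy-OR combination: a label is wrong only if every annotator that
    // proposed it is wrong (independence is an assumption of this sketch).
    static String combine(List<Vote> votes) {
        Map<String, Double> allWrong = new HashMap<>();
        for (Vote v : votes) {
            double prev = allWrong.containsKey(v.label) ? allWrong.get(v.label) : 1.0;
            allWrong.put(v.label, prev * (1.0 - v.confidence));
        }
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Double> e : allWrong.entrySet()) {
            double score = 1.0 - e.getValue();
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best; // null when no annotator produced a vote
    }
}
```

With votes ("Author", 0.6), ("Author", 0.5) and ("Title", 0.7), "Author" scores 1 - 0.4 × 0.5 = 0.8 and wins, showing how agreement between weak annotators can outweigh one stronger annotator.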


• Data Unit and Text Node Extraction:
• Identify the relationships between text nodes and data units.
• Data Unit and Text Node Features
• Data Alignment Algorithm
• Label Assignment

• One-to-One Relationship.
• One-to-Many Relationship.
• Many-to-One Relationship.
• One-to-Nothing Relationship.
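To make the four relationships concrete, a hypothetical classifier can map two counts to a relationship type. The counts (`unitsInNode`, `nodesSpannedByUnit`) are assumed to be already known; detecting them from a result page is the hard part and is not shown. The example cases in the comments are illustrative assumptions.

```java
// The four text-node/data-unit relationships listed above.
enum NodeUnitRelationship { ONE_TO_ONE, ONE_TO_MANY, MANY_TO_ONE, ONE_TO_NOTHING }

final class RelationshipClassifier {
    // Hypothetical helper: unitsInNode = number of data units inside one text
    // node; nodesSpannedByUnit = number of text nodes that its single data
    // unit is spread across.
    static NodeUnitRelationship classify(int unitsInNode, int nodesSpannedByUnit) {
        if (unitsInNode < 0) {
            throw new IllegalArgumentException("unitsInNode must be >= 0");
        }
        if (unitsInNode == 0) {
            return NodeUnitRelationship.ONE_TO_NOTHING; // e.g. template or decorative text
        }
        if (unitsInNode > 1) {
            return NodeUnitRelationship.ONE_TO_MANY;    // composite node, e.g. "Springer, 2003"
        }
        if (nodesSpannedByUnit > 1) {
            return NodeUnitRelationship.MANY_TO_ONE;    // e.g. a title split by a <b> highlight
        }
        return NodeUnitRelationship.ONE_TO_ONE;
    }
}
```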

• Data Content (DC)
• Presentation Style (PS)
• Data Type (DT)
• Tag Path (TP)
• Adjacency (AD)
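Of these features, data type is the simplest to illustrate. The sketch below assumes a small set of types and regular expressions of its own choosing (dollar amounts, two date formats); the `DataType` values and the patterns are hypothetical, not the system's actual type catalogue.

```java
import java.util.regex.Pattern;

// Assumed set of data types; a real system would distinguish more.
enum DataType { CURRENCY, PERCENTAGE, DATE, INTEGER, DECIMAL, TEXT }

final class DataTypeFeature {
    // Simplified, hypothetical patterns.
    private static final Pattern CURRENCY = Pattern.compile("\\$\\s?\\d+(,\\d{3})*(\\.\\d{2})?");
    private static final Pattern PERCENTAGE = Pattern.compile("\\d+(\\.\\d+)?\\s?%");
    private static final Pattern DATE =
            Pattern.compile("\\d{1,2}/\\d{1,2}/\\d{4}|\\d{4}-\\d{2}-\\d{2}");
    private static final Pattern INTEGER = Pattern.compile("[+-]?\\d+");
    private static final Pattern DECIMAL = Pattern.compile("[+-]?\\d*\\.\\d+");

    // Returns the first type whose pattern matches the whole trimmed text.
    static DataType detect(String text) {
        String t = text.trim();
        if (CURRENCY.matcher(t).matches()) return DataType.CURRENCY;
        if (PERCENTAGE.matcher(t).matches()) return DataType.PERCENTAGE;
        if (DATE.matcher(t).matches()) return DataType.DATE;
        if (INTEGER.matcher(t).matches()) return DataType.INTEGER;
        if (DECIMAL.matcher(t).matches()) return DataType.DECIMAL;
        return DataType.TEXT;
    }
}
```

For example, "$1,299.99" is detected as CURRENCY and "Data Mining" falls through to TEXT.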

• Data unit similarity, combining the similarities of the five features:
• Data content similarity.
• Presentation style similarity.
• Data type similarity.
• Tag path similarity.
• Adjacency similarity.
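One way to turn the five feature similarities into a single data unit similarity is a weighted sum. In the sketch below, content similarity is cosine similarity over term frequencies, and tag path similarity is one minus the edit distance between tag sequences divided by their total length. The equal weights, the exact-match tests for style and type, the string encoding of style, and the content-only adjacency measure are all simplifying assumptions; `DataUnit` and `UnitSimilarity` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// A data unit reduced to the five features (simplified representations).
final class DataUnit {
    final String content;   // DC: the text itself
    final String style;     // PS: e.g. "Arial|10pt|bold" (hypothetical encoding)
    final String type;      // DT: e.g. "CURRENCY"
    final String[] tagPath; // TP: e.g. {"table", "tr", "td", "b"}
    final String prevText;  // AD: text of the preceding data unit
    final String nextText;  // AD: text of the following data unit

    DataUnit(String content, String style, String type, String[] tagPath,
             String prevText, String nextText) {
        this.content = content;
        this.style = style;
        this.type = type;
        this.tagPath = tagPath;
        this.prevText = prevText;
        this.nextText = nextText;
    }
}

final class UnitSimilarity {
    // Hypothetical equal weights; in practice the five weights would be tuned.
    private static final double W = 0.2;

    static double similarity(DataUnit a, DataUnit b) {
        double dc = cosine(a.content, b.content);
        double ps = a.style.equals(b.style) ? 1.0 : 0.0;
        double dt = a.type.equals(b.type) ? 1.0 : 0.0;
        double tp = tagPathSimilarity(a.tagPath, b.tagPath);
        // Simplification: adjacency compares only the neighbours' content.
        double ad = (cosine(a.prevText, b.prevText) + cosine(a.nextText, b.nextText)) / 2.0;
        return W * (dc + ps + dt + tp + ad);
    }

    // Cosine similarity of lower-cased term-frequency vectors; two texts
    // without any terms are treated as identical.
    static double cosine(String x, String y) {
        Map<String, Integer> tx = termFreq(x);
        Map<String, Integer> ty = termFreq(y);
        if (tx.isEmpty() && ty.isEmpty()) return 1.0;
        if (tx.isEmpty() || ty.isEmpty()) return 0.0;
        double dot = 0, nx = 0, ny = 0;
        for (Map.Entry<String, Integer> e : tx.entrySet()) {
            Integer other = ty.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
            nx += e.getValue() * e.getValue();
        }
        for (int v : ty.values()) ny += v * v;
        return dot / (Math.sqrt(nx) * Math.sqrt(ny));
    }

    private static Map<String, Integer> termFreq(String s) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : s.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) continue;
            Integer c = tf.get(term);
            tf.put(term, c == null ? 1 : c + 1);
        }
        return tf;
    }

    // Tag path similarity: 1 - editDistance(p, q) / (|p| + |q|).
    static double tagPathSimilarity(String[] p, String[] q) {
        if (p.length + q.length == 0) return 1.0;
        int[][] d = new int[p.length + 1][q.length + 1];
        for (int i = 0; i <= p.length; i++) d[i][0] = i;
        for (int j = 0; j <= q.length; j++) d[0][j] = j;
        for (int i = 1; i <= p.length; i++) {
            for (int j = 1; j <= q.length; j++) {
                int sub = p[i - 1].equals(q[j - 1]) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + sub);
            }
        }
        return 1.0 - (double) d[p.length][q.length] / (p.length + q.length);
    }
}
```

For the tag paths td/b and td/i, the edit distance is 1 over a combined length of 4, so the tag path similarity is 0.75.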

Our data alignment method consists of the
following four steps.
• Merge text nodes.
• Align text nodes.
• Split (composite) text nodes.
• Align data units.
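Two of these steps can be sketched in Java 8 or later. `splitComposite` breaks a composite text node at assumed separator characters, and `cluster` is a greedy single-pass threshold clustering used here as a simplified stand-in; it is not the clustering-based shifting technique itself. The class name `AlignmentSketch`, the separators, the similarity function and the threshold are all assumptions supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

final class AlignmentSketch {

    // Split a composite text node into data units at any of the given
    // separator characters, trimming whitespace and dropping empty parts.
    static List<String> splitComposite(String text, String separators) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (separators.indexOf(c) >= 0) {
                addIfNotBlank(parts, current);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(parts, current);
        return parts;
    }

    private static void addIfNotBlank(List<String> parts, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) parts.add(s);
        sb.setLength(0);
    }

    // Greedy threshold clustering: each unit joins the first group whose
    // average similarity to it reaches the threshold, else starts a new group.
    static <T> List<List<T>> cluster(List<T> units, ToDoubleBiFunction<T, T> sim, double threshold) {
        List<List<T>> groups = new ArrayList<>();
        for (T unit : units) {
            List<T> home = null;
            for (List<T> group : groups) {
                double total = 0.0;
                for (T member : group) total += sim.applyAsDouble(unit, member);
                if (total / group.size() >= threshold) {
                    home = group;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                groups.add(home);
            }
            home.add(unit);
        }
        return groups;
    }
}
```

The greedy pass depends on input order, which is one reason a dedicated alignment algorithm that reasons across all SRRs is preferable in practice.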

• Apply semantic labels to each data unit obtained from the SRRs.
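Label assignment relies on basic annotators such as the in-text prefix/suffix annotator named in the architecture. Below is a minimal sketch of the prefix half only. It assumes labels appear as a "Label:" prefix inside each data unit of a group; `PrefixAnnotator` is a hypothetical name.

```java
import java.util.List;

final class PrefixAnnotator {
    // Hypothetical in-text prefix annotator: if every data unit in a group
    // begins with the same "Label:" prefix (ignoring case), return that label.
    static String annotate(List<String> group) {
        String label = null;
        for (String unit : group) {
            int colon = unit.indexOf(':');
            if (colon <= 0) return null;          // no prefix in this unit
            String prefix = unit.substring(0, colon).trim();
            if (prefix.isEmpty()) return null;
            if (label == null) {
                label = prefix;
            } else if (!label.equalsIgnoreCase(prefix)) {
                return null;                      // units disagree on the prefix
            }
        }
        return label; // null for an empty group
    }
}
```

Returning null rather than guessing lets other annotators supply a label for the group when this one has no evidence.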

• We use data unit level annotation.
• We propose a clustering-based shifting technique to align data units, so that data units inside the same group have the same semantics.
• An annotation wrapper is constructed for any given WDB; it can be applied to efficiently annotate the SRRs retrieved from the same WDB for new queries.
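To show how a wrapper could annotate SRRs for new queries without repeating alignment, here is a sketch that assumes each annotation rule is a label plus the literal text that precedes and follows its data unit in the record's HTML. The rule format and the class names `AnnotationRule` and `AnnotationWrapper` are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical rule: the data unit for `label` lies between `prefix` and `suffix`.
final class AnnotationRule {
    final String label;
    final String prefix;
    final String suffix;

    AnnotationRule(String label, String prefix, String suffix) {
        this.label = label;
        this.prefix = prefix;
        this.suffix = suffix;
    }

    // Returns the text between prefix and suffix, or null if the rule does not apply.
    String extract(String record) {
        int start = record.indexOf(prefix);
        if (start < 0) return null;
        start += prefix.length();
        int end = suffix.isEmpty() ? record.length() : record.indexOf(suffix, start);
        if (end < 0) return null;
        return record.substring(start, end).trim();
    }
}

final class AnnotationWrapper {
    private final List<AnnotationRule> rules = new ArrayList<>();

    AnnotationWrapper add(AnnotationRule rule) {
        rules.add(rule);
        return this;
    }

    // Apply every rule to a new SRR and collect label -> value pairs.
    Map<String, String> annotate(String record) {
        Map<String, String> out = new LinkedHashMap<>();
        for (AnnotationRule r : rules) {
            String value = r.extract(record);
            if (value != null) out.put(r.label, value);
        }
        return out;
    }
}
```

Applied to `<td>by <b>J. Smith</b></td><td>Our Price: $45.00</td>` with rules for "Author" (between `by <b>` and `</b>`) and "Price" (between `Our Price: ` and `</td>`), the wrapper yields Author = J. Smith and Price = $45.00; a rule whose prefix is absent simply contributes nothing.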

The classes used in the system for annotating search results from web databases are:
1) Wrapper: an annotation wrapper for the search site is constructed automatically and can be used to annotate new result pages from the same web database.
2) Search engine: reads data from the web database and provides it for comparison shopping.
3) Wrapper builder: combines the annotators to produce the result.

[Figure: Activity diagram. Sample web pages -> record extraction -> records -> data alignment -> alignment groups -> annotators 1 to K -> combining annotations -> annotated groups -> annotation wrapper generation -> annotation wrapper; also shown: integrated search interface and web pages.]

• Web data collection.
• Internet comparison shopping.

• Operating system - Windows XP, 7
• Coding language - Java
• Development kit - JDK 1.6 & above
• Front end - Java Swing

• Processor - Pentium IV
• Speed - 1.1 GHz
• RAM - 256 MB (minimum)
• Hard disk - 20 GB
• Motherboard - Intel 945 GLX

[1] A. Arasu and H. Garcia-Molina, “Extracting Structured Data from Web Pages,” Proc. SIGMOD Int’l Conf. Management of Data, 2003.
[2] L. Arlotta, V. Crescenzi, G. Mecca, and P. Merialdo, “Automatic Annotation of Data Extracted from Large Web Sites,” Proc. Sixth Int’l Workshop on the Web and Databases (WebDB), 2003.
[3] P. Chan and S. Stolfo, “Experiments on Multistrategy Learning by Meta-Learning,” Proc. Second Int’l Conf. Information and Knowledge Management (CIKM), 1993.
[4] W. Bruce Croft, “Combining Approaches for Information Retrieval,” Advances in Information Retrieval: Recent Research from the Center for Intelligent Information Retrieval, Kluwer Academic, 2000.
[5] V. Crescenzi, G. Mecca, and P. Merialdo, “RoadRUNNER: Towards Automatic Data Extraction from Large Web Sites,” Proc. Very Large Data Bases (VLDB) Conf., 2001.
