Facilitating Document Annotation Using Content And
Querying Value
Abstract:
A large number of organizations today generate and share textual descriptions of
their products, services, and actions. Such collections of textual data contain a
significant amount of structured information, which remains buried in the
unstructured text. While information extraction algorithms facilitate the extraction
of structured relations, they are often expensive and inaccurate, especially when
operating on top of text that does not contain any instances of the targeted
structured information. We present a novel alternative approach that facilitates
the generation of structured metadata by identifying documents that are likely
to contain information of interest, where that information will subsequently be
useful for querying the database. Our approach relies on the idea that humans are
more likely to add the necessary metadata at creation time if prompted by
the interface, or that it is much easier for humans (and/or algorithms) to identify
the metadata when such information actually exists in the document, instead of
naively prompting users to fill in forms with information that is not available in the
document. As a major contribution of this paper, we present algorithms that
identify structured attributes that are likely to appear within the document by
jointly utilizing the content of the text and the query workload. Our experimental
evaluation shows that our approach generates superior results compared to
approaches that rely only on the textual content or only on the query workload to
identify attributes of interest.
GLOBALSOFT TECHNOLOGIES
IEEE PROJECTS & SOFTWARE DEVELOPMENTS
IEEE FINAL YEAR PROJECTS|IEEE ENGINEERING PROJECTS|IEEE STUDENTS PROJECTS|IEEE
BULK PROJECTS|BE/BTECH/ME/MTECH/MS/MCA PROJECTS|CSE/IT/ECE/EEE PROJECTS
CELL: +91 98495 39085, +91 99662 35788, +91 98495 57908, +91 97014 40401
Visit: www.finalyearprojects.org Mail to:ieeefinalsemprojects@gmail.com
Architecture:
EXISTING SYSTEM:
Many systems, though, do not even have the basic “attribute-value” annotation
that would make “pay-as-you-go” querying feasible. Existing work on query
forms can be leveraged in creating the CADS adaptive query forms. One line of
prior work proposes an algorithm to extract a query form that represents most of
the queries in the database using the “querability” of the columns, and extends
that work to discuss form customization. Other work uses schema information to
auto-complete attribute or value names in query forms. Keyword queries have also
been used to select the most appropriate query forms.
PROPOSED SYSTEM:
In this paper, we propose CADS (Collaborative Adaptive Data Sharing platform),
which is an “annotate-as-you-create” infrastructure that facilitates fielded data
annotation. A key contribution of our system is the direct use of the query
workload to guide the annotation process, in addition to examining the content of
the document. In other words, we prioritize the annotation of
documents towards generating attribute values for attributes that are often used
by querying users.
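As a rough illustration of what "attributes that are often used by querying users" could mean in practice, the sketch below counts how often each attribute appears in a query workload and ranks attributes by that frequency. The function name, the representation of a query as a set of attribute names, and the sample workload are all assumptions made for this example; they are not taken from the CADS paper.

```python
from collections import Counter


def attribute_query_frequency(workload):
    """Return the fraction of queries in the workload that use each attribute.

    `workload` is assumed to be a list of queries, each given as a
    collection of attribute names (a hypothetical representation).
    """
    counts = Counter()
    for query in workload:
        # Count each attribute at most once per query.
        counts.update(set(query))
    total = len(workload)
    if total == 0:
        return {}
    return {attr: n / total for attr, n in counts.items()}


# Hypothetical workload: each query lists the attributes used in its conditions.
workload = [
    {"location", "date"},
    {"location"},
    {"author", "location"},
    {"date"},
]

freq = attribute_query_frequency(workload)
# Rank by descending frequency, breaking ties alphabetically.
ranked = sorted(freq, key=lambda a: (-freq[a], a))
print(ranked)  # ['location', 'date', 'author']
```

Under these assumptions, an annotation interface would prompt for "location" before "author", because far more queries in the workload filter on it.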
Modules :
1. Registration
2. Login
3. Document Upload
4. Search Techniques
5. Download Document
Modules Description
Registration:
In this module, an Author (Creator) or User must register
first; only then can he/she access the database.
Login:
In this module, any of the above-mentioned persons must
log in by giving their email ID and password.
Document Upload:
In this module, the Owner uploads an unstructured
document as a file (along with its metadata) into the database. With the help of this
metadata and the document's contents, the end user can later download the file.
To do so, he/she has to enter content or a query that matches the file.
Search Techniques:
Here we use two techniques for searching for a document:
1) Content Search, 2) Query Search.
Content Search:
The document is retrieved by giving
content that is present in the corresponding document. If the content is present, the
corresponding document is downloaded; otherwise it is not.
Query Search:
The document is retrieved by using a query
as given in the base paper. If the input matches, the document is
downloaded; otherwise it is not.
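The two search techniques can be sketched as follows. This is a minimal in-memory illustration, not the project's JSP/JDBC implementation: the `Document` class, the function names, and the sample documents are hypothetical, and matching is simplified to case-insensitive substring matching (content search) and exact attribute-value matching against the uploaded metadata (query search).

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical record for an uploaded file and its annotations.
    name: str
    content: str
    metadata: dict = field(default_factory=dict)


def content_search(documents, text):
    """Return documents whose content contains `text` (case-insensitive)."""
    needle = text.lower()
    return [d for d in documents if needle in d.content.lower()]


def query_search(documents, conditions):
    """Return documents whose metadata satisfies every attribute-value condition.

    Note: an empty `conditions` dict matches every document.
    """
    def matches(doc):
        return all(
            attr in doc.metadata
            and str(doc.metadata[attr]).lower() == str(value).lower()
            for attr, value in conditions.items()
        )
    return [d for d in documents if matches(d)]


# Hypothetical sample data.
docs = [
    Document("storm.txt", "Hurricane damage reported in Miami on Monday.",
             {"location": "Miami", "event": "hurricane"}),
    Document("flood.txt", "Flooding closed several roads in Houston.",
             {"location": "Houston", "event": "flood"}),
]

print([d.name for d in content_search(docs, "roads in houston")])      # ['flood.txt']
print([d.name for d in query_search(docs, {"location": "miami"})])     # ['storm.txt']
```

In both cases an empty result list corresponds to the "otherwise it is not downloaded" branch described above.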
Download Document:
The User downloads the document using the query/content
values given in the base paper. He/She enters the data in the
text boxes; if it is correct, the file is downloaded. Otherwise it is not.
System Configuration:-
H/W System Configuration:-
Processor - Pentium III
Speed - 1.1 GHz
RAM - 256 MB (min)
Hard Disk - 20 GB
Floppy Drive - 1.44 MB
Keyboard - Standard Windows Keyboard
Mouse - Two or Three Button Mouse
Monitor - SVGA
S/W System Configuration:-
Operating System - Windows 95/98/2000/XP
Application Server - Tomcat 5.0/6.x
Front End - HTML, Java, JSP
Scripts - JavaScript
Server-side Script - Java Server Pages
Database - MySQL
Database Connectivity - JDBC
Conclusion:
We proposed adaptive techniques to suggest relevant attributes to
annotate a document, while trying to satisfy the user's querying needs. Our solution
is based on a probabilistic framework that considers the evidence in the document
content and the query workload. We present two ways to combine these two
pieces of evidence, content value and querying value: a model that considers both
components conditionally independent and a linear weighted model. Experiments
show that using our techniques, we can suggest attributes that improve the
visibility of the documents with respect to the query workload by up to 50%. That
is, we show that using the query workload can greatly improve the annotation
process and increase the utility of shared data.
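The two combination strategies named above can be illustrated with a simplified sketch. It assumes each candidate attribute already has a content value and a querying value in [0, 1]; the independence-based model is approximated here as a plain product of the two scores, and the linear model as a weighted sum. The function names, the weight parameter, and the sample scores are assumptions for illustration only, and the paper's actual probability estimators are not reproduced.

```python
def combine_independent(content_value, querying_value):
    """Treat the two pieces of evidence as independent: multiply them."""
    return content_value * querying_value


def combine_linear(content_value, querying_value, weight=0.5):
    """Weighted sum; `weight` is the (assumed) share given to content value."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * content_value + (1.0 - weight) * querying_value


def suggest_attributes(scores, combine, k=2):
    """Return the top-k attribute names under the given combination function.

    `scores` maps attribute name -> (content_value, querying_value).
    """
    ranked = sorted(scores, key=lambda a: (-combine(*scores[a]), a))
    return ranked[:k]


# Hypothetical scores: (content value, querying value) per attribute.
scores = {
    "location": (0.9, 0.75),
    "date": (0.4, 0.5),
    "author": (0.8, 0.05),
}

print(suggest_attributes(scores, combine_independent))
# ['location', 'date']
print(suggest_attributes(scores, lambda c, q: combine_linear(c, q, weight=0.9)))
# ['location', 'author']
```

In this toy data, "author" is well supported by the document content but rarely queried. The product penalizes it heavily, while a linear model weighted towards content still suggests it, which shows how the choice of combination can change the suggested attributes.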
