An Efficient Concept-based Mining Model for Enhancing Text
Clustering

Synopsis
ABSTRACT
Most common techniques in text mining are based on the statistical analysis
of a term, either a word or a phrase. Statistical analysis of term frequency
captures the importance of a term within a document only. However, two terms
can have the same frequency in their documents while one contributes more to
the meaning of its sentences than the other. Thus, the underlying text mining
model should indicate terms that capture the semantics of the text. In this
case, the mining model can capture terms that present the concept of the
sentence, which leads to the discovery of the topic of the document.
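
The limitation of pure frequency counting can be seen in a minimal sketch.
The class name, the tokenizer, and the sample text below are illustrative
assumptions, not part of the proposed model. The sketch counts raw term
frequency and shows that "said" and "bridge" receive the same count, although
"bridge" carries far more of the meaning of the sentences.

```java
import java.util.HashMap;
import java.util.Map;

class TermFrequencyDemo {
    // Counts how often each lowercase word occurs in the text.
    // Tokenization is a simple split on non-letters (an assumption for brevity).
    static Map<String, Integer> termFrequency(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String doc = "Engineers said the bridge collapsed. "
                   + "Officials said the bridge was old.";
        Map<String, Integer> tf = termFrequency(doc);
        // Both terms occur twice, so frequency alone cannot rank them.
        System.out.println("said   = " + tf.get("said"));
        System.out.println("bridge = " + tf.get("bridge"));
    }
}
```

A frequency-only model treats the two terms as equally important here, which
is the gap that sentence-level concept analysis is meant to close.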
A new concept-based mining model that analyzes terms on the sentence,
document, and corpus levels is introduced. The concept-based mining model can
effectively discriminate between non-important terms with respect to sentence
semantics and terms which hold the concepts that represent the sentence
meaning.
The proposed mining model consists of sentence-based concept analysis,
document-based concept analysis, corpus-based concept analysis, and a
concept-based similarity measure. The term which contributes to the sentence
semantics is analyzed on the sentence, document, and corpus levels rather than
the traditional analysis of the document only. The proposed model can
efficiently find significant matching concepts between documents according to
the semantics of their sentences. The similarity between documents is
calculated based on a new concept-based similarity measure. The proposed
similarity measure takes full advantage of using the concept analysis measures
on the sentence, document, and corpus levels in calculating the similarity
between documents.
Large sets of experiments using the proposed concept-based mining model
on different datasets in text clustering are conducted. The experiments
demonstrate extensive comparison between the concept-based analysis and the
traditional analysis.
Experimental results demonstrate the substantial enhancement of the
clustering quality using the sentence-based, document-based, corpus-based and
combined approach concept analysis.
Index Terms: Concept-based mining model, sentence-based, document-based,
corpus-based, concept-based, concept analysis, conceptual term frequency,
concept-based similarity.

PROPOSED SYSTEM:
In this paper, a novel concept-based mining model is proposed. The proposed
model captures the semantic structure of each term within a sentence and
document rather than the frequency of the term within a document only. In
the proposed model, three measures for analyzing concepts on the sentence,
document, and corpus levels are computed.
Each sentence is labeled by a semantic role labeler that determines the
terms which contribute to the sentence semantics. Each term that has a
semantic role in the sentence is called a concept. Concepts can be either
words or phrases and are totally dependent on the semantic structure of the
sentence. When a new document is introduced to the system, the proposed mining
model can detect a concept match from this document to all the previously
processed documents in the data set by scanning the new document and
extracting the matching concepts.
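
The matching step described above can be sketched as follows. This is a
minimal illustration under stated assumptions: the semantic role labeler is
not implemented, so each sentence arrives as a list of already extracted
concept strings, and the class name ConceptIndex and its method are
hypothetical. The index maps each concept to the documents that contain it, so
a new document can be scanned once to find matches against all previously
processed documents.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ConceptIndex {
    // concept -> ids of previously processed documents that contain it
    private final Map<String, Set<String>> docsByConcept = new HashMap<>();

    // Scans a new document (one list of concepts per labeled sentence),
    // returns the concepts it shares with earlier documents, then indexes it.
    Map<String, Set<String>> addDocument(String docId, List<List<String>> labeledSentences) {
        Map<String, Set<String>> matches = new TreeMap<>();
        Set<String> ownConcepts = new HashSet<>();
        for (List<String> sentence : labeledSentences) {
            for (String concept : sentence) {
                ownConcepts.add(concept);
                Set<String> earlier = docsByConcept.get(concept);
                if (earlier != null) {
                    matches.put(concept, new TreeSet<>(earlier));
                }
            }
        }
        // Register the document only after scanning, so it never matches itself.
        for (String concept : ownConcepts) {
            docsByConcept.computeIfAbsent(concept, k -> new HashSet<>()).add(docId);
        }
        return matches;
    }

    public static void main(String[] args) {
        ConceptIndex index = new ConceptIndex();
        index.addDocument("d1", List.of(List.of("bridge", "collapse")));
        System.out.println(index.addDocument("d2", List.of(List.of("bridge", "repair"))));
    }
}
```

Keeping an inverted index from concepts to documents avoids comparing the new
document against every earlier document one by one.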
A new concept-based similarity measure, which makes use of the concept
analysis on the sentence, document, and corpus levels, is proposed.
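
The synopsis does not give the formula for this measure, so the following is
only a sketch of how evidence from the three levels could be combined, and
should not be read as the authors' measure. It assumes a tf-idf style weight:
a sentence-level score and a document-level frequency are added and scaled by
a corpus-level inverse document frequency, and documents are compared by
cosine similarity over their concept weights. All names and parameters are
hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ConceptSimilaritySketch {
    // Assumed weighting (illustrative only): sentence-level score plus
    // document-level frequency, scaled by log(corpus size / document frequency).
    static double weight(double sentenceScore, int freqInDoc, int corpusDocs, int docsWithConcept) {
        return (sentenceScore + freqInDoc) * Math.log((double) corpusDocs / docsWithConcept);
    }

    // Cosine similarity over concept weights; only shared concepts
    // contribute to the dot product.
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double denom = norm(a) * norm(b);
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    // Euclidean length of a weight vector.
    static double norm(Map<String, Double> w) {
        double sum = 0.0;
        for (double v : w.values()) {
            sum += v * v;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Double> d1 = new HashMap<>();
        d1.put("bridge", weight(1.0, 2, 10, 2));
        d1.put("collapse", weight(0.5, 1, 10, 1));
        Map<String, Double> d2 = new HashMap<>();
        d2.put("bridge", weight(1.0, 1, 10, 2));
        d2.put("repair", weight(0.5, 1, 10, 3));
        System.out.println(similarity(d1, d2));
    }
}
```

Under this assumed weighting, a concept that appears in every document of the
corpus gets weight zero, so it cannot make two documents look similar.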
The important terms used in this paper are:
− Label
− Term
− Concept
− Verb-argument structure

SOFTWARE REQUIREMENTS:
Operating System : Windows XP / Linux
Language         : Java
Database         : Oracle

HARDWARE REQUIREMENTS:
Processor : 1.0 GHz
RAM       : 512 MB
Hard disk : 30 GB

