An efficient concept based mining model for enhancing text clustering(synopsis)

An Efficient Concept-based Mining Model for Enhancing Text Clustering

Synopsis

ABSTRACT
Most common techniques in text mining are based on the statistical analysis
of a term, either a word or a phrase. Statistical analysis of term frequency
captures the importance of a term within a document only. However, two terms
can have the same frequency in their documents while one contributes more to
the meaning of its sentences than the other. Thus, the underlying text mining
model should indicate terms that capture the semantics of the text. In this
case, the mining model can capture terms that present the concept of the
sentence, which leads to the discovery of the topic of the document.
A new concept-based mining model that analyzes terms on the sentence,
document, and corpus levels is introduced. The concept-based mining model can
effectively discriminate between non-important terms with respect to sentence
semantics and terms which hold the concepts that represent the sentence
meaning.
The proposed mining model consists of sentence-based concept analysis,
document-based concept analysis, corpus-based concept analysis, and a
concept-based similarity measure. The term which contributes to the sentence semantics
is analyzed on the sentence, document, and corpus levels rather than the
traditional analysis of the document only. The proposed model can efficiently find
significant matching concepts between documents according to the semantics of
their sentences. The similarity between documents is calculated based on a new
concept-based similarity measure.

The proposed similarity measure takes full advantage of the concept analysis
measures on the sentence, document, and corpus levels in calculating the
similarity between documents.

Large sets of experiments using the proposed concept-based mining model
on different datasets in text clustering are conducted. The experiments
provide an extensive comparison between the concept-based analysis and the
traditional analysis.
Experimental results demonstrate a substantial enhancement of the
clustering quality using the sentence-based, document-based, corpus-based, and
combined-approach concept analysis.
Index Terms: Concept-based mining model, sentence-based, document-based,
corpus-based, concept-based, concept analysis, conceptual term frequency,
concept-based similarity.

PROPOSED SYSTEM:
In this paper, a novel concept-based mining model is proposed. The
proposed model captures the semantic structure of each term within a sentence
and document rather than the frequency of the term within a document only. In
the proposed model, three measures for analyzing concepts on the sentence,
document, and corpus levels are computed.
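To make the three levels of analysis concrete, the following is a minimal Java sketch. It is an illustration only, not the measures defined by the proposed model: it assumes plain occurrence counts, and it assumes that each sentence has already been reduced to a list of concept strings (the terms a semantic role labeler would mark). The class name ConceptLevelCounts and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: simple counts of a concept at three levels.
// Assumption: a sentence is a list of already-extracted concept strings.
class ConceptLevelCounts {

    /** Sentence level: occurrences of the concept in one sentence. */
    static int sentenceCount(List<String> sentence, String concept) {
        int n = 0;
        for (String c : sentence) {
            if (c.equals(concept)) {
                n++;
            }
        }
        return n;
    }

    /** Document level: occurrences of the concept across all sentences. */
    static int documentCount(List<List<String>> document, String concept) {
        int n = 0;
        for (List<String> sentence : document) {
            n += sentenceCount(sentence, concept);
        }
        return n;
    }

    /** Corpus level: number of documents that contain the concept. */
    static int corpusDocumentCount(List<List<List<String>>> corpus, String concept) {
        int n = 0;
        for (List<List<String>> document : corpus) {
            if (documentCount(document, concept) > 0) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Two toy documents; the concept strings are made up for illustration.
        List<List<String>> doc1 = Arrays.asList(
                Arrays.asList("concept", "model", "concept"),
                Arrays.asList("model"));
        List<List<String>> doc2 = Arrays.asList(
                Arrays.asList("clustering"));
        List<List<List<String>>> corpus = Arrays.asList(doc1, doc2);

        System.out.println(sentenceCount(doc1.get(0), "concept"));
        System.out.println(documentCount(doc1, "model"));
        System.out.println(corpusDocumentCount(corpus, "model"));
    }
}
```

The sketch separates the three levels into three small methods so that each count can be replaced by a weighted measure without changing the callers.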
Each sentence is labeled by a semantic role labeler that determines the
terms which contribute to the sentence semantics. Each term that has a semantic
role in the sentence is called a concept. Concepts can be either words or
phrases and are totally dependent on the semantic structure of the sentence.
When a new document is introduced to the system, the proposed mining model can
detect a concept match from this document to all the previously processed
documents in the data set by scanning the new document and extracting the
matching concepts.
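The matching step described above can be sketched in Java as follows. This is only one possible approach, assuming an inverted index from concept to document ids; the source does not specify the data structure. The class ConceptIndex and the method addAndMatch are hypothetical names, and concepts are assumed to have been extracted already by a semantic role labeler.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: match a new document's concepts against all
// previously processed documents using an inverted index (an assumption).
class ConceptIndex {
    // concept -> ids of previously processed documents that contain it
    private final Map<String, Set<Integer>> index = new HashMap<>();
    private int nextId = 0;

    /**
     * Returns, for each earlier document id, the concepts it shares with
     * the new document, then adds the new document to the index.
     */
    Map<Integer, Set<String>> addAndMatch(List<String> concepts) {
        Map<Integer, Set<String>> matches = new TreeMap<>();
        Set<String> unique = new TreeSet<>(concepts);

        // Scan the new document and collect matching concepts per document.
        for (String concept : unique) {
            Set<Integer> docs = index.getOrDefault(concept, Collections.<Integer>emptySet());
            for (int docId : docs) {
                matches.computeIfAbsent(docId, k -> new TreeSet<>()).add(concept);
            }
        }

        // Register the new document so later documents can match against it.
        int id = nextId++;
        for (String concept : unique) {
            index.computeIfAbsent(concept, k -> new TreeSet<>()).add(id);
        }
        return matches;
    }
}
```

With this layout, a new document is compared only against documents that share at least one concept with it, rather than against every document in the data set.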

A new concept-based similarity measure, which makes use of the concept
analysis on the sentence, document, and corpus levels, is proposed.
The important terms used in this paper are:
− Label
− Term
− Concept
− Verb-argument structure

SOFTWARE REQUIREMENTS:
Operating System : Win XP / Linux
Language         : Java
Database         : Oracle

HARDWARE REQUIREMENTS:
Processor : 1.0 GHz
RAM       : 512 MB
Hard disk : 30 GB
