2. Text data mining can be described as the process of extracting essential
information from natural language text. Much of the data we generate through
text messages, documents, emails, and files is written in natural language.
Text mining is primarily used to draw useful insights or patterns from such data.
3. AREAS OF TEXT MINING IN DATA MINING:
Text mining draws on the following areas:
4. INFORMATION EXTRACTION:
The automatic extraction of structured data such as
entities, entity relationships, and attributes
describing entities from an unstructured source is
called information extraction.
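As a minimal sketch, assuming hand-written rules are acceptable, the function below turns an unstructured sentence into structured records. The regular expressions, the hypothetical `works_at` relation name, and the sample sentence are illustrative assumptions; practical information extraction systems typically rely on trained entity recognizers and relation classifiers rather than fixed patterns.

```python
import re
from typing import NamedTuple

class Relation(NamedTuple):
    subject: str
    relation: str
    obj: str

# Hypothetical rule: "<Capitalized Name> works at <Capitalized Organization>"
WORKS_AT = re.compile(
    r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*) works at ([A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"
)
# Simplified e-mail pattern (an assumption; not a full RFC 5322 matcher)
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

def extract(text: str) -> dict:
    """Turn unstructured text into structured records using fixed rules."""
    relations = [Relation(m.group(1), "works_at", m.group(2))
                 for m in WORKS_AT.finditer(text)]
    emails = EMAIL.findall(text)  # attribute values describing the entity
    return {"relations": relations, "emails": emails}

sample = "Alice Smith works at Acme Corp. Contact her at alice@acme.com for details."
print(extract(sample))
```

Here the entity "Alice Smith", the relationship to "Acme Corp", and the e-mail attribute come out as structured fields that could be stored in a table.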
5. NATURAL LANGUAGE PROCESSING:
NLP stands for natural language processing. It enables computer software to
understand human language as it is spoken and written. NLP is primarily a
branch of artificial intelligence (AI). Developing NLP applications is difficult
because computers generally expect humans to “speak” to them in a programming
language that is precise, unambiguous, and highly structured. Human speech,
however, is often imprecise and ambiguous, and its meaning can depend on many
complex variables, including slang, social context, and regional dialects.
6. DATA MINING:
Data mining refers to the extraction of useful information and hidden
patterns from large data sets. Data mining tools can predict behaviors and
future trends, allowing businesses to make better data-driven decisions.
Data mining tools can also be used to solve many business problems that
have traditionally been too time-consuming to address.
7. INFORMATION RETRIEVAL:
Information retrieval deals with retrieving the data relevant to a user's need
from data that is stored in our systems. For example, the search engines on
e-commerce sites and other websites are a form of information retrieval.
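A minimal sketch of the idea behind such search engines: an inverted index maps each term to the documents that contain it, and a Boolean AND query returns the documents containing every query term. The product descriptions are made-up sample data, and real search engines also rank results by relevance, which this sketch does not attempt.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: map each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Boolean AND retrieval: return documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical product catalogue of an e-commerce site
products = {
    "p1": "Red running shoes for men",
    "p2": "Blue running jacket",
    "p3": "Red leather shoes",
}
index = build_index(products)
print(sorted(search(index, "red shoes")))  # ['p1', 'p3']
```

The index is built once, so each query only touches the posting sets of its own terms instead of scanning every stored document.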
8. TEXT MINING APPROACHES IN DATA MINING:
The following text mining approaches are used in data mining.
9. 1. KEYWORD-BASED ASSOCIATION ANALYSIS:
This approach collects sets of keywords or terms that frequently occur together
and then discovers the association relationships among them. First, it
preprocesses the text data by parsing, stemming, removing stop words, and so on.
Once the data has been preprocessed, it applies association mining algorithms.
Because no human effort is required, the number of unwanted results and the
execution time are reduced.
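The steps above can be sketched as follows, under simplifying assumptions: a tiny hand-picked stop-word list, a crude stemmer that only strips a plural "s" (not a real stemming algorithm such as Porter's), and association mining restricted to keyword pairs. Each pair is scored by support (the fraction of documents containing both keywords) and confidence (the fraction of documents containing A that also contain B). The headlines and thresholds are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stop-word list (an assumption; real systems use larger lists)
STOP_WORDS = {"a", "an", "and", "as", "for", "in", "of", "on", "the", "to"}

def preprocess(text):
    """Parse into lowercase words, remove stop words, and crudely stem plurals."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w[:-1] if w.endswith("s") and len(w) > 3 else w
            for w in words if w not in STOP_WORDS}

def keyword_associations(docs, min_support=0.5, min_confidence=0.6):
    """Return (A, B, support, confidence) for keyword pairs passing both thresholds."""
    baskets = [preprocess(d) for d in docs]  # each document becomes a keyword set
    n = len(baskets)
    item_counts = Counter(w for b in baskets for w in b)
    pair_counts = Counter(p for b in baskets for p in combinations(sorted(b), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair co-occurs too rarely to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

headlines = [
    "Interest rates rise",
    "Interest rates fall as stocks rally",
    "Central bank raises interest rates",
    "Stocks rally",
]
for rule in keyword_associations(headlines):
    print(rule)
```

With these four headlines, the sketch reports that "interest" and "rate" always occur together (support 0.75, confidence 1.0), as do "rally" and "stock" (support 0.5, confidence 1.0). A full system would use a general frequent-itemset algorithm such as Apriori instead of counting pairs only.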
10. 2. DOCUMENT CLASSIFICATION ANALYSIS / AUTOMATIC DOCUMENT CLASSIFICATION:
This analysis is used to automatically classify the huge number of online text
documents, such as web pages and emails. Text document classification differs
from the classification of relational data because document databases are not
organized according to attribute-value pairs.
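Because documents have no attribute-value pairs, a classifier has to derive its features from the words themselves. The source does not name an algorithm; as one common choice, here is a minimal multinomial naive Bayes sketch with add-one smoothing. The class name `NaiveBayesTextClassifier` and the spam/ham training examples are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # log P(label) + sum over words of log P(word | label)
            score = math.log(n / n_docs)
            for word in tokenize(doc):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

train_docs = [
    "Win a free prize now",
    "Claim your free money",
    "Meeting agenda for Monday",
    "Please review the project report",
]
train_labels = ["spam", "spam", "ham", "ham"]
clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(clf.predict("free prize money"))        # spam
print(clf.predict("project meeting report"))  # ham
```

Log probabilities are summed instead of multiplying raw probabilities so that long documents do not underflow to zero.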
11. TEXT MINING PROCESS:
The text mining process incorporates the following steps to extract
information from documents.
12. TEXT TRANSFORMATION:
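Text transformation converts the text into a representation that mining
algorithms can work with; for example, it controls the capitalization of the
text so that "Data" and "data" are treated as the same word. The two major
ways of representing a document are:
1. Bag of words
2. Vector space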
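A minimal sketch of the two representations, using made-up sentences: each document is lowercased and reduced to a bag of words (word order discarded, counts kept), and each bag is then mapped to a vector with one dimension per vocabulary term. Cosine similarity, a common but here assumed choice of comparison, measures how alike two document vectors are.

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Bag of words: lowercase, discard word order, keep only term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def to_vector(bow: Counter, vocabulary: list[str]) -> list[int]:
    """Vector space model: one dimension per vocabulary term."""
    return [bow[term] for term in vocabulary]

def cosine(u, v) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

docs = ["The cat sat on the mat", "The cat ate", "Stock prices rose"]
bows = [bag_of_words(d) for d in docs]
vocabulary = sorted(set().union(*bows))
vectors = [to_vector(b, vocabulary) for b in bows]
print(vocabulary)  # ['ate', 'cat', 'mat', 'on', 'prices', 'rose', 'sat', 'stock', 'the']
print(vectors[0])  # [0, 1, 1, 1, 0, 0, 1, 0, 2]
print(round(cosine(vectors[0], vectors[1]), 3))  # 0.612
print(cosine(vectors[0], vectors[2]))            # 0.0
```

The two sentences about a cat share terms and get a positive similarity, while the finance sentence shares none and scores zero.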
13. TEXT PRE-PROCESSING:
Pre-processing is a significant task and a critical step in text mining,
natural language processing (NLP), and information retrieval (IR). In text
mining, data pre-processing prepares unstructured text data so that useful
information and knowledge can be extracted from it. Information retrieval (IR)
is a matter of choosing which documents in a collection should be retrieved to
fulfill the user's need.
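A minimal pre-processing pipeline might look like the sketch below. The stop-word list and suffix rules are illustrative assumptions, and `crude_stem` is a hypothetical helper that only strips a few suffixes; a real system would more likely use an established stemmer such as NLTK's `PorterStemmer`.

```python
import re

# Illustrative stop words and suffix rules (assumptions, not a standard set)
STOP_WORDS = {"a", "about", "an", "and", "are", "for", "in", "is", "of",
              "on", "or", "the", "to", "was", "were", "with"}
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text: str) -> list[str]:
    text = text.lower()                                  # 1. normalize case
    tokens = re.findall(r"[a-z]+", text)                 # 2. tokenize; drop punctuation and digits
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 3. remove stop words
    return [crude_stem(t) for t in tokens]               # 4. reduce words to a crude stem

print(preprocess("The customers were complaining about delayed deliveries in 2023!"))
# ['customer', 'complain', 'delay', 'deliveri']
```

Stems such as "deliveri" are not dictionary words; that is acceptable because they only need to map related word forms to the same feature.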
14. FEATURE SELECTION:
Feature selection is a significant part of data mining. It can be defined as
the process of reducing the number of inputs to be processed by keeping only
the most essential sources of information; in text mining, these inputs are
typically the terms (words) that describe each document. Feature selection is
also called variable selection.
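The source does not fix a selection criterion; as one simple, assumed example, document-frequency thresholding keeps only the terms that appear in enough documents to be useful but not in so many that they cannot distinguish documents. The function name `select_features`, the thresholds, and the review snippets are hypothetical; criteria such as chi-square or information gain are common alternatives.

```python
import re
from collections import Counter

def select_features(docs, min_df=2, max_df_ratio=0.8):
    """Keep terms whose document frequency lies between min_df and max_df_ratio * N."""
    doc_sets = [set(re.findall(r"[a-z]+", d.lower())) for d in docs]
    df = Counter(term for terms in doc_sets for term in terms)  # documents per term
    max_df = max_df_ratio * len(docs)
    return sorted(t for t, n in df.items() if min_df <= n <= max_df)

reviews = [
    "the battery life is great",
    "the battery died quickly",
    "the screen is great",
    "the screen cracked",
    "the delivery was fast",
]
print(select_features(reviews))  # ['battery', 'great', 'is', 'screen']
```

Here "the" is dropped because it occurs in every review, and one-off words such as "cracked" are dropped because they appear only once, shrinking twelve distinct terms to four features.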
15. APPLICATIONS:
Text mining is used in the following applications:
Risk Management:
Risk management is a systematic and logical procedure for identifying, analyzing,
treating, and monitoring the risks involved in any action or process in an
organization. Insufficient risk analysis is often a leading cause of failure.
Customer Care Service:
Text mining methods, particularly NLP, are becoming increasingly important in the
field of customer care. Here, the primary objective of text analysis is to reduce
organizations' response time and help them address customer complaints quickly
and efficiently, using text from sources such as customer feedback, surveys, and
customer calls.
16. Business Intelligence:
Companies and business firms have started to use text mining strategies as a major
aspect of their business intelligence. Besides providing significant insights into
customer behavior and trends, text mining strategies also help organizations
analyze the strengths and weaknesses of their competitors, giving them a
competitive advantage in the market.
Social Media Analysis:
Social media analysis helps track online data, and numerous text mining tools are
designed specifically for analyzing the performance of social media sites. These
tools help monitor and interpret the text generated on the internet, such as news,
emails, and blogs.
17. DATA MINING:
In this step of the text mining process, text mining merges with the
conventional data mining process: classic data mining procedures are applied
to the structured database produced from the text.