Experience in Data Analysis with
Text Mining Methods



                                               Dr. Alisa Kongthon

                  Researcher, Human Language Technology Laboratory
           National Electronics and Computer Technology Center (NECTEC)



                                                                  1
Text Mining is about…



 “Sifting through vast collections of unstructured or
 semistructured data beyond the reach of data mining
 tools, text mining tracks information sources, links isolated
 concepts in distant documents, maps relationships
 between activities, and helps answer questions.”


                                   Tapping the Power of Text Mining
                             Communications of the ACM, Sept. 2006



                                                                      2
Humans VS. Computers
• Humans: Ability to distinguish and apply linguistic patterns to text

   – Can overcome language difficulties such as slang, spelling
     variations, and contextual meaning


• Computers: Ability to process text in large volumes at high speed
   – Can sift through a large collection of texts to find simple statistics
     and relationships among terms almost instantly


• Text mining requires a combination of both
   Human linguistic capability + computer speed and accuracy


               NLP                                   Data Mining
Text Mining Tasks

• Information extraction:
  – Analyze unstructured text and identify key words or
    phrases and relationships within text
• Topic detection and tracking:
  – Filter and present only documents relevant to the user
    profile
• Summarization:
  – Text summarization reduces the content by retaining
    only its main points and overall meaning
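
As a rough illustration of the keyword-identification part of information extraction, the sketch below ranks terms by raw frequency after dropping stopwords. The stopword list, the `top_keywords` helper, and the sample text are assumptions made for this example, not something taken from the talk; real systems typically use much richer linguistic processing.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real systems use far larger ones.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}

def top_keywords(text, k=3):
    """Return the k most frequent non-stopword tokens in text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]

# Made-up sample text for illustration.
doc = (
    "Text mining links concepts in documents. "
    "Text mining tools sift through documents to find concepts and answer questions."
)
keywords = top_keywords(doc, k=4)
print(keywords)
```

Frequency alone is a crude signal; it is used here only because it keeps the example short.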



                                                             4
Text Mining Tasks

• Categorization:
  – Automatically classify documents into predefined
    categories
• Clustering:
  – Group documents based on their similarity
• Concept linkage:
  – Connect related documents by identifying their shared
    concepts, helping users find information they perhaps
    wouldn't have found through traditional search methods
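
One simple way to picture similarity-based grouping is cosine similarity over bag-of-words vectors. The sketch below is a minimal, assumed approach (greedy single-pass grouping with a hypothetical threshold of 0.3), not the method used by any particular tool mentioned in these slides.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-cased word counts for one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def group_documents(docs, threshold=0.3):
    """Greedy single-pass grouping: join the first group whose seed is similar enough."""
    groups = []  # each group is a list of document indices; the first index is the seed
    vectors = [bag_of_words(d) for d in docs]
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # no similar group found, so start a new one
    return groups

# Made-up documents for illustration.
docs = [
    "hotel room clean and quiet",
    "mobile network signal is weak",
    "quiet hotel with a clean room",
    "weak signal on this mobile network",
]
groups = group_documents(docs)
print(groups)
```

The result depends on document order and on the threshold, which is one reason production systems tend to use more robust clustering algorithms.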



                                                             5
Text Mining Tasks

• Information visualization:
  – Represent documents or information in graphical
    formats for easy browsing, viewing, or searching
• Question answering (Q&A):
  – Search and extract the best answer to a given question




                                                             6
Applications: Tech Mining

• Tech Mining is the application of text mining
  tools to science and technology (S&T)
  information, particularly bibliographic abstracts

• It exploits the S&T databases to see patterns,
  detect associations, and foresee opportunities




                                                     7
Tech Mining Process




                      8
Technical Intelligence:
Who, What, When, Where?
• Digest multiple S&T information resources
• Profile Research Domains:
  –   Who?
  –   What?
  –   When?
  –   Where?
• Map Relationships: Topics & Teams
• Analyze Trends: What’s Hot & What’s Coming
• And do so -- Quickly
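
The who/what/when/where profile can be pictured as simple counting over bibliographic records. The sketch below assumes each record is a dictionary with `authors`, `keywords`, `year`, and `country` fields; both the field names and the records are hypothetical and exist only to illustrate the idea.

```python
from collections import Counter

# Hypothetical records for illustration only; not real publications.
records = [
    {"authors": ["A. Author", "B. Writer"], "keywords": ["text mining"], "year": 2004, "country": "TH"},
    {"authors": ["A. Author"], "keywords": ["text mining", "opinion mining"], "year": 2005, "country": "TH"},
    {"authors": ["C. Scholar"], "keywords": ["opinion mining"], "year": 2005, "country": "US"},
]

def profile(records):
    """Count records per author (who), keyword (what), year (when), and country (where)."""
    who, what, when, where = Counter(), Counter(), Counter(), Counter()
    for rec in records:
        who.update(rec["authors"])
        what.update(rec["keywords"])
        when[rec["year"]] += 1
        where[rec["country"]] += 1
    return {"who": who, "what": what, "when": when, "where": where}

summary = profile(records)
for question, counts in summary.items():
    print(question, counts.most_common(2))
```

Counting co-occurrences of authors or keywords within the same record would be a natural next step toward the relationship maps mentioned above.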

                                               9
What if I don’t have Tech
Mining Software?




                            10
What if I don’t have Tech
Mining Software?




                            11
Output example from Tech
Mining Software




Source: A.L. Porter, "QTIP: Quick technology intelligence processes," Technol. Forecast. Soc. Change 72 (2005)
                                                                      12
Applications: Expert Finder




                              13
Applications: Expert Finder




                              14
Applications: Expert Finder




                              15
Applications: ABDUL
(Artificial BudDy U Love)

• An online information service that currently provides
  access to Thai linguistic resources (e.g., dictionary and
  sentence translation) and information resources (e.g., weather
  conditions, stock prices, gas prices, and traffic conditions)


• Users can interact with ABDUL in natural language
  via instant messaging (IM) protocols, web browsers,
  and mobile devices




                                                                16
Applications: ABDUL
(Artificial BudDy U Love)




                            17
Applications: ABDUL
(Artificial BudDy U Love)




                            18
Web 1.0 VS. Web 2.0




                      19
User-Generated Content

• With Web 2.0 and social networking websites, the
  amount of user-generated content has increased
  exponentially


• User-generated content often contains opinions and/or
  sentiments


• An in-depth analysis of these opinionated texts could
  reveal potentially useful information, e.g.,
  – Preferences of people towards many different topics including news
    events, social issues and commercial products



                                                                         20
Online Opinion Resources
Characteristics of Online
Reviews
• Natural language and unstructured text format

• Some reviews are long and contain only a few
  sentences expressing opinions on the product

• It can be difficult for a prospective reader to
  understand and analyze every review that
  may be relevant to his or her decision making


                                                  22
Opinion Mining

• Opinion mining, or sentiment analysis, is the task of
  analyzing and summarizing what people think about a
  certain topic


• Opinion mining has gained a lot of interest in text mining
  and NLP communities


• Three granularities of opinion mining:
  – Document level
  – Sentence level
  – Feature level

                                                               23
Feature-Based Opinion Mining

• This approach typically consists of the following
  two steps:
      1. Identifying and extracting features of an object,
         topic, or event from each sentence
      2. Determining whether the opinions regarding the
         features are positive or negative
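
The two steps above can be sketched with small hand-made lexicons. Everything named here (`FEATURES`, `POSITIVE`, `NEGATIVE`, `mine_opinions`, and the sample review) is an assumption for illustration; it is not the system behind the hotel or mobile-operator results in this talk. The sketch also adds a "neutral" label for sentences without opinion words, which goes beyond the positive/negative split on the slide.

```python
import re

# Hypothetical, hand-made lexicons for illustration only.
FEATURES = {"room", "staff", "breakfast", "location"}
POSITIVE = {"clean", "friendly", "great", "good"}
NEGATIVE = {"dirty", "rude", "poor", "noisy"}

def mine_opinions(review):
    """Return (feature, polarity) pairs, one per feature mention in each sentence."""
    results = []
    for sentence in re.split(r"[.!?]+", review.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        # Step 1: identify the features mentioned in this sentence.
        features = [t for t in tokens if t in FEATURES]
        # Step 2: score the sentence's opinion words and derive a polarity.
        score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
        polarity = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        results.extend((f, polarity) for f in features)
    return results

# Made-up review for illustration.
review = "The room was clean. The staff were rude! Breakfast starts at seven."
opinions = mine_opinions(review)
print(opinions)
```

A lexicon lookup like this ignores negation ("not clean") and features outside the list, which is where the statistical NLP challenges discussed later come in.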




                                                             24
Opinion Mining on Hotel Reviews in
Thailand (Graphical Display)




                                     25
Opinion Mining on Hotel Reviews in
Thailand (Textual Display)




                                     26
Comparison among Hotels




                          27
Opinion Mining on Mobile
Network Operators in Thailand




                                28
Opinion Mining on Mobile
Network Operators in Thailand




                                29
Challenges in Text Mining

• Text Mining = NLP + Data Mining
• Statistical NLP
  –   Ambiguity
  –   Context
  –   Tokenization and sentence detection
  –   POS tagging
• Data Mining
  – Ability to process the data
  – Massive amounts of data
  – Determining and extracting information of interest
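
To make the tokenization and sentence-detection challenge concrete, here is a minimal sketch using naive regular-expression rules on English text. The function names and the sample sentence are assumptions for this example. Note how the abbreviation "Dr." is wrongly treated as a sentence boundary, a small instance of the ambiguity listed above.

```python
import re

def naive_sentences(text):
    """Split after ., ! or ? followed by whitespace; abbreviations will fool it."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def naive_tokens(sentence):
    """Words (keeping internal apostrophes) and single punctuation marks."""
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\sA-Za-z0-9]", sentence)

# Made-up sample text for illustration.
text = "Dr. Smith liked the hotel. It wasn't noisy!"
sentences = naive_sentences(text)       # "Dr." ends up as its own "sentence"
tokens = naive_tokens("It wasn't noisy!")
print(sentences)
print(tokens)
```

Rules like these rely on spaces and punctuation, so they do not carry over to languages such as Thai, which is written without spaces between words.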

                                                         30
Conclusions

• As the amount of data increases, text-mining
  tools that sift through it will be increasingly
  valuable

• Various applications exist for both academic and
  industrial use




                                                    31
Thank you for your attention


           Q&A



                               32

