Copyright © 2014 KNIME.com AG
Boston KNIME Users
Text Processing Applications
Kilian Thiel
KNIME
Agenda
• KNIME Crash Course
• Text Mining with KNIME: Mining Tripadvisor Data
• Text Mining with KNIME: Mining Amazon Reviews
(Anil Tarachandani)
• Networking Apero
Text Mining with KNIME: Mining Tripadvisor Data
Agenda
• The KNIME Textprocessing Extension
– Preliminaries
– Philosophy & Usage
• Classification of Tripadvisor Reviews
– Tripadvisor data
– Classification of reviews
Resources
http://tech.knime.org/knime-text-processing
• Documentation
• Examples
• Forum
• White Papers
Installation
[Figure: installation screenshots, steps 1.) and 2.)]
Requirements
Requirements to import and run demo workflows
• KNIME 2.10
• Textprocessing (labs)
• Distance Matrix (KNIME)
• Palladian (Community)
Tips
• Settings (knime.ini)
– Set maximum memory for KNIME
– -Xmx3G
Demo
Prepare KNIME
• Go to KNIME directory
• Change knime.ini file (optional)
– -Xmx3G
• Start KNIME
• Install Textprocessing Extension
– (or better have it already installed)
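The optional knime.ini change can be sketched in Python. This is only an illustration: it assumes the usual Eclipse-style launcher layout, where JVM options follow a `-vmargs` line, one option per line. The helper name `set_max_heap` and the sample file contents are hypothetical.

```python
def set_max_heap(ini_text: str, size: str = "3G") -> str:
    """Return ini_text with exactly one -Xmx line, set to the given size."""
    # Drop any existing maximum-heap setting.
    lines = [ln for ln in ini_text.splitlines() if not ln.startswith("-Xmx")]
    # JVM options are only read after the -vmargs marker.
    if "-vmargs" not in lines:
        lines.append("-vmargs")
    lines.insert(lines.index("-vmargs") + 1, f"-Xmx{size}")
    return "\n".join(lines) + "\n"


# Hypothetical knime.ini contents, for illustration only.
sample = "-startup\nplugins/launcher.jar\n-vmargs\n-Xmx1024m\n-Dfoo=bar\n"
updated = set_max_heap(sample)
print(updated)
```

Editing the file by hand works just as well; the point is that only one `-Xmx` line should remain and that it has to sit below `-vmargs`.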
Philosophy
[Figure: the text "… perhaps your name is Rumpelstiltskin[Person]? …" with a tagged term, turned into a binary document-term matrix that feeds Visualization, Clustering, and Classification]
Additional Data Types
• Document Cell
– Encapsulates a document
• Title, sentences, terms, words
• Authors, category, source
• Generic meta data (key, value pairs)
• Term Cell
– Encapsulates a term
• Words, tags
Data Table Structures
• Document table
– List of documents
• Bag of words
– Tuples of documents and terms
• Document vectors
– Numerical representations of documents
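The three table structures can be mimicked with plain Python containers. This is a rough sketch outside KNIME, not how the extension stores its data; the two sample documents are made up.

```python
# Document table: one row per document (hypothetical sample texts).
document_table = [
    ("d1", "great pasta and great wine"),
    ("d2", "spicy noodles and dumplings"),
]

# Bag of words: one (document, term) tuple per distinct term of each document.
bag_of_words = [
    (doc_id, term)
    for doc_id, text in document_table
    for term in sorted(set(text.split()))
]

# Document vectors: one numerical row per document, one column per term.
vocabulary = sorted({term for _, term in bag_of_words})
pairs = set(bag_of_words)
document_vectors = {
    doc_id: [1 if (doc_id, term) in pairs else 0 for term in vocabulary]
    for doc_id, _ in document_table
}

print(vocabulary)
print(document_vectors)
```

Here the vectors are binary (term present or absent); other weights such as term frequencies fit the same layout.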
Philosophy and Data Table Structures
[Figure: documents pass through Enrichment and Preprocessing as document tables, then become a bag of words (Bow) and finally document vectors]
Tripadvisor Data
[Figure: a Tripadvisor review with its fields labeled: Title, Author, Rating, Fulltext]
Tripadvisor Data
Reviews of Italian and Chinese restaurants in Boston
• Chinese: 272
• Italian: 268
Tripadvisor Data
Goal:
• Build a classifier to distinguish between Chinese and Italian restaurants, based on their reviews.
Is a review about an Italian or a Chinese restaurant?
1.) Reading
Read/Parse textual data
Demo
Reading
• Read Tripadvisor data (.table file)
• Filter rows with missing restaurant value
• Convert strings to documents
• Filter all but the document column
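Outside KNIME, the reading steps can be sketched as follows. The rows, the field names, and the `Document` class are hypothetical stand-ins for the .table file and the Document Cell; the sketch also assumes that the restaurant value is the cuisine later used as the class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Minimal stand-in for a document cell: text plus some meta data."""
    title: str
    text: str
    author: str
    category: str


# Hypothetical rows standing in for the Tripadvisor table.
rows = [
    {"title": "Lovely dinner", "author": "anna", "fulltext": "Great pasta.", "restaurant": "italian"},
    {"title": "No label", "author": "bob", "fulltext": "It was fine.", "restaurant": None},
    {"title": "Quick lunch", "author": "chen", "fulltext": "Tasty dumplings.", "restaurant": "chinese"},
]

# Filter rows with a missing restaurant value.
labeled = [row for row in rows if row["restaurant"]]

# Convert strings to documents and keep only the document column.
documents = [
    Document(row["title"], row["fulltext"], row["author"], row["restaurant"])
    for row in labeled
]

print(documents)
```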
2.) Enrichment
Enrich documents with semantic information
Demo
Enrichment / Tagging
• Apply POS Tagger node
• Use Bag of Words node to inspect tagging result
3.) Preprocessing
Preprocess documents and filter words
Demo
Preprocessing
• Filter
– Numbers
– Punctuation marks
– Stop Words
• Convert to lower case
• Stemming
• Keep only nouns, verbs, adjectives
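Most of these preprocessing steps can be sketched in a few lines of Python. The stop word list is a tiny illustrative one, and `naive_stem` is a crude suffix stripper used as a placeholder, not the stemmer used in the demo. The part-of-speech filter is left out because it needs a tagger. Lower-casing is done before the stop word filter here so that the filter is case-insensitive.

```python
import re

# Tiny illustrative stop word list (an assumption, not KNIME's built-in list).
STOP_WORDS = {"the", "a", "and", "was", "were", "is", "of", "in", "we"}


def naive_stem(word):
    """Strip one common suffix; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    # Split into words, numbers, and single punctuation marks.
    tokens = re.findall(r"[A-Za-z]+|\d+|[^\w\s]", text)
    tokens = [t for t in tokens if not t.isdigit()]      # filter numbers
    tokens = [t for t in tokens if t.isalpha()]          # filter punctuation marks
    tokens = [t.lower() for t in tokens]                 # convert to lower case
    tokens = [t for t in tokens if t not in STOP_WORDS]  # filter stop words
    return [naive_stem(t) for t in tokens]               # stemming


print(preprocess("We ordered 2 pizzas, and the noodles were amazing!"))
# → ['order', 'pizza', 'noodl', 'amaz']
```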
4.) Transformation
Creation of numerical representation of documents
Demo
Transformation
• Transform to bag of words
• Compute TF value for terms
• Transform to document vectors
• Extract category (class) value
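A small sketch of the transformation steps, assuming relative term frequency (term count divided by document length); the slides do not say which TF variant the demo uses. The two preprocessed documents are made up.

```python
from collections import Counter

# Hypothetical preprocessed documents with their category.
docs = [
    {"category": "italian", "terms": ["pasta", "wine", "pasta"]},
    {"category": "chinese", "terms": ["noodl", "dumpl", "tea", "tea"]},
]

# Bag of words with a TF value per (document, term) pair.
bow = []
for doc_id, doc in enumerate(docs):
    counts = Counter(doc["terms"])
    total = len(doc["terms"])
    for term, count in counts.items():
        bow.append((doc_id, term, count / total))

# Document vectors: one column per term, TF as the cell value.
vocabulary = sorted({term for _, term, _ in bow})
column = {term: i for i, term in enumerate(vocabulary)}
vectors = [[0.0] * len(vocabulary) for _ in docs]
for doc_id, term, tf in bow:
    vectors[doc_id][column[term]] = tf

# Extract the category (class) value of each document.
labels = [doc["category"] for doc in docs]

print(vocabulary)
print(vectors, labels)
```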
5.) Classification
Training of a model (decision tree) and scoring
Demo
Classification
• Append color based on class
• Partition data into training and test set
• Train decision tree model on training data
• Apply decision tree model on test data
• Score model, measure accuracy
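The partition, train, apply, and score steps can be illustrated with a toy decision tree in pure Python. This is not KNIME's decision tree learner: it is a minimal greedy tree on binary features using Gini impurity, and the eight document vectors are invented. The coloring step is skipped.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def train_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree on 0/1 features; a leaf is simply the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    best = None
    for f in range(len(rows[0])):
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:
            continue  # this feature does not split the data
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return majority
    f = best[1]
    branches = {}
    for value in (0, 1):
        xs, ys = zip(*[(x, y) for x, y in zip(rows, labels) if x[f] == value])
        branches[value] = train_tree(list(xs), list(ys), depth + 1, max_depth)
    return (f, branches[0], branches[1])


def predict(tree, x):
    """Walk from the root to a leaf and return its class."""
    while isinstance(tree, tuple):
        f, absent, present = tree
        tree = present if x[f] == 1 else absent
    return tree


# Hypothetical binary document vectors over the terms [pasta, wine, noodl, dumpl].
rows = [
    [1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1],
]
labels = ["italian"] * 4 + ["chinese"] * 4

# Partition into training (6 rows) and test (2 rows) sets with a fixed seed.
order = list(range(len(rows)))
random.Random(42).shuffle(order)
train_idx, test_idx = order[:6], order[6:]

tree = train_tree([rows[i] for i in train_idx], [labels[i] for i in train_idx])
predictions = [predict(tree, rows[i]) for i in test_idx]
accuracy = sum(p == labels[i] for p, i in zip(predictions, test_idx)) / len(test_idx)
print(f"accuracy on {len(test_idx)} held-out reviews: {accuracy:.2f}")
```

With so few rows the accuracy figure means little; the sketch only shows how the steps connect.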
Additional Workflows
• Multi Word Tagging
– Detection of frequent Ngrams
– Creation of dictionary from Ngrams
– Applying Dictionary Tagger
• Classification with Multi Words
• Clustering of documents
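The multi-word tagging idea can be sketched with bigrams: count them, keep the frequent ones as a dictionary, then tag dictionary matches as single terms. The texts, the frequency threshold, the stop word list, and the tag name `MULTI_WORD` are assumptions for illustration; the Dictionary Tagger node itself is not reproduced here.

```python
from collections import Counter

# Hypothetical review snippets.
texts = [
    "the hot pot was great and the fried rice was great",
    "we loved the hot pot",
    "fried rice and hot pot again",
]
STOP_WORDS = {"the", "was", "and", "we"}
MIN_COUNT = 2  # a bigram counts as frequent from this many occurrences


def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))


# Detection of frequent n-grams (here n = 2), ignoring stop words.
counts = Counter(
    bigram
    for text in texts
    for bigram in bigrams(text.split())
    if not STOP_WORDS & set(bigram)
)

# Creation of a dictionary from the frequent n-grams.
dictionary = {" ".join(bigram) for bigram, n in counts.items() if n >= MIN_COUNT}


def tag(tokens, dictionary, tag_name="MULTI_WORD"):
    """Merge dictionary matches into one tagged term; other tokens stay untagged."""
    terms, i = [], 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if pair in dictionary:
            terms.append((pair, tag_name))
            i += 2
        else:
            terms.append((tokens[i], None))
            i += 1
    return terms


print(sorted(dictionary))
print(tag("we loved the hot pot".split(), dictionary))
```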
Thank You
Questions
• http://tech.knime.org/forum
• Kilian.Thiel@knime.com
Follow us
• Twitter: @KNIME
• LinkedIn: https://www.linkedin.com/groups?gid=2212172
• KNIME Blog: http://www.knime.org/blog