The Power of Unstructured Data
Olga Scrivner, PhD
Research Scientist, CNS, Indiana University
Visiting Lecturer, Data Science Program, Indiana University
Corporate Faculty, Data Analytics, Harrisburg University of Science & Technology
Recommendation Systems
Transforming Data into Insights
- 80% of data will be unstructured (IDC)
- 175 zettabytes of data globally by 2025 (IDC); 1 zettabyte = 10^21 bytes
- 85% of customer interactions will take place without human involvement (Gartner)
Data-Driven Decision Making (credits: PwC)
“Information is the currency of this digital age” (Carly Fiorina, former CEO of HP)
Use Cases (Jim Kitterman. 2018. The Why behind the What.)
- Banking (fraud prediction and recommendations)
- Human resources (automated HR)
- Marketing (automated customer service)
- Retail (product recommendations)
Two of the leading drivers for AI adoption are delivering a better customer experience and helping employees to get better at their jobs (IDC, 2020).
Leading AI use cases: automated customer service agents, recommendation, and automation.
Figure: Text mining landscape (Zhai, 2016): real world text data, observed world (English)
Natural Language vs. Formal Language (Chiang, 2018)
Ambiguity
- Natural language: full of ambiguity; relies on contextual clues and other information
- Formal language: nearly or completely unambiguous; any statement has exactly one meaning, regardless of context
Redundancy
- Natural language: verbose to reduce ambiguity; redundant
- Formal language: concise; less redundant
Literalness
- Natural language: more than one meaning; many idioms and metaphors
- Formal language: exactly one meaning
Example: “She spilled the beans”
http://www.idioms4you.com/complete-idioms/spill-the-beans.html
https://www.quora.com/When-was-the-first-English-idiom-used-Why-was-it-used
Natural Language Challenges (Dan Jurafsky. 2012. Slides – Introduction to NLP)
NLP Feature Extractions (Sarkar, D. 2018. Deep Learning Methods for Text Data – Word2Vec, GloVe, FastText. Towards Data Science)
- Bag-of-Words Models: count-based features such as TF, TF-IDF, and n-grams
- Prediction-Based Models: each word is associated with a continuous vector representation, based on distributed representations (dense representations of words in a low-dimensional vector space), e.g. Word2Vec, FastText
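The count-based features above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the method from Sarkar (2018): the three documents are made up, tokenization is plain whitespace splitting, and IDF uses the unsmoothed log(N / df) form, so the numbers will differ from libraries such as scikit-learn's TfidfVectorizer, which smooth and normalize by default.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequencies(tokens):
    """TF: count of each term divided by the document length."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def tf_idf(documents):
    """TF-IDF weights for a list of tokenized documents.

    Uses the plain idf = log(N / df) variant (no smoothing).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = term_frequencies(doc)
        weights.append({term: tf[term] * math.log(n_docs / doc_freq[term])
                        for term in tf})
    return weights


# Made-up example documents, tokenized by whitespace.
docs = [
    "data scientist python machine learning".split(),
    "data engineer python sql".split(),
    "marketing manager customer service".split(),
]

print(ngrams(docs[0], 2)[:2])  # first two bigrams of the first document
for weights in tf_idf(docs):
    # Print the two highest-weighted terms of each document.
    print(sorted(weights.items(), key=lambda kv: -kv[1])[:2])
```

A term that appears in every document gets an IDF of log(1) = 0, which is how TF-IDF down-weights words that do not distinguish one document from another; terms shared by only some documents ("data", "python") score lower than terms unique to one document.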
Figure: NLP landscape (Zhai, 2016): real world text data, observed world, AI cognitive application
NLP Application – Recommender System
1. Improving with use: customer retention
2. Improving cart value: filter system (Amazon)
3. Improving engagement: using subscriptions (YouTube)
Corinna Underwood. 2020. Use Cases of Recommendation Systems.
Recommendation System Types
Collaborative Filtering (shortcoming: cold start problem)
- User-based: similarity between users (classification task)
- Item-based: similarity between items based on ratings (Pearson)
Content-Based Systems
- Similarity between features (nearest neighbor)
- User likes and feedback
Rounak Banik. 2018. Hands-On Recommendation Systems with Python.
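Item-based collaborative filtering with Pearson similarity can be sketched as follows. The ratings are invented toy values, and the prediction rule (a similarity-weighted average of the user's own ratings, keeping only positively correlated items) is one common choice assumed here, not necessarily the one used in Banik (2018).

```python
import math

# Toy user -> {item: rating} matrix (made-up numbers for illustration only).
ratings = {
    "ann":   {"item_a": 5, "item_b": 4, "item_c": 1},
    "bob":   {"item_a": 4, "item_b": 5, "item_c": 2},
    "carol": {"item_a": 1, "item_b": 2, "item_c": 5},
    "dave":  {"item_a": 2, "item_c": 4},
}


def pearson_item_similarity(ratings, item_x, item_y):
    """Pearson correlation between two items over users who rated both.

    Returns None when fewer than two users rated both items (no basis
    for comparison) or when either item's co-ratings are all identical.
    """
    common = [u for u, r in ratings.items() if item_x in r and item_y in r]
    if len(common) < 2:
        return None
    xs = [ratings[u][item_x] for u in common]
    ys = [ratings[u][item_y] for u in common]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return None  # constant ratings: correlation undefined
    return cov / (norm_x * norm_y)


def predict_rating(ratings, user, target_item):
    """Similarity-weighted average of the user's ratings on similar items."""
    num = den = 0.0
    for item, rating in ratings[user].items():
        sim = pearson_item_similarity(ratings, item, target_item)
        if sim is not None and sim > 0:  # keep positively correlated items only
            num += sim * rating
            den += sim
    return num / den if den else None


print(pearson_item_similarity(ratings, "item_a", "item_c"))
print(predict_rating(ratings, "dave", "item_b"))
print(pearson_item_similarity(ratings, "item_a", "item_new"))  # None
```

In the toy data, item_a and item_c are perfectly anti-correlated (every co-rating pair sums to 6), so their similarity is -1 up to floating-point rounding. An item nobody has rated yet yields None: this is the cold start problem in miniature, since collaborative filtering has no ratings to compare, whereas a content-based system could still use the item's features.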
NLP Content-Based Recommendation: Job-Recommendation System
Armand Olivares. 2019. NLP Content-Based Recommendation Systems. Data: Kaggle - job-recommendation-datasets
Job Description Preprocessing
1. Remove stop words
2. Remove non-alphanumeric characters
3. Lemmatize the columns
4. Extract features (TF-IDF)
5. Use cosine similarity (scores close to 1 indicate greater similarity between items)
Combined fields: title, company, city, job type, description
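The five steps can be strung together into a small content-based recommender. This is a hedged, standard-library sketch rather than Olivares's code: the job records and field names (title, company, city, job_type, description) are hypothetical, the stop-word list is tiny, and a small lookup table (LEMMA_TABLE) stands in for a real lemmatizer such as NLTK's WordNetLemmatizer or spaCy.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; real pipelines use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "with", "on"}
# Hypothetical lookup table standing in for a real lemmatizer.
LEMMA_TABLE = {"models": "model", "recommendations": "recommendation",
               "payments": "payment"}
FIELDS = ("title", "company", "city", "job_type", "description")


def preprocess(job):
    """Combine the fields, then clean, filter, and lemmatize the tokens."""
    text = " ".join(job[field] for field in FIELDS).lower()
    # Step 2 runs first here so punctuation cannot hide stop words:
    # replace every run of non-alphanumeric characters with a space.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    # Step 1: drop stop words.
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    # Step 3: lemmatize (via the toy lookup table).
    return [LEMMA_TABLE.get(t, t) for t in tokens]


def tf_idf_vectors(token_lists):
    """Step 4: sparse TF-IDF vectors as {term: weight} dicts."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n_docs / doc_freq[t])
                        for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Step 5: cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(jobs, query_index, top_n=2):
    """Rank the other jobs by cosine similarity to jobs[query_index]."""
    vectors = tf_idf_vectors([preprocess(job) for job in jobs])
    scores = [(cosine(vectors[query_index], vec), i)
              for i, vec in enumerate(vectors) if i != query_index]
    return sorted(scores, reverse=True)[:top_n]


# Hypothetical job postings for illustration.
jobs = [
    {"title": "Data Scientist", "company": "Acme", "city": "Indianapolis",
     "job_type": "Full-time",
     "description": "Build machine learning models in Python."},
    {"title": "Machine Learning Engineer", "company": "Globex",
     "city": "Chicago", "job_type": "Full-time",
     "description": "Deploy Python models for recommendations."},
    {"title": "Retail Cashier", "company": "ShopCo", "city": "Indianapolis",
     "job_type": "Part-time",
     "description": "Handle payments and customer service."},
]

print(recommend(jobs, 0))  # (score, job index) pairs, best match first
```

With this sample data, the machine learning engineer posting ranks above the retail cashier posting for the data scientist query because it shares more informative terms (machine, learning, python, model); "time", which appears in all three postings, receives zero weight.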
Figure: vector1 and vector2, their Euclidean distance, and the components of the vectors
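The figure contrasts cosine similarity with Euclidean distance. A short sketch with hypothetical term-count vectors shows why cosine similarity, cos(θ) = (u · v) / (‖u‖ ‖v‖), is often preferred for text: it depends only on direction, so a document and the same document repeated twice are maximally similar even though they lie far apart in Euclidean terms. It relies on math.dist and multi-argument math.hypot, both available since Python 3.8.

```python
import math


def cosine_similarity(u, v):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.hypot(*u) * math.hypot(*v)
    return dot / norms if norms else 0.0  # treat zero vectors as dissimilar


def euclidean_distance(u, v):
    """Straight-line distance between the two points."""
    return math.dist(u, v)


doc = [1, 2, 0]          # hypothetical term counts
doc_twice = [2, 4, 0]    # same text repeated: same direction, double length
other = [0, 0, 3]        # shares no terms with doc

print(cosine_similarity(doc, doc_twice), euclidean_distance(doc, doc_twice))
print(cosine_similarity(doc, other), euclidean_distance(doc, other))
```

The first pair has cosine similarity 1 (up to floating-point rounding) but a Euclidean distance of √5; the second pair, with no shared terms, has cosine similarity 0.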
What is Next?
- Career path recommendation
- Skill recommendation
- Course recommendation
- e-recruiting
- Job recommendation: graph-based approach + NLP (Zhu et al., 2020)