The power of unstructured data: Recommendation systems

The Power of Unstructured Data
Olga Scrivner, PhD
Research Scientist, CNS, Indiana University
Visiting Lecturer, Data Science Program, Indiana University
Corporate Faculty, Data Analytics, Harrisburg University of Science & Technology
Recommendation Systems

Transforming Data into Insights
Data-Driven Decision Making (credits: PwC)
“Information is the currency of this digital age”
(Carly Fiorina, former CEO of HP)
- 80% of data will be unstructured (IDC)
- By 2025: 175 zettabytes of data globally (IDC); 1 zettabyte = 10^21 bytes
- 85% of customer interactions will happen without human interaction (Gartner)
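As a quick check of the units on this slide, the sketch below converts 175 zettabytes to bytes using 1 ZB = 10^21 bytes (decimal SI prefix). Applying the 80% unstructured share to the same 175 ZB total is an assumption made only for illustration; the two IDC figures may not refer to exactly the same scope.

```python
# Unit check for the figures on this slide (illustrative only).
ZETTABYTE = 10 ** 21  # bytes, using the decimal SI prefix "zetta"

global_data_zb = 175                      # IDC figure quoted on the slide
global_data_bytes = global_data_zb * ZETTABYTE

# Assumption: apply the 80% unstructured share to the same 175 ZB total.
unstructured_zb = global_data_zb * 80 // 100

print(f"{global_data_bytes:.2e} bytes")        # 1.75e+23 bytes
print(f"{unstructured_zb} ZB unstructured")    # 140 ZB unstructured
```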

Use Cases
(Jim Kitterman. 2018. The Why behind the What.)
- Banking: fraud prediction and recommendations
- Human resources: automated HR
- Marketing: automated customer service
- Retail: product recommendations
Two of the leading drivers for AI adoption are delivering a better customer experience and helping employees get better at their jobs (IDC, 2020).
Leading AI use cases: automated customer service agents, recommendation, and automation.

Text Mining Landscape
(Zhai, 2016)
[Figure: real world, observed world, and text data (English)]

Formal Language vs. Natural Language
(Chiang, 2018)

Ambiguity
- Natural language: full of ambiguity; readers rely on contextual clues and other information to resolve it
- Formal language: nearly or completely unambiguous; any statement has exactly one meaning, regardless of context

Redundancy
- Natural language: verbose and redundant, to reduce ambiguity
- Formal language: concise, less redundant

Literalness
- Natural language: more than one meaning; many idioms and metaphors
- Formal language: exactly one meaning

Example idiom: “She spilled the beans” (revealed a secret)
Sources: http://www.idioms4you.com/complete-idioms/spill-the-beans.html
https://www.quora.com/When-was-the-first-English-idiom-used-Why-was-it-used

Natural Language Challenges
(Dan Jurafsky. 2012. Slides – Introduction to NLP)

NLP Feature Extractions
(Sarkar, D. 2018. Deep Learning Methods for Text Data – Word2Vec, GloVe, FastText. Towards Data Science)
Bag-of-Words Models
- Count-based: TF, TF-IDF, N-grams
Prediction-Based Models
- Based on distributed representations (dense representations of words in a low-dimensional vector space): Word2Vec, FastText
- Each word is associated with a continuous vector representation
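To make the contrast between the two families concrete, here is a minimal sketch in plain Python. The count-based side builds a bag of unigrams and bigrams. The prediction-based side does not train Word2Vec; it only builds the (target, context) pairs that a skip-gram model learns its dense vectors from. Whitespace tokenization, the window size, and the helper names are assumptions for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n_values=(1, 2)):
    """Count-based representation: frequency of each unigram and bigram."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    counts = Counter()
    for n in n_values:
        counts.update(ngrams(tokens, n))
    return counts


def skipgram_pairs(text, window=1):
    """(target, context) training pairs of the kind a skip-gram model consumes."""
    tokens = text.lower().split()
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


sentence = "she spilled the beans"
print(bag_of_ngrams(sentence))
print(skipgram_pairs(sentence))
```

The bag of n-grams loses word order beyond the bigram span and gives every term its own sparse dimension, whereas a prediction-based model uses pairs like these to learn one dense, low-dimensional vector per word.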

NLP Landscape
(Zhai, 2016)
[Figure: real world, observed world, text data, and AI cognitive application]

NLP Application – Recommender System
1. Improving with use: customer retention
2. Improving cart value: filter system (Amazon)
3. Improving engagement: using subscriptions (YouTube)
(Corinna Underwood. 2020. Use Cases of Recommendation Systems.)

Recommendation System Types
Collaborative Filtering (shortcoming: cold-start problem)
- User-based: similarity between users (classification task)
- Item-based: similarity between items based on ratings (Pearson)
Content-Based Systems
- Similarity between item features (nearest neighbor)
- User likes and feedback
(Rounak Banik. 2018. Hands-On Recommendation Systems with Python.)
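The item-based variant of collaborative filtering (item similarity from ratings, using Pearson correlation) can be sketched in a few lines. The toy ratings matrix, the helper names, and the choice to predict from positively correlated items only are assumptions for illustration, not the method from the cited book. The last line also shows the cold-start problem: an item nobody has rated has no similarity to anything, so no prediction is possible.

```python
from math import sqrt

# Hypothetical user -> {item: rating} matrix (toy data for illustration only).
ratings = {
    "ann":  {"A": 5, "B": 3, "C": 4},
    "bob":  {"A": 3, "B": 1, "C": 2, "D": 3},
    "cara": {"A": 4, "B": 3, "C": 4, "D": 5},
    "dan":  {"A": 1, "B": 5, "C": 1, "D": 2},
}


def item_ratings(item):
    """All ratings given to `item`, keyed by user."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def pearson(item_x, item_y):
    """Pearson correlation between two items over the users who rated both."""
    x, y = item_ratings(item_x), item_ratings(item_y)
    common = sorted(set(x) & set(y))
    if len(common) < 2:
        return 0.0  # too little overlap, e.g. a cold-start item
    xs = [x[u] for u in common]
    ys = [y[u] for u in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sqrt(sum((a - mx) ** 2 for a in xs)) * sqrt(sum((b - my) ** 2 for b in ys))
    return num / den if den else 0.0


def predict(user, item):
    """Similarity-weighted average of the user's ratings on positively correlated items."""
    num = den = 0.0
    for other, rating in ratings[user].items():
        sim = pearson(item, other)
        if other != item and sim > 0:
            num += sim * rating
            den += sim
    return num / den if den else None


print(predict("ann", "D"))  # about 4.48
print(predict("ann", "E"))  # None: item "E" has no ratings (cold start)
```

Here item D correlates strongly with A and C and negatively with B, so Ann's predicted rating for D is driven by her high ratings of A and C.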

NLP Content-Based Recommendation
Job-recommendation System
(Armand Olivares. 2019. NLP Content-Based Recommendation Systems. Data: Kaggle - job-recommendation-datasets)

Job Description Preprocessing
Input: one text field combining title, company, city, job type, and description
1. Remove stop words
2. Remove non-alphanumeric characters
3. Lemmatize the text columns
4. Extract features (TF-IDF)
5. Compute cosine similarity between job vectors (scores close to 1 mean more similar items)
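The five preprocessing steps can be sketched end to end in standard-library Python. This is not the cited author's code: the stop-word list is a tiny illustrative set, `crude_lemma` is a hypothetical suffix-stripping stand-in for a real lemmatizer, the TF-IDF weighting (tf = count / length, idf = log(N / df) + 1) is one common variant among several, and the job postings are invented.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list (assumption; real pipelines use a fuller list).
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "for", "with", "to", "is", "on"}


def crude_lemma(token):
    """Hypothetical stand-in for a real lemmatizer: only strips a plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(text):
    """Steps 1-3: drop non-alphanumerics and stop words, then lemmatize."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [crude_lemma(t) for t in tokens]


def tfidf_vectors(docs):
    """Step 4: sparse TF-IDF vectors stored as {term: weight} dicts."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1
        vectors.append({t: (c / length) * (math.log(n_docs / df[t]) + 1)
                        for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Step 5: cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(job_index, jobs, top_k=2):
    """Indices of the top_k jobs most similar to jobs[job_index]."""
    vectors = tfidf_vectors(jobs)
    scores = [(cosine(vectors[job_index], vectors[i]), i)
              for i in range(len(jobs)) if i != job_index]
    return [i for _, i in sorted(scores, reverse=True)[:top_k]]


# Hypothetical postings: title | company | city | job type | description.
jobs = [
    "Python Developer | Acme | Boston | Full-time | Build data pipelines in Python and SQL.",
    "Data Engineer | Globex | Boston | Full-time | Design SQL data pipelines; Python skills preferred.",
    "Registered Nurse | Mercy Hospital | Denver | Part-time | Provide patient care in the ICU.",
]
print(recommend(0, jobs, top_k=1))  # [1]
```

The developer posting shares many weighted terms (python, sql, data, pipeline, boston) with the data-engineer posting and only "time" with the nursing posting, so the engineer role is recommended first. A production version would more likely use a library lemmatizer and vectorizer.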
[Figure: vector1 and vector2 drawn by their components, with the Euclidean distance between them]
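The slide's figure measures vectors by Euclidean distance, while step 5 uses cosine similarity. The sketch below, using invented term-count vectors, shows why the choice matters for text: cosine similarity depends only on the direction of the vectors, so a longer document with the same word proportions still scores as highly similar, whereas Euclidean distance grows with document length.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: depends on direction, not length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between the tips of u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical term-count vectors over a 3-word vocabulary.
doc_short = [1, 2, 0]
doc_long = [10, 20, 0]   # same word proportions as doc_short, ten times longer
doc_other = [0, 1, 3]    # different words, similar length to doc_short

print(cosine_similarity(doc_short, doc_long), euclidean_distance(doc_short, doc_long))
print(cosine_similarity(doc_short, doc_other), euclidean_distance(doc_short, doc_other))
```

By Euclidean distance, doc_other looks closer to doc_short (about 3.32 versus 20.12); by cosine similarity, doc_long is essentially identical (about 1.0) and doc_other is not (about 0.28).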

What is Next?
- Career path recommendation
- Skill recommendation
- Course recommendation
- e-recruiting
- Job recommendation: graph-based approach + NLP (Zhu et al., 2020)
