The Power of Visualizing Embeddings
Pramod Singh
About Me ..
▪ Team Lead – Data Science
Bain and Company
▪ Speaker
- O’Reilly Strata Conference
- GIDS
▪ Published Author
• Machine Learning using PySpark
• Learn PySpark
• Learn TensorFlow 2.0 : The easy way
• Machine Learning in Production (WIP)
▪ https://www.linkedin.com/in/pramodchahar/
Agenda
▪ Inspiration for this session
▪ Conventional Approach
▪ Learning Embeddings
▪ Custom Embeddings
▪ Visualizing Embeddings
▪ FAQs
Interactions
Finance Exteriors Interiors Maintenance Car Dealer Features
User Journey – Core Elements
Different Pages Categories Time Spent Sequence of Events
User Representation – I
User ID   Total Visits   Total Time Spent   Total Pages   Total Sessions   Converted
121A      10             25                 110           4                0
User Representation – II
User ID   Total Pages   Finance   Specification   Dealer   …   …   Finance(sec)   Specification(sec)   Dealer(sec)   …   Converted
121A      10            3         4               4        …   …   3              5                    5             …   0
All Users
User ID   Total Pages   Finance   Specification   Dealer   …   …   Finance(sec)   Specification(sec)   Dealer(sec)   …   Converted
121A      10            3         4               4        …   …   3              5                    5             …   0
19X2      50            0         21              0        …   …   0              350                  0             …   0
GG52      33            8         4               9        …   …   45             50                   78            …   1
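A minimal sketch of how per-user rows like these could be assembled from a raw clickstream. It assumes each event is a hypothetical (user_id, page_category, seconds) tuple; the function name and the event values are illustrative and not taken from the talk.

```python
from collections import defaultdict


def user_features(events):
    """Aggregate (user_id, page_category, seconds) events into per-user counts and time."""
    rows = defaultdict(
        lambda: {"total_pages": 0, "visits": defaultdict(int), "seconds": defaultdict(int)}
    )
    for user_id, category, seconds in events:
        row = rows[user_id]
        row["total_pages"] += 1               # one page view per event
        row["visits"][category] += 1          # page views per category
        row["seconds"][category] += seconds   # time spent per category
    return rows


# Hypothetical events, for illustration only.
events = [
    ("121A", "Finance", 3),
    ("121A", "Specification", 5),
    ("121A", "Finance", 2),
    ("19X2", "Specification", 350),
]
features = user_features(events)
```

Each category contributes one count column and one time column, so the row width grows with the number of page categories.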
Applicable to other domains
Finance & Insurance E-Commerce/Retail Real Estate
Key Questions
▪ Which sets of customer journeys are similar to each other?
▪ Which customer journeys indicate a broken versus a seamless experience?
▪ What are the 4-5 major routes that customers take in order to convert?
Category Representation
One Hot Encoding Frequency Based Prediction Based
Challenges
High-Cardinality Variables Semantic Signal w/o Supervision
Challenges
Number of columns = Number of unique categories
                 Specifications   Price   Features   Reviews   ..   …
Specifications   1                0       0          0         0    0
Price            0                1       0          0         0    0
Features         0                0       1          0         0    0
Similarity between Specifications and Price = 0
Similarity between Price and Features = 0
Similarity between Features and Specifications = 0
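The zero similarities follow directly from the encoding: two different one-hot vectors never share a non-zero position. A small sketch using cosine similarity (the choice of cosine and the helper names are assumptions for illustration):

```python
import math


def one_hot(category, categories):
    """Return a vector with a single 1.0 at the position of `category`."""
    return [1.0 if c == category else 0.0 for c in categories]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


categories = ["Specifications", "Price", "Features", "Reviews"]
vectors = {c: one_hot(c, categories) for c in categories}

spec_vs_price = cosine(vectors["Specifications"], vectors["Price"])
price_vs_price = cosine(vectors["Price"], vectors["Price"])
```

Every pair of distinct categories scores 0 and every category scores 1 with itself, so the encoding carries no notion of "more similar" or "less similar".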
Gaps
• Sequence of events is ignored
Can we represent each of these page categories with a vector that captures the underlying semantics?
Using these vectors, can we represent each user journey?
Embeddings
“An embedding is a mapping of a discrete — categorical — variable to a vector of continuous numbers such that the vectors of similar entities are
closer to one another in vector space.”
king - man + woman = queen
Category Similarity using Embeddings
Price 0.43 0.75 0.98 … …. … 0.55 0.87
Specification 0.23 0.10 0.33 … …. … 0.45 0.20
Features 0.22 0.09 0.30 … …. … 0.44 0.18
Similarity between Specifications and Price = -0.75
Similarity between Price and Features = -0.83
Similarity between Features and Specifications = 0.91
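With dense vectors the pairwise similarities are no longer forced to zero. The sketch below builds a small similarity table using cosine similarity; the two-dimensional vectors are made-up toy values (not the numbers on the slide), chosen only so the arithmetic is easy to follow.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Toy embeddings, for illustration only.
embeddings = {
    "Price": [-1.0, 0.0],
    "Specification": [1.0, 2.0],
    "Features": [2.0, 4.0],
}

# One entry per unordered pair of categories.
similarity = {
    (a, b): cosine(embeddings[a], embeddings[b])
    for a, b in combinations(embeddings, 2)
}
```

Here Specification and Features point in the same direction and score close to 1, while Price points away from both and scores below 0.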
Immediate Advantages
Fixed-Size Representation Similar Categories
Embeddings
Image Text Music User
Learning Embeddings
Without Label With Label Pre-Trained
“You shall know a word by the company it keeps.”
- John R. Firth (a dominant figure in 20th-century linguistics)
Without Label
Sequence Based Embedding
The earth is round and moves around the sun
“All we need is a sequence of categories”
Sequence Based Embedding*
The earth is round and moves around the sun
• Context and Target Words
• Given a word, what are the neighboring words?
• Given the neighboring words, what is the target word?
*Window size
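The two questions above correspond to two ways of slicing the same sequence into training examples. A minimal sketch, assuming a window of 2 words on each side (the function name and the window value are illustrative choices):

```python
def context_target_pairs(tokens, window=2):
    """Return (context_words, target_word) for every position in the sequence."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # up to `window` words before
        right = tokens[i + 1:i + 1 + window]     # up to `window` words after
        pairs.append((left + right, target))
    return pairs


sentence = "The earth is round and moves around the sun".split()

# "Given the neighboring words, what is the target word?"
cbow_examples = context_target_pairs(sentence, window=2)

# "Given a word, what are the neighboring words?"
skipgram_examples = [
    (target, neighbor)
    for context, target in cbow_examples
    for neighbor in context
]
```

For the word "is", the context with this window is "The", "earth", "round", "and". A larger window produces more pairs per word and a broader notion of "neighbor".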
CBOW Model
[Figure: context words (The, Earth, round, and) → Neural Network → Target (is)]
Skip-Gram Model
[Figure: input word (is) → predicted neighboring words (The, Earth, round, and)]
Word2Vec
Homepage Offers Finance Offers Specification … … Test Drive
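To make the idea concrete, here is a toy skip-gram trainer applied to page sequences instead of sentences. It is a sketch only: it uses a plain full softmax in pure Python rather than the optimized Word2Vec implementation (no negative sampling or hierarchical softmax), and the journeys, embedding size, learning rate, and function name are all assumptions made for illustration.

```python
import math
import random


def train_skipgram(sequences, dim=8, window=2, epochs=200, lr=0.05, seed=0):
    """Toy skip-gram with a full softmax. Returns (embeddings, per-epoch mean loss)."""
    rng = random.Random(seed)
    vocab = sorted({page for seq in sequences for page in seq})
    index = {page: i for i, page in enumerate(vocab)}
    # (center, context) index pairs for every position within the window
    pairs = [
        (index[seq[i]], index[seq[j]])
        for seq in sequences
        for i in range(len(seq))
        for j in range(max(0, i - window), min(len(seq), i + window + 1))
        if i != j
    ]
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for center, context in pairs:
            h = w_in[center]  # embedding of the center page (updated in place)
            scores = [sum(a * b for a, b in zip(row, h)) for row in w_out]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            z = sum(exps)
            probs = [e / z for e in exps]
            total -= math.log(probs[context])
            # gradient of cross-entropy w.r.t. the scores: probs - one_hot(context)
            errs = [p - (1.0 if k == context else 0.0) for k, p in enumerate(probs)]
            grad_h = [
                sum(errs[k] * w_out[k][d] for k in range(len(vocab)))
                for d in range(dim)
            ]
            for k, err in enumerate(errs):
                for d in range(dim):
                    w_out[k][d] -= lr * err * h[d]
            for d in range(dim):
                h[d] -= lr * grad_h[d]
        losses.append(total / len(pairs))
    return {page: w_in[index[page]] for page in vocab}, losses


# Hypothetical journeys, for illustration only.
journeys = [
    ["Homepage", "Offers", "Finance", "Offers", "Specification", "Test Drive"],
    ["Homepage", "Specification", "Test Drive"],
    ["Homepage", "Offers", "Finance"],
]
embeddings, losses = train_skipgram(journeys)
```

The rows of the input matrix are the page embeddings. In practice a library implementation would be used on real clickstream sequences; the point here is only that a sequence of categories is all the model needs.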
With Label
Embedding layer in DNN
[Figure: page categories (Homepage, Offers, Finance, Specifications, …) → Embedding Layer → predicted target, compared with the observed target via cross-entropy loss]
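An embedding layer can be read as a lookup table whose rows are learned along with the rest of the network. The sketch below shows only the lookup itself and its equivalence to multiplying a one-hot vector by the weight matrix; the weights are toy values, and in a real network they would be updated by backpropagating the cross-entropy loss.

```python
def embedding_lookup(index, table):
    """Return the row of the weight table for a category index."""
    return table[index]


def one_hot_matmul(index, table):
    """Same result computed as one-hot vector times the weight matrix."""
    one_hot = [1.0 if i == index else 0.0 for i in range(len(table))]
    dim = len(table[0])
    return [
        sum(one_hot[i] * table[i][d] for i in range(len(table)))
        for d in range(dim)
    ]


pages = ["Homepage", "Offers", "Finance", "Specifications"]
# Toy weights (one row per page category), for illustration only.
table = [[0.25, 0.5], [0.75, 0.125], [0.5, 0.5], [0.125, 0.875]]

finance = embedding_lookup(pages.index("Finance"), table)
finance_via_matmul = one_hot_matmul(pages.index("Finance"), table)
```

The lookup avoids building the one-hot vector at all, which is why embedding layers stay cheap even for high-cardinality variables.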
Custom Embeddings
Category Embeddings
Page-Category Embedding (column length = embedding size = 100)
Brochure
Reviews
Finance
Test Drive
Specification
0.13 0.45 .. 0.21 0.67
0.25 0.23 .. 0.53 0.98
0.98 0.12 .. 0.34 0.76
0.21 0.53 .. 0.23 0.87
0.87 0.24 .. 0.63 0.25
Embedding Visualization
• Categories related to services, warranty, and reviews are closer together
• Categories related to test drive activities are closer together
• Categories related to vehicle information are closer together
User Journey Mapping
Visitor 1: Page ‘A’, Page ‘B’, Page ‘D’, Page ‘C’, Page ‘E’
Visitor 2: Page ‘B’, Page ‘D’, Page ‘E’
0.43 0.75 0.98 0.55 0.87
0.54 0.23 0.56 0.35 0.76
Customize Embeddings
User Journey
0.43       0.75            0.98      0.55      0.87
Brochure   Specification   Finance   Reviews   Test Drive

Category        Embedding                Time Spent
Brochure        0.13 0.45 .. 0.21 0.67   0.43
Specification   0.25 0.23 .. 0.53 0.98   0.75
…               …                        …
…               …                        …
Test Drive      0.87 0.24 .. 0.63 0.25   0.87

User Journey Embedding 0.53 0.76 0.35 0.65 0.89
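The slide does not spell out how the page vectors and time-spent values are combined. One plausible reading is a time-weighted average of the page embeddings, sketched below with toy two-dimensional vectors; the function name, the weighting scheme, and all values are assumptions for illustration.

```python
def journey_embedding(steps, page_vectors):
    """Time-weighted average of page embeddings.

    steps: list of (page_category, time_weight) in visit order.
    page_vectors: dict mapping page_category to its embedding.
    """
    dim = len(next(iter(page_vectors.values())))
    total_weight = sum(weight for _, weight in steps)
    summed = [0.0] * dim
    for category, weight in steps:
        for d, value in enumerate(page_vectors[category]):
            summed[d] += weight * value
    return [value / total_weight for value in summed]


# Toy page embeddings and time weights, for illustration only.
page_vectors = {
    "Brochure": [1.0, 0.0],
    "Specification": [0.0, 1.0],
    "Test Drive": [1.0, 1.0],
}
journey = [("Brochure", 1.0), ("Specification", 1.0), ("Test Drive", 2.0)]
user_vector = journey_embedding(journey, page_vectors)
```

The result has the same length as a page embedding regardless of how many pages the visitor viewed, which is what makes journeys of different lengths comparable.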
Customer Journey Visualization*
*Dummy Data
TensorFlow Projector
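The Embedding Projector can load custom data as tab-separated files: one file with a vector per line and an optional metadata file with one label per line. A minimal sketch of producing both as strings, assuming a single-column metadata file without a header row; the helper name and the two-dimensional vectors are illustrative.

```python
def to_projector_tsv(vectors):
    """Build (vectors_tsv, metadata_tsv) strings from a dict of label -> vector."""
    labels = list(vectors)
    vector_lines = ["\t".join(str(x) for x in vectors[label]) for label in labels]
    vectors_tsv = "\n".join(vector_lines) + "\n"
    metadata_tsv = "\n".join(labels) + "\n"   # row order must match the vectors file
    return vectors_tsv, metadata_tsv


# Toy vectors, for illustration only.
page_vectors = {
    "Brochure": [0.13, 0.45],
    "Specification": [0.25, 0.23],
}
vectors_tsv, metadata_tsv = to_projector_tsv(page_vectors)
```

Writing the two strings to files (for example vectors.tsv and metadata.tsv) gives inputs that can be uploaded through the projector's load dialog.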
Advantages of Embeddings
• Finding nearest neighbours in the low-dimensional space
• Input features for machine learning prediction
• Understanding relations between categories
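The first advantage can be sketched as a small nearest-neighbour search over the embedding table. Cosine similarity, the helper names, and the toy vectors below are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest(query, vectors, k=2):
    """Return the k categories most similar to `query`, best first."""
    scored = [
        (cosine(vectors[query], vec), name)
        for name, vec in vectors.items()
        if name != query
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy embeddings, for illustration only.
vectors = {
    "Specification": [0.9, 0.1],
    "Brochure": [0.8, 0.2],
    "Finance": [0.1, 0.9],
    "Reviews": [0.2, 0.8],
}
closest_to_specification = nearest("Specification", vectors, k=2)
closest_to_finance = nearest("Finance", vectors, k=1)
```

The same function works unchanged on journey embeddings, which is one way to look for visitors with similar journeys.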
Additional Resources
• https://towardsdatascience.com/neural-network-embeddings-explained-4d028e6f0526
• http://jalammar.github.io/illustrated-transformer/
• https://www.youtube.com/results?search_query=sequence+embeddings+pramod
Feedback
Your feedback is important to us.
Don’t forget to rate and
review the sessions.