Summer 2017
Elvis Saravia
PhD, Information Systems and Applications
ellfae@gmail.com
Github username: omarsar
Questions: sli.do (#Z217)
● Knowledge Discovery (KDD) Process
ConceptNet
Motel = [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
Hotel = [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
One-hot representation
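A minimal sketch of why one-hot vectors carry no notion of similarity: the two vectors above are rebuilt from their "hot" positions (index 8 for Motel, index 5 for Hotel, vocabulary size 16), and their dot product is zero. The helper names `one_hot` and `dot` are illustrative, not from the slides.

```python
def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Positions read off the Motel / Hotel vectors shown above.
motel = one_hot(8, 16)
hotel = one_hot(5, 16)

print(dot(motel, hotel))  # distinct words never overlap
print(dot(hotel, hotel))  # a word only matches itself
```

Any two distinct words are orthogonal under this encoding, so "motel" is no closer to "hotel" than to any other word in the vocabulary.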
hotel = [0.728 0.234 -0.23 0.223]
Distributed representation (low-dimension vector)
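A small sketch of what a dense vector makes possible: similarity between words can be measured, for example with cosine similarity. Only the `hotel` vector comes from the slide; the `motel` and `banana` vectors are made-up values chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


hotel = [0.728, 0.234, -0.23, 0.223]     # from the slide
motel = [0.701, 0.198, -0.31, 0.250]     # hypothetical, for illustration
banana = [-0.512, 0.640, 0.402, -0.105]  # hypothetical, for illustration

print("hotel vs motel :", round(cosine(hotel, motel), 3))
print("hotel vs banana:", round(cosine(hotel, banana), 3))
```

With these illustrative values the hotel/motel pair scores higher than the hotel/banana pair, which is the kind of graded comparison a one-hot encoding cannot express.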
Paper source: https://arxiv.org/pdf/1301.3781.pdf
Feedforward Neural Net Language Model (NNLM)
variables to optimize
denotes window range
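The formula these two annotations point at did not survive in the text. One common objective with exactly these ingredients is the skip-gram average log-likelihood, shown here as a sketch. That this is the formula on the slide is an assumption; the symbols $\theta$ (variables to optimize), $c$ (window range), and $T$ (corpus length) are chosen for illustration.

```latex
J(\theta) = \frac{1}{T} \sum_{t=1}^{T} \; \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log P\left(w_{t+j} \mid w_t ; \theta\right)
```

Here $w_t$ is the word at position $t$, and the inner sum runs over the $2c$ neighbours inside the window.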
P(the|over)
P(fox|over)
P(jumped|over)
P(the|over)
P(lazy|over)
P(dog|over)
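The six terms above can be reproduced by sliding a window over a sentence. The sketch below assumes the sentence is "the fox jumped over the lazy dog" with a window of 3 on each side; both are inferred from the listed terms rather than stated in the slides, and `context_pairs` is an illustrative helper name.

```python
def context_pairs(tokens, window):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Assumed sentence and window size (inferred from the terms listed above).
sentence = "the fox jumped over the lazy dog".split()
over_contexts = [ctx for center, ctx in context_pairs(sentence, 3) if center == "over"]

for ctx in over_contexts:
    print(f"P({ctx}|over)")
```

"the" appears twice because it occurs on both sides of "over" inside the window.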
P(V_OUT | V_IN)
How to define this prob. distribution?
Determines similarity in [-1,1]
Get a probability in [0,1] out of a similarity in [-1,1]
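One way to sketch these two steps: score each candidate output word by cosine similarity with the input vector, then pass the scores through a softmax so they become a probability distribution. All vectors below are made-up values for illustration. Note also that the original word2vec formulation applies the softmax to raw dot products; cosine is used here only because the slide speaks of a similarity in [-1,1].

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Map real-valued scores to probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vectors: v_in for the input word "over", v_out for candidates.
v_in = [0.5, -0.2, 0.1]
v_out = {
    "fox": [0.4, -0.1, 0.3],
    "lazy": [-0.3, 0.6, 0.2],
    "dog": [0.1, 0.2, -0.5],
}

words = list(v_out)
sims = [cosine(v_in, v_out[w]) for w in words]
probs = softmax(sims)

for w, s, p in zip(words, sims, probs):
    print(f"P({w}|over) = {p:.3f}  (similarity {s:+.3f})")
```

The softmax preserves the ranking of the similarities: the most similar output word receives the highest probability.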
https://www.healthvault.com/en-us/health-bot/
● https://goo.gl/ppHX65
● Gensim guide for word2vec: https://goo.gl/i2UrdH
● https://goo.gl/7b72S9
● https://goo.gl/uNJDrs
● https://goo.gl/KYacjz
● https://goo.gl/JezgYg
a. Build API (Flask/Django recommended)
b. Pretrained models (Guide: https://goo.gl/5qt2Ki)
c. Visualization: d3.js / Plotly / TensorBoard
a. LSTM (Guide: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
b. CNN (Guide: https://goo.gl/PgLUs7)
c. RNN (Guide: https://goo.gl/5L9kci)
a. Starting point: https://rare-technologies.com/word2vec-tutorial#app

Text mining lab (summer 2017) - Word Vector Representation