Hyo Eun Lee
Network Science Lab
Dept. of Biotechnology
The Catholic University of Korea
E-mail: gydnsml@gmail.com
2023.08.07
KDD 2015
 Introduction
• Limitation of previous study
• Skip-gram
• BOW
• Paragraph Vector
 Related work
• Distributed Text Embedding
• Information Network Embedding
1. Introduction
Limitation of previous studies
• Learning word representations suffers from problems such as word sparsity, polysemy, and synonymy within documents
• Distributed representation, in which similar words and documents lie close together
in a low-dimensional space, is an effective solution to these problems
• Skip-gram and Paragraph Vectors are representations based on unsupervised learning
• They are more effective than Brown clustering or nearest neighbors because they exploit similarity to context words
1. Introduction
Skip-gram
• Represents words as low-dimensional vectors
using their relationships to neighboring words
• Computes the probability of each context word occurring
around a given center word
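The skip-gram probability above can be sketched with toy vectors (the vocabulary, dimension, and random values here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["network", "embedding", "text", "label", "graph"]
dim = 4
# Skip-gram keeps separate center ("input") and context ("output") vectors
center_vecs = rng.normal(size=(len(vocab), dim))
context_vecs = rng.normal(size=(len(vocab), dim))

def p_context_given_center(center_idx):
    """Softmax over the vocabulary: probability of each context word."""
    scores = context_vecs @ center_vecs[center_idx]
    exp = np.exp(scores - scores.max())  # subtract max for numerical stability
    return exp / exp.sum()

probs = p_context_given_center(vocab.index("embedding"))
assert abs(probs.sum() - 1.0) < 1e-9  # a valid probability distribution
```

Training adjusts both vector tables so observed (center, context) pairs get high probability; in practice the full softmax is approximated, e.g. with negative sampling.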
1. Introduction
BOW
• Measures the frequency of each word in a document
• Does not account for word order
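Order insensitivity is easy to see in a minimal sketch (toy sentences, standard library only):

```python
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the mat sat on the cat",  # same words, different order
]
bows = [Counter(d.split()) for d in docs]
# BOW keeps only word frequencies, so the two documents look identical:
assert bows[0] == bows[1]
```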
1. Introduction
Paragraph Vectors
• Utilize co-occurrence information between nearby
context words, similar to skip-grams
• PV-DM
: learns a vector for the entire paragraph and
combines it with word vectors to capture both the
context and the meaning of the paragraph
1. Introduction
Paragraph Vectors
• Utilize co-occurrence information between nearby
context words, similar to skip-grams
• PV-DBOW
: uses the paragraph vector alone to predict
which words appear in the paragraph
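A minimal numpy sketch of the PV-DBOW objective (toy vocabulary and random initial vectors, purely illustrative): the paragraph vector alone scores every vocabulary word, and training would lower the negative log-likelihood of the words that actually appear in the paragraph.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["text", "network", "embedding", "label"]
paragraphs = [["text", "embedding"], ["network", "label"]]
dim = 3
para_vecs = rng.normal(size=(len(paragraphs), dim))  # one vector per paragraph
word_out = rng.normal(size=(len(vocab), dim))        # output word vectors

def pv_dbow_loss(p_idx):
    """Negative log-likelihood of a paragraph's words given its vector."""
    scores = word_out @ para_vecs[p_idx]
    log_probs = scores - np.log(np.exp(scores).sum())  # log-softmax
    return -sum(log_probs[vocab.index(w)] for w in paragraphs[p_idx])

loss = pv_dbow_loss(0)
assert loss > 0
```

PV-DM differs only in the input: it combines the paragraph vector with surrounding word vectors to predict the next word.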
1. Introduction
Conclusion
• Embedding methods based on unsupervised learning are commonly used for
classification, clustering, and ranking
• However, compared to deep learning approaches,
they show weaker predictive performance on certain tasks
→ deep learning approaches incorporate the data's label information into the embedding
• On the other hand, unsupervised methods are less computationally expensive than deep learning approaches
and do not require pre-training
• They also require fewer parameters to tune
• This paper proposes PTE, a method that combines the advantages of
unsupervised embedding with label information
1. Introduction
PTE
• Uses a heterogeneous text network to encode
word-word, word-document, and word-label co-occurrence information
• Learns low-dimensional embeddings from the heterogeneous text network in a semi-supervised manner
• Learns distributed representations based on the embedded information
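The three bipartite networks can be sketched as co-occurrence counts over a toy labeled corpus (the document ids, tokens, and labels below are made up for illustration):

```python
from collections import Counter
from itertools import combinations

# Toy labeled corpus: (document id, tokens, class label)
corpus = [
    ("d1", ["graph", "embedding", "network"], "ML"),
    ("d2", ["text", "embedding", "label"], "NLP"),
]

ww = Counter()  # word-word co-occurrences (here: within the whole document)
wd = Counter()  # word-document edges
wl = Counter()  # word-label edges
for doc_id, tokens, label in corpus:
    for a, b in combinations(tokens, 2):
        ww[frozenset((a, b))] += 1
    for t in tokens:
        wd[(t, doc_id)] += 1
        wl[(t, label)] += 1

# "embedding" is connected to both labels through the word-label network
assert wl[("embedding", "ML")] == 1 and wl[("embedding", "NLP")] == 1
```

PTE embeds all three edge sets jointly, so labeled and unlabeled co-occurrence information shape the same word vectors.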
2. Related work
Distributed Text Embedding
• Distributed representations can be categorized into unsupervised and supervised learning
• Unsupervised learning: embeddings learned from word co-occurrences,
scalable to millions of documents
• Supervised learning: embeddings generally learned with neural networks
• The main difference is that unsupervised methods use labels only to train the classifier,
while supervised methods also use labels to train the representation,
falling back to pre-training when no labels are available
2. Related work
• PTE is a training algorithm that uses a semi-supervised method.
Distributed Text Embedding
• Supervised learning: uses label information, e.g. CNN and RNTN
RNTN
• Predicts labels by embedding words or documents as vectors
• A sentence is parsed as a binary tree and represented as a vector
using the same tensor-based composition function at every node
• Computes each parent vector bottom-up
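The bottom-up tensor composition can be sketched as follows (the dimension, random parameters, and example phrase are illustrative; a real RNTN learns V and W from labeled parse trees):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
# One composition function shared by every node: bilinear tensor V + linear W
V = rng.normal(size=(d, 2 * d, 2 * d)) * 0.1
W = rng.normal(size=(d, 2 * d)) * 0.1

def compose(left, right):
    """Parent vector from two child vectors (same function at every node)."""
    c = np.concatenate([left, right])
    bilinear = np.einsum("i,kij,j->k", c, V, c)  # tensor interaction term
    return np.tanh(bilinear + W @ c)

# Binary parse of "not very good": (not (very good)), composed bottom-up
leaves = {w: rng.normal(size=d) for w in ["not", "very", "good"]}
root = compose(leaves["not"], compose(leaves["very"], leaves["good"]))
assert root.shape == (d,)
```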
2. Related work
Information Network Embedding
• Uses a heterogeneous text network to encode word-word, word-document, and word-label co-occurrence
information
• Learns low-dimensional embeddings from the heterogeneous text network in a semi-supervised manner
• Learns distributed representations based on the embedded information
2. Related work
• Unsupervised methods can only handle homogeneous networks
• PTE extends LINE to analyze heterogeneous networks (networks with multiple types of nodes and edges)
Information Network Embedding
• Training on a heterogeneous text network makes PTE relevant to network embedding problems
→ useful in areas such as node classification and link prediction, e.g. DeepWalk, LINE
DeepWalk
• Uses truncated random walks
• Only applicable to networks with binary (unweighted) edges
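A truncated random walk, the core of DeepWalk, in a few lines (toy graph; the collected walks are then fed to skip-gram as if they were sentences):

```python
import random

# Toy undirected network with binary (unweighted) edges
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}

def random_walk(start, length, rng):
    """Uniform walk over neighbors; uniformity is why weighted edges are not handled."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

rng = random.Random(0)
walk = random_walk(1, 5, rng)
assert len(walk) == 5 and all(n in adj for n in walk)
```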
2. Related work
Information Network Embedding
LINE
• Learns node embeddings
by jointly considering first-order proximity and
second-order proximity (neighborhood similarity)
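First-order proximity in LINE models each observed edge directly from the node embeddings; a minimal sketch (random toy vectors, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
emb = {n: rng.normal(size=4) for n in ["u", "v", "w"]}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def first_order_prob(a, b):
    """LINE's first-order proximity: edge probability from the dot product."""
    return sigmoid(emb[a] @ emb[b])

p = first_order_prob("u", "v")
assert 0.0 < p < 1.0
```

Second-order proximity instead gives each node an additional "context" vector so that nodes with similar neighborhoods become similar; PTE reuses this second-order objective on each bipartite network.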

NS-CUK Seminar: H.E.Lee, Review on "PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks", KDD 2015
