Understanding Large Social Networks
Information Retrieval and Extraction (Major Project)
Team Members (Team No.: 57)
Madan Jhawar 201202018
Raj Patel 201301048
Kishan Vadaliya 201505621
Mentor
Ganesh Jawahar
Problem Statement
To build a model that can capture the network information of a node in an efficient and scalable manner by representing every node in a low-dimensional vector space.
Motivation
● Representing network nodes in a low-dimensional vector space helps with many tasks, such as visualization, node classification, and link prediction, by letting us run standard machine learning algorithms on top of the learned vectors.
● Most real-world information networks, such as YouTube, Flickr, and Facebook, range from hundreds of nodes to millions or even billions of nodes. Scalable solutions for understanding such networks are very useful in industry.
● Deep learning has worked wonders in language modelling and has produced commendable results in node representation tasks as well.
Dataset used
● The BlogCatalog dataset is used for this project.
● It contains 10,312 nodes and 333,983 links.
● Nodes represent users; links represent friendships between them.
● Dataset: http://socialcomputing.asu.edu/datasets/BlogCatalog3 (a loading sketch follows below).
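As a rough illustration, here is a minimal Python sketch of reading such an edge list into an adjacency structure. It assumes a plain text file with one comma-separated user_id,friend_id pair per line; the actual file names and layout in the BlogCatalog3 archive may differ.

```python
from collections import defaultdict

def load_edges(path):
    """Read a comma-separated edge list into an adjacency map."""
    adj = defaultdict(set)
    with open(path) as f:
        for line in f:
            u, v = map(int, line.strip().split(","))
            adj[u].add(v)  # friendships are undirected,
            adj[v].add(u)  # so store both directions
    return adj

adj = load_edges("edges.csv")  # hypothetical file name
print(len(adj))                # number of users with at least one link
```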
Baseline Paper
LINE: Large-scale Information Network Embedding
Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, Qiaozhu Mei
LINE: First-Order Proximity
● First-order proximity refers to the local pairwise proximity between vertices in the network (i.e., only the direct neighbours).
● To model the first-order proximity of each undirected edge (i, j), we define the joint probability between vertices $v_i$ and $v_j$ as
$$p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^{\top} \vec{u}_j)},$$
where $\vec{u}_i$ is the embedding vector of vertex $v_i$. The empirical probability is given by
$$\hat{p}_1(i, j) = \frac{w_{ij}}{W}, \qquad W = \sum_{(i,j) \in E} w_{ij}.$$
● Minimize the KL-divergence of the two probability distributions, which (dropping constants) gives the objective sketched below:
$$O_1 = -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j).$$
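Our implementation is in Lua/Torch; purely as an illustration of this objective, here is a minimal NumPy sketch assuming unit edge weights and an embedding matrix `emb` of shape |V| x d (the names are ours, not the paper's).

```python
import numpy as np

def first_order_loss(emb, edges):
    """O1 with unit weights: -sum of log sigmoid(u_i . u_j) over edges."""
    loss = 0.0
    for i, j in edges:
        score = emb[i].dot(emb[j])
        loss -= np.log(1.0 / (1.0 + np.exp(-score)))  # -log p1(v_i, v_j)
    return loss

rng = np.random.default_rng(0)
emb = rng.normal(scale=0.1, size=(10312, 128))  # one 128-d vector per node
print(first_order_loss(emb, [(0, 1), (2, 3)]))
```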
LINE: Second-Order Proximity
● The second-order proximity assumes that vertices sharing many connections to other vertices are similar to each other.
● In this model, every vertex plays two roles: the vertex itself ($\vec{u}_i$) and a specific “context” ($\vec{u}_i'$) for other vertices.
● For each directed edge (i, j), we first define the probability of “context” $v_j$ being generated by vertex $v_i$ as
$$p_2(v_j \mid v_i) = \frac{\exp(\vec{u}_j'^{\top} \vec{u}_i)}{\sum_{k=1}^{|V|} \exp(\vec{u}_k'^{\top} \vec{u}_i)},$$
where the empirical probability is given by
$$\hat{p}_2(v_j \mid v_i) = \frac{w_{ij}}{d_i}, \qquad d_i = \sum_{k \in N(i)} w_{ik},$$
with $N(i)$ the set of out-neighbours of $v_i$.
● We take the KL-divergence as the distance function, which (dropping constants) gives the objective sketched below:
$$O_2 = -\sum_{(i,j) \in E} w_{ij} \log p_2(v_j \mid v_i).$$
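Again as an illustrative NumPy sketch rather than our Torch code, with `emb` holding the vertex vectors and `ctx` the separate context vectors, both assumed to be |V| x d matrices:

```python
import numpy as np

def p2(emb, ctx, i):
    """p2(. | v_i): softmax over every vertex's context vector."""
    scores = ctx @ emb[i]   # u'_k . u_i for all k
    scores -= scores.max()  # shift for numerical stability
    e = np.exp(scores)
    return e / e.sum()      # distribution over all |V| contexts

rng = np.random.default_rng(0)
emb = rng.normal(scale=0.1, size=(10312, 128))
ctx = rng.normal(scale=0.1, size=(10312, 128))
print(p2(emb, ctx, 0).sum())  # ~1.0; note the O(|V| * d) cost per vertex
```

The full softmax touches every context vector, which is exactly the cost that negative sampling (next slide) avoids.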
LINE: Negative Sampling (Optimization)
● Looking carefully at the denominator of $p_2(v_j \mid v_i)$, it is quite an expensive operation: it requires a dot product with every other node, and each dot product itself takes time linear in the size of the vector.
● To optimize, we use negative sampling, which for each edge (i, j) samples multiple negative edges according to a noise distribution, replacing the objective for that edge (sketched below) with
$$\log \sigma(\vec{u}_j'^{\top} \vec{u}_i) + \sum_{n=1}^{K} \mathbb{E}_{v_n \sim P_n(v)} \left[ \log \sigma(-\vec{u}_n'^{\top} \vec{u}_i) \right],$$
where $\sigma(x) = 1/(1 + e^{-x})$ and the paper sets $P_n(v) \propto d_v^{3/4}$.
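A minimal NumPy sketch of the per-edge negative-sampling loss, with K noise vertices drawn from the $P_n(v) \propto d_v^{3/4}$ distribution (function and variable names are ours, for illustration only):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(emb, ctx, i, j, degrees, K=5, rng=None):
    """-[log s(u'_j . u_i) + sum over K negatives of log s(-u'_n . u_i)]."""
    if rng is None:
        rng = np.random.default_rng()
    noise = degrees ** 0.75          # P_n(v) proportional to d_v^(3/4)
    noise /= noise.sum()
    loss = -np.log(sigmoid(ctx[j].dot(emb[i])))       # the observed edge
    for n in rng.choice(len(degrees), size=K, p=noise):
        loss -= np.log(sigmoid(-ctx[n].dot(emb[i])))  # sampled negatives
    return loss

rng = np.random.default_rng(0)
emb = rng.normal(scale=0.1, size=(100, 16))
ctx = rng.normal(scale=0.1, size=(100, 16))
deg = rng.integers(1, 50, size=100).astype(float)
print(neg_sampling_loss(emb, ctx, 0, 1, deg, rng=rng))
```

Each update now touches only K + 1 context vectors instead of all |V|.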
Novelty
● The authors train the neural networks for the first-order proximity and the second-order proximity independently, and then concatenate the embeddings learned by both models to obtain the final embedding of a node.
● In our project, we use the same lookup table for the node representations in both neural networks, i.e., the node embeddings learned from the first-order proximity are carried into the neural network for the second-order proximity instead of being re-initialized to random values (sketched below).
● The embeddings learned by the second-order proximity deep-net are then taken as the final embeddings of a node.
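Our code shares the lookup table in Torch; the Python sketch below (with a toy one-step trainer) only illustrates the idea: the same embedding matrix object is updated by stage 1 and then handed unchanged to stage 2.

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(scale=0.1, size=(10312, 128))    # the shared lookup table

def first_order_step(emb, i, j, lr=0.025):
    """One SGD step on -log sigmoid(u_i . u_j); updates emb in place."""
    g = 1.0 / (1.0 + np.exp(emb[i].dot(emb[j])))  # = 1 - sigmoid(u_i . u_j)
    ui, uj = emb[i].copy(), emb[j].copy()
    emb[i] += lr * g * uj
    emb[j] += lr * g * ui

first_order_step(emb, 0, 1)  # ... stage 1 trains emb in place

# Stage 2 starts from the SAME matrix -- no re-initialization -- and only
# the context vectors are fresh; stage 2's output is the final embedding.
ctx = rng.normal(scale=0.1, size=(10312, 128))
```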
Languages used
● The complete code is written in Lua.
● Neural networks are built with Torch, a scientific computing framework for Lua.
● We use the scikit-learn library in Python to test the accuracy of our model (see the sketch below).
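A minimal sketch of such an accuracy check. The arrays below are random stand-ins; in practice X would hold the learned embeddings and y the multi-label BlogCatalog group memberships, scored with a one-vs-rest logistic regression:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(10312, 128))         # stand-in for learned embeddings
y = rng.integers(0, 2, size=(10312, 39))  # stand-in group indicator matrix

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
print(f1_score(y_te, clf.predict(X_te), average="micro"))  # micro-F1
```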
Experimentation and Results
The “scoring.py” script is not working; results will be updated soon.
Project Timeline
Week 1: Getting started with neural networks and deep learning
Week 2: Formalizing the problem and getting acquainted with the baseline paper
Week 3: Implementing first-order proximity in Torch
Week 4: Implementing second-order proximity in Torch
Week 5: Combining first-order and second-order proximity and extending the work
Challenges Faced
● Understanding the mathematics of the baseline paper.
● Working through other research articles related to the baseline paper.
● Getting up to speed with Lua and Torch.
● Debugging neural nets in Lua is hard, since not every intermediate value can be printed and inspected.
● The dataset is very large, so we had to make several adjustments to the data structures used in our code to reduce the time complexity.
References
● http://www.www2015.it/documents/proceedings/proceedings/p1067.pdf
● http://arxiv.org/pdf/1403.6652.pdf
● http://cs224d.stanford.edu/lecture_notes/LectureNotes1.pdf
● https://github.com/torch/nn
● https://github.com/gjwhr/dl4nlp-made-easy/blob/master/word2vec/cbow.lua
● https://github.com/gjwhr/dl4nlp-made-easy/blob/master/word2vec/skipgram.lua
Other links
https://github.com/raj454raj/Graph2vec/
http://raj454raj.github.io/Graph2vec/
https://youtu.be/2Ifxt9nlzvk
Thank you!