Understanding Large Social Networks
Information Retrieval and Extraction (Major Project)
Team Members (Team No.: 57)
Madan Jhawar 201202018
Raj Patel 201301048
Kishan Vadaliya 201505621
Mentor
Ganesh Jawahar
Problem Statement
To build a model that can capture the network information of a node in an efficient and scalable manner by embedding every node in a low-dimensional vector space.
Motivation
● Representing network nodes in low-dimensional vector spaces enables many downstream tasks, such as visualization, node classification, and link prediction, by letting us run standard machine learning algorithms on top of the learned representations.
● Most real-world information networks, such as YouTube, Flickr, and Facebook, range from hundreds of nodes to millions or even billions of nodes. Scalable solutions for understanding such networks are very useful in industry.
● Deep learning has worked wonders in language modelling and has produced commendable results in node representation tasks as well.
Dataset used
● The BlogCatalog dataset is used for this project.
● It contains 10,312 nodes and 333,983 links.
● Nodes represent users, and links represent friendships between them.
● Dataset: http://socialcomputing.asu.edu/datasets/BlogCatalog3
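A minimal sketch of loading the graph in plain Lua, assuming the comma-separated edges.csv file shipped with the BlogCatalog3 dump (one "i,j" pair per line); adjust the parsing if your copy uses a different format:

```lua
-- Read the undirected edge list and accumulate per-node degrees.
local edges, degree = {}, {}
for line in io.lines('edges.csv') do
  local i, j = line:match('^(%d+),(%d+)')
  if i then
    i, j = tonumber(i), tonumber(j)
    edges[#edges + 1] = {i, j}
    degree[i] = (degree[i] or 0) + 1
    degree[j] = (degree[j] or 0) + 1
  end
end
print(('loaded %d links'):format(#edges))  -- expect 333,983
```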
Baseline Paper
LINE: Large-scale Information Network Embedding
Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, Qiaozhu Mei
LINE: First-Order Proximity
● First-order proximity refers to the local pairwise proximity between vertices in the network (i.e., only the direct neighbours).
● To model the first-order proximity, for each undirected edge (i, j) we define the joint probability between vertices $v_i$ and $v_j$ as

$$p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^{\,T} \cdot \vec{u}_j)},$$

where $\vec{u}_i$ is the low-dimensional vector of vertex $v_i$, and the empirical probability is given by

$$\hat{p}_1(i, j) = \frac{w_{ij}}{W}, \qquad W = \sum_{(i,j) \in E} w_{ij}.$$

● Minimize the KL-divergence of the two probability distributions, which, after dropping constants, reduces to

$$O_1 = -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j).$$
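A minimal Torch sketch of this model (our illustration, not the paper's reference implementation): one lookup table holds all node vectors, an edge (i, j) is scored by $\sigma(\vec{u}_i \cdot \vec{u}_j)$, and binary cross-entropy pushes observed edges toward probability 1.

```lua
require 'nn'

local numNodes, dim = 10312, 10

-- One lookup table holds every node embedding u_i.
local lookup = nn.LookupTable(numNodes, dim)

-- p1(vi, vj) = sigmoid(ui . uj); both ends share the same parameters.
local model = nn.Sequential()
  :add(nn.ParallelTable()
    :add(lookup)
    :add(lookup:clone('weight', 'gradWeight')))  -- shared weights
  :add(nn.DotProduct())
  :add(nn.Sigmoid())

local criterion = nn.BCECriterion()

-- One SGD step on a single observed edge (i, j), target probability 1.
local i, j = torch.LongTensor{17}, torch.LongTensor{42}
local target = torch.Tensor{1}
local p = model:forward{i, j}
local loss = criterion:forward(p, target)
model:zeroGradParameters()
model:backward({i, j}, criterion:backward(p, target))
model:updateParameters(0.035)  -- the learning rate we used
```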
LINE: Second-Order Proximity
● The second-order proximity assumes that vertices sharing many connections
to other vertices are similar to each other.
● In this model, each vertex plays two roles: the vertex itself ($\vec{u}_i$) and a specific “context” ($\vec{u}'_i$) for other vertices.
● For each directed edge (i, j), we first define the probability of “context” $v_j$ being generated by vertex $v_i$ as

$$p_2(v_j \mid v_i) = \frac{\exp(\vec{u}'^{\,T}_j \cdot \vec{u}_i)}{\sum_{k=1}^{|V|} \exp(\vec{u}'^{\,T}_k \cdot \vec{u}_i)},$$

where the empirical probability is given by

$$\hat{p}_2(v_j \mid v_i) = \frac{w_{ij}}{d_i}, \qquad d_i = \sum_{k \in N(i)} w_{ik},$$

with $d_i$ the out-degree of vertex $v_i$.

● We take the KL-divergence as the distance function, which reduces to

$$O_2 = -\sum_{(i,j) \in E} w_{ij} \log p_2(v_j \mid v_i).$$
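A sketch of the two roles in Torch (again our own illustration): separate lookup tables hold the vertex vectors $\vec{u}$ and the context vectors $\vec{u}'$, and the softmax is computed against the context vector of every node, which is exactly the expensive step the next slide optimizes away.

```lua
require 'nn'

local numNodes, dim = 10312, 10

-- Two roles per vertex: its own vector u_i and its context vector u'_i.
local vertex  = nn.LookupTable(numNodes, dim)
local context = nn.LookupTable(numNodes, dim)

-- Score u_i against the context vectors of ALL |V| nodes.
local model = nn.Sequential()
  :add(nn.ParallelTable()
    :add(vertex)                 -- {i}     -> 1 x dim
    :add(context))               -- all ids -> |V| x dim
  :add(nn.MM(false, true))       -- 1 x |V| logits
  :add(nn.LogSoftMax())          -- the full (expensive) softmax

local allIds = torch.range(1, numNodes):long()
local crit = nn.ClassNLLCriterion()

-- log p2(vj | vi) for a directed edge (i, j):
local i, j = torch.LongTensor{17}, 42
local logp = model:forward{i, allIds}             -- 1 x |V|
local loss = crit:forward(logp, torch.LongTensor{j})
```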
LINE: Negative Sampling (Optimization)
● If we carefully analyze the denominator of $p_2(v_j \mid v_i)$, it is a very expensive operation: it requires a dot product of $\vec{u}_i$ with the context vector of every other node, and each dot product itself takes time linear in the embedding size.
● To optimize, we use negative sampling, which, for each edge (i, j), samples K negative edges from a noise distribution $P_n(v) \propto d_v^{3/4}$ and replaces the full softmax with the objective

$$\log \sigma(\vec{u}'^{\,T}_j \cdot \vec{u}_i) + \sum_{n=1}^{K} \mathbb{E}_{v_n \sim P_n(v)} \big[ \log \sigma(-\vec{u}'^{\,T}_n \cdot \vec{u}_i) \big],$$

where $\sigma(x) = 1/(1 + e^{-x})$.
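A hedged sketch of the sampled objective in Torch: each observed edge (i, j) is scored against one positive context plus K sampled negative contexts, with binary cross-entropy targets (1 for the real context, 0 for the sampled ones). The `degree` table is the one built in the loading sketch earlier; a uniform fallback keeps the snippet runnable on its own.

```lua
require 'nn'

local numNodes, dim, K = 10312, 10, 10

local vertex  = nn.LookupTable(numNodes, dim)
local context = nn.LookupTable(numNodes, dim)

-- sigmoid(u'_c . u_i) for 1 positive + K negative candidate contexts.
local model = nn.Sequential()
  :add(nn.ParallelTable()
    :add(vertex)      -- {i}          -> 1 x dim
    :add(context))    -- {j, n1..nK}  -> (K+1) x dim
  :add(nn.MM(false, true))            -- 1 x (K+1)
  :add(nn.Sigmoid())

local crit = nn.BCECriterion()

-- Noise distribution P_n(v) ~ d_v^(3/4); falls back to uniform
-- weights if no degree table is in scope.
local noise = torch.Tensor(numNodes)
for v = 1, numNodes do noise[v] = ((degree and degree[v]) or 1) ^ 0.75 end

local i, j = 17, 42
local negatives  = torch.multinomial(noise, K, true)       -- K noise nodes
local candidates = torch.cat(torch.LongTensor{j}, negatives)
local target = torch.zeros(1, K + 1); target[1][1] = 1     -- only j is real

local p = model:forward{torch.LongTensor{i}, candidates}
local loss = crit:forward(p, target)
model:zeroGradParameters()
model:backward({torch.LongTensor{i}, candidates},
               crit:backward(p, target))
model:updateParameters(0.035)
```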
Novelty
● The authors train the neural networks for the first-order proximity and the second-order proximity independently, and then concatenate the embeddings learned by the two models to obtain the final embedding of each node.
● In our project, we use the same lookup table for the node representation in both neural networks, i.e., the node embeddings learned under the first-order proximity are reused in the neural network for the second-order proximity, instead of being initialized again to random values.
● The embeddings learned by the second-order proximity net are then taken as the final embeddings of a node.
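A sketch of this change, building on the snippets above (`firstOrderLookup` is a hypothetical name for the lookup table after first-order training): instead of creating a fresh vertex table for the second-order net, we hand it the already trained table, so its weights keep training in place.

```lua
-- Reuse the trained first-order table as the second-order vertex table,
-- rather than a new randomly initialized nn.LookupTable.
local vertex  = firstOrderLookup                 -- shared parameters
local context = nn.LookupTable(numNodes, dim)    -- contexts still start fresh

-- ... build and train the second-order net as before ...

-- The final node embeddings are the rows of the shared table:
local embeddings = vertex.weight                 -- numNodes x dim tensor
```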
Languages used
● The complete code is written in Lua.
● Neural networks are built with Torch, a scientific computing framework written in Lua.
● We use the scikit-learn library in Python to test the accuracy of our model.
Results
● Parameters:

| Feature | Value |
| --- | --- |
| Embedding Size | 10 |
| Number of Epochs | 30 |
| Learning Rate | 0.035 |
| Number of Negative Samples | 10 |

Results Contd.
● F-Scores:

| Metric | 60% Train | 70% Train | 80% Train |
| --- | --- | --- | --- |
| Micro-F1 | 39.34% | 39.11% | 41.62% |
| Macro-F1 | 23.82% | 24.13% | 25.36% |
Project Timeline
● Week 1: Getting started with NN and deep learning
● Week 2: Formalizing the problem and getting acquainted with the baseline paper
● Week 3: Implementing first-order proximity in Torch
● Week 4: Implementing second-order proximity in Torch
● Week 5: Combining first-order and second-order proximity and extending the work
Challenges Faced
● Understanding the mathematics of the baseline paper.
● Referring to other research articles related to the baseline paper.
● Getting up to speed with Lua and Torch.
● Debugging neural nets in Lua is hard: not every intermediate value can simply be printed and inspected.
● The dataset is very large, so we had to adjust the data structures used in our code to reduce the time complexity.
References
● http://www.www2015.it/documents/proceedings/proceedings/p1067.pdf
● http://arxiv.org/pdf/1403.6652.pdf
● http://cs224d.stanford.edu/lecture_notes/LectureNotes1.pdf
● https://github.com/torch/nn
● https://github.com/gjwhr/dl4nlp-made-easy/blob/master/word2vec/cbow.lua
● https://github.com/gjwhr/dl4nlp-made-easy/blob/master/word2vec/skipgram.lua
Other links
https://github.com/raj454raj/Graph2vec/
http://raj454raj.github.io/Graph2vec/
https://youtu.be/2Ifxt9nlzvk
Thank you!