A Survey of Heterogeneous Information Network Analysis
Chuan Shi, Member, IEEE,
Yitong Li, Jiawei Zhang, Yizhou Sun, Member, IEEE,
and Philip S. Yu, Fellow, IEEE
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015
Introduction
An information network consists of interacting components that
together form an interconnected network
Information network analysis has been a hot research topic in the
data mining and information retrieval fields over the past decades
Most information network analyses rest on a basic assumption: there is
a single type of object and link -> Homogeneous information network
But most real systems consist of a large number of interacting,
multi-typed components, and we can model them as a Heterogeneous
information network (HIN).
Compared to a homogeneous information network, an HIN can
effectively fuse more information and contains richer semantics, and
thus it forms a new development in data mining.
In this paper, the authors present a survey of heterogeneous
information network analysis.
Basic concepts and Definitions
Basic definitions(1/4)
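•Def 1. Information network
directed graph $G = (V, E)$ with two type mapping functions
object type mapping function: $\varphi: V \to \mathcal{A}$
link type mapping function: $\psi: E \to \mathcal{R}$
each object $v \in V$ belongs to one object type in the object type set $\mathcal{A}$: $\varphi(v) \in \mathcal{A}$
each link $e \in E$ belongs to one link type in the link type set $\mathcal{R}$: $\psi(e) \in \mathcal{R}$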
Basic definitions(2/4)
•Def 2. Hetero/Homogeneous information network
An information network is a heterogeneous information network
if the number of object types $|\mathcal{A}| > 1$
or the number of relation types $|\mathcal{R}| > 1$.
Otherwise, it is a homogeneous information network.
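As a rough illustration of Defs 1 and 2, the sketch below stores a tiny network as a directed graph with an object type mapping and a link type mapping, then checks whether it has more than one object type or more than one link type. The class name `InformationNetwork`, its fields, and the toy objects are hypothetical and chosen only for this example; keying links by (source, target) is a simplifying assumption that allows one link per ordered pair.

```python
from dataclasses import dataclass, field


@dataclass
class InformationNetwork:
    """Directed graph G = (V, E) with type mappings (hypothetical helper)."""
    object_type: dict = field(default_factory=dict)  # object -> object type
    link_type: dict = field(default_factory=dict)    # (source, target) -> link type

    def add_object(self, obj, obj_type):
        self.object_type[obj] = obj_type

    def add_link(self, src, dst, rel_type):
        # Both endpoints must already be typed objects of the network.
        if src not in self.object_type or dst not in self.object_type:
            raise KeyError("both endpoints must be added as objects first")
        self.link_type[(src, dst)] = rel_type

    def is_heterogeneous(self):
        # Def 2: more than one object type OR more than one link type.
        n_object_types = len(set(self.object_type.values()))
        n_link_types = len(set(self.link_type.values()))
        return n_object_types > 1 or n_link_types > 1


# Toy bibliographic network: authors write papers, papers appear in venues.
bib = InformationNetwork()
bib.add_object("a1", "Author")
bib.add_object("p1", "Paper")
bib.add_object("v1", "Venue")
bib.add_link("a1", "p1", "writes")
bib.add_link("p1", "v1", "published_in")

# Toy friendship network: one object type and one link type.
friends = InformationNetwork()
friends.add_object("u1", "User")
friends.add_object("u2", "User")
friends.add_link("u1", "u2", "friend_of")

print(bib.is_heterogeneous())      # True
print(friends.is_heterogeneous())  # False
```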
Basic definitions(3/4)
•Def 3. Network schema
Meta template for an information network G=(V,E)
The network schema of a heterogeneous information network
specifies type constraints on the sets of objects and relationships
among the objects.
※ Network instance
An information network following a network schema
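One way to read Def 3 is that the schema lists which (source type, relation, target type) combinations are allowed, and a network instance is any network whose links stay within that list. The sketch below makes this concrete under that reading; the function names `network_schema` and `follows_schema` and the bibliographic schema are assumptions made for illustration.

```python
def network_schema(object_type, links):
    """Derive the schema actually used by a network instance.

    object_type: dict mapping each object to its type.
    links: iterable of (source, relation, target) triples.
    Returns the set of (source type, relation, target type) triples in use.
    """
    return {(object_type[src], rel, object_type[dst]) for src, rel, dst in links}


def follows_schema(object_type, links, schema):
    """True if every link of the instance is allowed by the given schema."""
    return network_schema(object_type, links) <= schema


# Hypothetical bibliographic schema: Author -writes-> Paper -published_in-> Venue.
bib_schema = {
    ("Author", "writes", "Paper"),
    ("Paper", "published_in", "Venue"),
}

object_type = {"a1": "Author", "p1": "Paper", "v1": "Venue"}
good_links = [("a1", "writes", "p1"), ("p1", "published_in", "v1")]
bad_links = [("a1", "published_in", "v1")]  # Author -> Venue is not in the schema

print(follows_schema(object_type, good_links, bib_schema))  # True
print(follows_schema(object_type, bad_links, bib_schema))   # False
```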
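Basic definitions(4/4)
•Def 4. Meta path
A meta path $P$ is a path defined on a network schema $S = (\mathcal{A}, \mathcal{R})$ and is
denoted in the form $A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} A_{l+1}$, which defines a composite
relation $R = R_1 \circ R_2 \circ \cdots \circ R_l$ between objects $A_1, A_2, \ldots, A_{l+1}$,
where $\circ$ denotes the composition operator on relations.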
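The composition in Def 4 can be sketched by chaining relations and counting path instances. The example below composes a hypothetical author-paper relation with its inverse to obtain the Author-Paper-Author (APA) meta path. The helper names `compose` and `inverse`, the nested-dict representation, and the toy data are all assumptions for illustration.

```python
from collections import defaultdict


def compose(first, second):
    """Compose two relations given as {source: {target: path_count}}.

    The result counts, for each (source, target) pair, the number of
    two-step paths that go through any intermediate object.
    """
    result = defaultdict(lambda: defaultdict(int))
    for src, mids in first.items():
        for mid, n1 in mids.items():
            for dst, n2 in second.get(mid, {}).items():
                result[src][dst] += n1 * n2
    return {src: dict(targets) for src, targets in result.items()}


def inverse(relation):
    """Reverse the direction of a relation, keeping the path counts."""
    result = defaultdict(dict)
    for src, targets in relation.items():
        for dst, n in targets.items():
            result[dst][src] = n
    return dict(result)


# Hypothetical Author -writes-> Paper relation.
writes = {
    "alice": {"p1": 1, "p2": 1},
    "bob": {"p2": 1},
    "carol": {"p3": 1},
}

# Meta path APA = writes composed with its inverse.
apa = compose(writes, inverse(writes))
print(apa["alice"])  # {'alice': 2, 'bob': 1}
```

Here `apa["alice"]["bob"]` is the number of papers alice and bob co-authored, which is the composite relation the APA meta path describes.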
Comparisons with related
concepts
• HIN ⊃ Homogeneous network
• HIN ⊃ Multi-relational network
• HIN ⊃ Multi-dimensional/mode network
• HIN ⊃ Composite network
• HIN ≒ Complex network
Example datasets
Three types of data from which an HIN can be constructed
1. Structured data
a. database tables organized with the entity-relation model
b. ex) bibliographic data
2. Semi-structured data
a. XML format data
b. object -> attribute -> object
c. relation -> connections among attributes
3. Unstructured data
a. Any data that has recognizable entities and extractable relations
Example datasets
Widely used HIN examples
1. Multi-relational network with single-typed objects
a. Object type = 1
b. Relation type > 1
c. ex) Facebook, Twitter
2. Bipartite network
a. Object type = 2
b. Relation type > 1
c. ex) User-item, Document-word
d. k-partite graph can be constructed
3. Star-schema network
a. HIN that uses the target object as a hub node
b. ex) Bibliographic information network, movie data, patent data
4. Multiple-hub network
a. ex) Bioinformatics data
Multiple HINs
Why Heterogeneous Information Network
Analysis
•It is a new development of data mining
Big data analysis is an emergent yet important task to be studied
Many different types of objects are interconnected
HIN can be an effective tool to deal with complex big data.
•It is an effective tool to fuse more information
We can fuse information across multiple social network platforms
•It contains rich semantics
Different-typed objects and links coexist and they carry different meanings
e.g. meta paths APA, APVPA, APV, etc. (A = author, P = paper, V = venue)
Research Developments
Similarity measure
❏ Goal: consider both the structural similarity of two objects and the meta path connecting them (e.g. APA, APVPA, etc)
❏ Path-based similarity measure
❏ The relevance of different-typed objects
❏ Meta path-based relevance search + user preference
[Figure: different similarities according to meta paths (different semantic meanings). image-tag-image: based on common tags. image-tag-image-group-image-tag-image: further measured by shared groups.]
Sun, Yizhou, et al. "PathSim: Meta path-based top-k similarity search in heterogeneous information networks." VLDB’11
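As a minimal sketch of the path-based measure from the PathSim paper cited on this slide: for a symmetric meta path, the similarity of x and y is twice the number of path instances between x and y, divided by the number of path instances from x to itself plus the number from y to itself. The function names, the author-venue counts, and the choice of an author-venue-author style path are hypothetical and serve only as an illustration.

```python
def commuting_counts(left):
    """Path counts for the symmetric meta path 'left followed by its inverse'.

    left: {x: {mid: count}}, e.g. author -> venue publication counts.
    Returns m with m[x][y] = sum over mid of left[x][mid] * left[y][mid].
    """
    objs = list(left)
    return {
        x: {
            y: sum(c * left[y].get(mid, 0) for mid, c in left[x].items())
            for y in objs
        }
        for x in objs
    }


def pathsim(m, x, y):
    """PathSim: 2 * paths(x, y) / (paths(x, x) + paths(y, y))."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0


# Hypothetical author -> venue publication counts.
author_venue = {
    "ann": {"KDD": 2, "VLDB": 1},
    "ben": {"KDD": 4, "VLDB": 2},
    "cat": {"SIGIR": 3},
}

m = commuting_counts(author_venue)
print(pathsim(m, "ann", "ben"))  # 0.8
print(pathsim(m, "ann", "cat"))  # 0.0
```

In this toy data ann and ben publish in the same venues in the same proportions but at different volumes, and the normalization by self-path counts is what keeps the score below 1.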
Clustering
❏ Clustering based on networked
data
❏ based on a homogeneous
network (e.g. normalized cuts,
modularity)
❏ need to consider the multiple types of
objects co-existing in a network
Clustering
❏ Integrate the attribute information
❏ based on the network structure,
connections in the network and
the vertex attributes
❏ Integrate the text information
❏ topic mining - a unified topic
model with HIN
❏ multiple objects clustering
Boden, B., et al. "Density-Based Subspace Clustering in Heterogeneous Networks." Machine Learning and Knowledge Discovery in Databases (2014)
Deng, Hongbo, et al. "Probabilistic topic models with biased propagation on heterogeneous information networks." Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. 2011.
Clustering
❏ Integrate with mining tasks
❏ semi-supervised learning - path selection according to user guidance
(labeled information)
❏ ranking-based clustering on HIN - mutual promotion of clustering and
ranking
❏ Outlier detection
❏ detect association-based clique outliers in HIN
❏ find subnetwork outliers according to different queries and semantics
❏ a meta-path based outlier mining in HIN
Classification
❏ Classification in HIN
❏ classify multiple types of objects simultaneously
❏ the label of objects is decided by the effects of different-typed objects
along different typed links
❏ Multi-label classification
❏ use multiple types of relationships
mined from linkage structure of HIN
❏ Meta paths for feature generation
❏ Ranking-based classification
❏ mutually enhance classification
and ranking
[Figure: knowledge propagation]
Link prediction
❏ Challenges
❏ The links to be predicted are of
different types
❏ Dependencies exist among
multiple types of links
➔ collectively predict multiple types
of links
➔ utilize meta paths
❏ Others
❏ Link prediction across multiple
aligned heterogeneous networks
❏ Dynamic link prediction
[Figure: different link relations]
Ranking
❏ Challenges
❏ treating all objects equally will mix
different types of objects together
❏ different results under different meta
paths (different semantic meanings)
❏ Meta-path based ranking
❏ simultaneously evaluate the
importance of multiple types of
objects and meta paths
Recommendation
❏ Meta path
❏ explore the semantics and extract relations among objects
❏ Can effectively fuse all kinds of information
❏ utilize different contexts
❏ use interest groups
❏ unified framework of
multiple HIN features
Information fusion
❏ Across multiple aligned HINs
❏ via the shared common information entities
❏ More comprehensive and consistent knowledge about the entities shared by different HINs,
including their structures, properties, and activities
❏ Information can reach more users and achieve broader influence
❏ Transferring knowledge between aligned networks
❏ e.g. overcome the cold-start problem in recommender systems
Advanced topics
More complex network construction
❏ Easy to construct HIN with well-defined schema
❏ From real data?
❏ objects and links can be noisy or not reliable
❏ duplicated names
❏ missing relations
❏ ...
❏ high-quality HINs by cleaning
❏ integrated with information extraction, NLP, and other techniques
More powerful mining methods
❏ Network structure
Bipartite, Star-schema, Multiple-hub, Weighted,
Dynamic, Multiple-network, Schema-rich
More powerful mining methods
❏ Semantic mining
❏ node/link semantics
❏ different-typed nodes/links have different semantics
❏ meta-path
❏ different similarities under different meta paths
❏ constrained meta-path
❏ constraint on node
❏ constraint on link
APC APA
APA|P.L = “Data Mining”
APA|P.L = “Information Retrieval”
….
weighted meta-path
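A constrained meta path such as APA|P.L = “Data Mining” can be read as the APA co-author relation restricted to papers whose label L equals the given value. The sketch below counts APA path instances with and without such a node constraint. The function name `coauthor_counts`, the data layout, and the toy papers and labels are assumptions for illustration.

```python
from collections import Counter


def coauthor_counts(writes, paper_label, label=None):
    """Count APA path instances per ordered author pair.

    writes: iterable of (author, paper) pairs.
    paper_label: dict mapping each paper to its label (the attribute P.L).
    label: if given, only papers with P.L == label are used (node constraint).
    """
    authors_of = {}
    for author, paper in writes:
        authors_of.setdefault(paper, []).append(author)

    counts = Counter()
    for paper, authors in authors_of.items():
        if label is not None and paper_label.get(paper) != label:
            continue  # paper violates the constraint on node P
        for a in authors:
            for b in authors:
                if a != b:
                    counts[(a, b)] += 1
    return counts


# Hypothetical data: three papers with research-area labels.
writes = [
    ("ann", "p1"), ("ben", "p1"),
    ("ann", "p2"), ("ben", "p2"),
    ("ann", "p3"), ("cat", "p3"),
]
paper_label = {"p1": "Data Mining", "p2": "Information Retrieval", "p3": "Data Mining"}

apa = coauthor_counts(writes, paper_label)
apa_dm = coauthor_counts(writes, paper_label, label="Data Mining")
apa_ir = coauthor_counts(writes, paper_label, label="Information Retrieval")

print(apa[("ann", "ben")], apa_dm[("ann", "ben")], apa_ir[("ann", "cat")])  # 2 1 0
```

The same pair of authors gets different counts under different constraints, which is the finer-grained semantics the constrained meta path is meant to capture.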
More powerful mining methods
Bigger networked data
❏ can flexibly and effectively integrate varied objects and
heterogeneous information
❏ However, there are many practical technical challenges in real HINs
❏ huge size, dynamics, memory capacity ..
❏ Instead of the whole network, hidden but small networks can be
mined
❏ Quick/parallel computation strategies have been considered
recently
Conclusion
❏ There has been a surge of research on HIN in recent years because of its rich
structural and semantic information.
❏ The survey covers recent and future developments of different data
mining tasks on HIN.
❏ It offers an understanding of the fundamental issues and a
good starting point for working in this field.
Thank you !
Q & A
