Domainspecificsubgraph extraction ieee-bigdata2016

Harnessing Relationships for Domain-specific Subgraph
Extraction: A Recommendation Use Case
IEEE International Conference on Big Data
Washington, D.C., USA, 5th – 8th December, 2016
Sarasi Lalithsena, sarasi@knoesis.org, Kno.e.sis Research Center, Wright State University
Pavan Kapanipathi, kapanipa@us.ibm.com, Thomas J. Watson Research Center, IBM Research
Amit Sheth, amit@knoesis.org, Kno.e.sis Research Center, Wright State University

Knowledge Graphs on the Web
• Represent the data in structured format using a graph-based data
model
Linked Open Data
Google Knowledge Graph
Schema.org annotation
570M entities and 18B facts
> 1000 datasets
6.2M entities and 1B facts
8M entities and 70M facts

Knowledge Graph In Action
IBM Watson uses YAGO Hierarchy to
extract the types
Movie recommendation algorithms use
DBpedia and Linked MovieDB to
determine how two movies are
semantically relevant

Motivation
• Utilizing large cross-domain KGs can be computationally intensive
• Existing approaches extract relevant subgraphs by navigating a
predefined number of hops (2-4) from known domain entities
• Example: a movie recommendation system extracts its subgraph by
navigating 3 hops from 3,072 movies in DBpedia; the resulting subgraph
encompasses 66% of the DBpedia entities
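The n-hop expansion used by existing approaches can be sketched as a bounded breadth-first traversal. This is a minimal sketch, not the implementation behind the slides: the function name `n_hop_subgraph`, the toy triples, and the choice to follow only outgoing edges are all assumptions made for illustration.

```python
from collections import deque

def n_hop_subgraph(triples, seeds, max_hops):
    """Collect every triple reachable within max_hops of the seed entities.

    Assumption: only outgoing edges (subject -> object) are followed.
    """
    # Index outgoing edges by subject for fast expansion.
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))

    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    subgraph = set()
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand past the hop limit
        for p, o in outgoing.get(node, []):
            subgraph.add((node, p, o))
            if o not in depth:
                depth[o] = depth[node] + 1
                queue.append(o)
    return subgraph

# Hypothetical toy graph: m* are movies, d* directors, c1 a city, k1 a country.
TOY_TRIPLES = [
    ("m1", "director", "d1"),
    ("d1", "birthPlace", "c1"),
    ("c1", "country", "k1"),
    ("m2", "director", "d2"),
]
two_hop = n_hop_subgraph(TOY_TRIPLES, ["m1"], 2)
```

Because the traversal keeps every relationship it meets, the result grows quickly with the hop count, which is the cost the slide points to.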

Motivation
• Certain applications are domain-specific and do not require the
complete knowledge graphs
[Figure: example graph. Nodes: Transformers (2007), The Terminator, Cursed, Random Hearts, Michael Bay, James Cameron, Wes Craven, Sydney Pollack, Action Film, Los Angeles. Edge labels: director, knownFor, deathCity. Legend: relevant for movie recommendation; not relevant for movie recommendation.]

Problem
How do we extract the domain-specific
subgraph from large cross-domain knowledge
graphs without compromising the accuracy?

Relationship is the key
[Figure: the example movie graph from the Motivation slide, with the same nodes, edge labels (director, knownFor, deathCity) and relevance legend.]

Domain Specificity Measures for Relationships
• Association of the relationship with domain entities provides
evidence for domain specificity
• Relationship director is specific to the movie
domain
• Relationship country is not specific to the movie
domain
• Association of the relationship with the domain
entities is straightforward with direct
relationships such as director and country
• However, it is not trivial for other relationships
such as award, spouse, and capital

Domain Specificity Measures for Relationships
• To measure the domain-specificity of both direct and indirect
relationships, we identify two characteristics of a dataset:
– Entity Type
– Property Path
• We formalize these two characteristics to calculate domain
specificity of a relationship

Type-based Domain Specificity Measure
The measure uses the association between entity types.
[Figure: example with node m1, entity types Movie, Director and Country (type edges), the relationship spouse, and links marked "strong association" and "weak association".]
• Strength of association between the domain entity type and another
entity type
– Example: association between the Movie type and the Director type
• Strength of association between an entity type and a relationship
– Example: association between the award relationship and the Director type

Type-based Domain Specificity Measure
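• Strength of association between directly connected entity types
\[ \mathit{d\_typerel}(t_i, t_j) = \frac{\mathit{edge\_count}_{t_i, t_j}}{\mathit{edge\_count}_{t_i} \cdot \mathit{edge\_count}_{t_j}} \]
• Strength of association between indirectly connected entity types
\[ \mathit{ind\_typerel}(t_d, t_{n-1}, n) = \prod_{k=1}^{n-1} \mathit{d\_typerel}(t_{k-1}, t_k), \quad t_0 = t_d \]
• Strength of association between entity types and their direct relationships
\[ \mathit{prop\_rel}(p, t) = \frac{\mathit{edge\_count}_{p, t}}{\mathit{edge\_count}_{p}} \]
\[ \mathit{prop\_score}(p, n) = \sum_{t_{n-1}^{j} \in C} \mathit{ind\_typerel}(t_d, t_{n-1}^{j}, n) \cdot \mathit{prop\_rel}(p, t_{n-1}^{j}) \]
[Figure: hop chain D, H1, H2, H3, ..., Hn-1, Hn linked by relationships p1, p2, p3, ..., pn, pn+1, with example types Movie, Director and Award and the nth hop marked. Annotations: "Between Movie and Director", "Between Movie and Award", "Between p3 and Award".]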
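The type-based score can be tried out on a toy graph. The sketch below is an illustration under stated assumptions, not the authors' code: the edges are hypothetical and already reduced to (subject type, relationship, object type), the edge count of a type is taken to be the number of edges touching it, prop_rel counts a relationship by the type it leaves from, and the indirect strength is taken as the product of the direct strengths along the chain.

```python
from collections import Counter
from math import prod

# Hypothetical toy edges: (subject type, relationship, object type).
EDGES = [
    ("Movie", "director", "Director"),
    ("Movie", "director", "Director"),
    ("Movie", "country", "Country"),
    ("Director", "award", "Award"),
    ("Director", "spouse", "Person"),
    ("Country", "capital", "City"),
    ("Person", "nationality", "Country"),
    ("Person", "nationality", "Country"),
]

pair_count = Counter()      # edges between two types (direction ignored)
type_count = Counter()      # edges touching a type
rel_count = Counter()       # edges per relationship
rel_type_count = Counter()  # edges per (relationship, subject type)
for s, p, o in EDGES:
    pair_count[frozenset((s, o))] += 1
    for t in {s, o}:
        type_count[t] += 1
    rel_count[p] += 1
    rel_type_count[(p, s)] += 1

def d_typerel(ti, tj):
    """Direct association strength between two entity types."""
    return pair_count[frozenset((ti, tj))] / (type_count[ti] * type_count[tj])

def ind_typerel(type_path):
    """Product of direct strengths along a chain of types starting at the domain type."""
    return prod(d_typerel(a, b) for a, b in zip(type_path, type_path[1:]))

def prop_rel(p, t):
    """Share of the edges labeled p that leave entities of type t."""
    return rel_type_count[(p, t)] / rel_count[p]

def prop_score(p, type_paths):
    """Sum over candidate type chains that end at the type the relationship leaves from."""
    return sum(ind_typerel(path) * prop_rel(p, path[-1]) for path in type_paths)

# Candidate chains from the domain type (Movie) to the types reached at hop 1.
HOP1_PATHS = [("Movie", "Director"), ("Movie", "Country")]
award_score = prop_score("award", HOP1_PATHS)
capital_score = prop_score("capital", HOP1_PATHS)
```

In this toy graph the second-hop relationship award scores higher than capital, because Director is more strongly tied to Movie than Country is.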

Path-based Domain Specificity Measure
The measure uses the association between intermediate relationships.
[Figure: domain entities m1 and m2 connected to intermediate nodes I1 to I15.]
• Uses an iterative approach that builds on the domain-specific
paths already identified

Path-based Domain Specificity Measure
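• Domain specificity of an nth-hop relationship depends on the domain-specific paths of length n-1
\[ PMI(p, dsp_{n-1}) = \log \frac{Prob(p, dsp_{n-1})}{Prob(p) \cdot Prob(dsp_{n-1})} \]
\[ Prob(p, dsp_{n-1}) = \frac{Path(dsp_{n-1}, p)}{\sum_{p' \in P} Path(dsp_{n-1}, p')} \]
• To address PMI's sensitivity to low-frequency values, the normalized form is used:
\[ NPMI(p, dsp_{n-1}) = \frac{\log \frac{Prob(p, dsp_{n-1})}{Prob(p) \cdot Prob(dsp_{n-1})}}{-\log Prob(p, dsp_{n-1})} \]
[Figure: hop chain D, H1, H2, H3, ..., Hn-1, Hn with relationships p1, p2, p3, ..., pn, pn+1 and the nth hop marked; p1 – p2 is shown as a domain-specific path, and the measure is taken between p3 and that path.]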
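A small numeric sketch of PMI and NPMI follows. It rests on assumptions: the counts are hypothetical, and because the slide does not say how Prob(p) and Prob(dsp) are estimated, all three probabilities are taken as relative frequencies from one count table. That differs from the slide's joint probability, which normalizes only over the relationships that follow the given path.

```python
from math import log

# Hypothetical counts: (path of length n-1, candidate nth-hop relationship) -> path instances.
PATH_COUNTS = {
    (("director",), "knownFor"): 30,
    (("director",), "deathCity"): 10,
    (("country",), "knownFor"): 10,
    (("country",), "deathCity"): 50,
}
TOTAL = sum(PATH_COUNTS.values())

def prob_joint(p, dsp):
    """Relative frequency of relationship p following path dsp."""
    return PATH_COUNTS.get((dsp, p), 0) / TOTAL

def prob_relationship(p):
    """Marginal relative frequency of relationship p."""
    return sum(c for (_, rel), c in PATH_COUNTS.items() if rel == p) / TOTAL

def prob_path(dsp):
    """Marginal relative frequency of path dsp."""
    return sum(c for (path, _), c in PATH_COUNTS.items() if path == dsp) / TOTAL

def pmi(p, dsp):
    return log(prob_joint(p, dsp) / (prob_relationship(p) * prob_path(dsp)))

def npmi(p, dsp):
    # Dividing by -log of the joint probability bounds the score to [-1, 1].
    return pmi(p, dsp) / (-log(prob_joint(p, dsp)))

known_for = npmi("knownFor", ("director",))
death_city = npmi("deathCity", ("director",))
```

With these counts knownFor gets a positive score after the director path and deathCity a negative one; a zero count would need separate handling, since the logarithm is undefined there.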

Evaluation: Recommendation Use Case
• Evaluate the effectiveness of the domain-specific subgraph using a
recommendation use case
• Implement an existing recommendation algorithm and run it on the
n-hop expansion subgraph (baseline) and on the domain-specific
subgraph (our approach)
• Use two domains, Movie and Book, with the existing datasets MovieLens
and DBbook
• MovieLens consists of 1,000,209 ratings for 3,883 movies by 6,040
users; DBbook consists of 72,372 ratings for 8,170 books by 6,181 users

Evaluation Metrics
• Graph reduction
– Measure the reduction of the graph in terms of nodes, relationships, and
reachable paths
• Impact on accuracy
– Precision@n
– Rating deviation
• Impact on run time

Evaluation Metrics – Graph Reduction
2-hop graphs; percentages are reductions relative to the 2-hop expansion

Movie domain
               Path-based                                    Type-based
               Relations   Nodes          Paths              Relations   Nodes          Paths
2-hop          349         1.07M          108.4M             349         1.07M          108.4M
DSG2(15,15)    15 (95.7%)  0.08M (92.0%)  5.08M (95.3%)      14 (95.9%)  0.13M (87.6%)  17M (83.9%)
DSG2(25,25)    25 (92.8%)  0.13M (87.3%)  17.4M (83.8%)      24 (93.1%)  0.63M (40.9%)  61.6M (43.19%)
DSG2(35,35)    35 (90%)    0.64M (40.7%)  61.64M (43.1%)     32 (90.8%)  0.64M (40.7%)  61.62M (43.18%)

Book domain
               Path-based                                    Type-based
               Relations   Nodes          Paths              Relations   Nodes          Paths
2-hop          424         1.2M           793.4M             424         1.2M           793.4M
DSG2(15,15)    15 (96.5%)  0.09M (92.8%)  159.6M (79.6%)     15 (96.5%)  0.09M (92.8%)  159.7M (80%)

Evaluation Metrics – Graph Reduction
3-hop graphs; percentages are reductions relative to the 3-hop expansion

                        Path-based                                     Type-based
                        Relations   Nodes          Paths               Relations   Nodes          Paths
Movie 3-hop             636         2.86M          4885.3M             636         2.86M          4885.3M
Movie DSG3(15,25,15)    30 (95.3%)  0.19M (93.2%)  48.4M (98.9%)       24 (96.2%)  0.26M (90.9%)  105.5M (97.83%)
Book 3-hop              641         3.2M           13582.8M            641         3.2M           13582.8M
Book DSG3(15,25,15)     31 (95.2%)  0.18M (94.2%)  1082.6M (92.2%)     21 (96.7%)  0.12M (96%)    1062.5M (92.33%)

On average, the domain-specific subgraph is 80% to 90% smaller than
the n-hop expansion subgraph.
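The reduction percentages can be reproduced from the raw counts. This is a minimal sketch, assuming each percentage is one minus the ratio of the DSG size to the n-hop size; the helper name `reduction_pct` is hypothetical. Only the path-based relationship counts are used here, because the node and path counts on the slides are already rounded.

```python
def reduction_pct(full_size, reduced_size):
    """Percentage by which the reduced graph is smaller than the full one."""
    return round(100 * (1 - reduced_size / full_size), 1)

# Path-based relationship counts taken from the graph reduction tables.
movie_2hop = reduction_pct(349, 15)
book_2hop = reduction_pct(424, 15)
movie_3hop = reduction_pct(636, 30)
book_3hop = reduction_pct(641, 31)
```

These four values come out as 95.7, 96.5, 95.3 and 95.2, matching the path-based Relations column.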

Evaluation Metrics – Precision@n
\[ precision@n = \frac{\text{number of relevant items in the top } n}{n} \]
[Charts: precision@n for the Movie 2-hop graphs, the Movie 3-hop graph, and the Book domain.]
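Precision@n can be computed directly from a ranked recommendation list. The sketch below uses the standard definition; the function name, the ranking and the relevance set are hypothetical.

```python
def precision_at_n(ranked_items, relevant, n):
    """Fraction of the top-n ranked items that are relevant."""
    top_n = ranked_items[:n]
    return sum(1 for item in top_n if item in relevant) / n

# Hypothetical ranking produced by a recommender, and the items the user found relevant.
RANKED = ["m1", "m4", "m2", "m5", "m3"]
RELEVANT = {"m1", "m2", "m3"}

p_at_3 = precision_at_n(RANKED, RELEVANT, 3)
p_at_5 = precision_at_n(RANKED, RELEVANT, 5)
```

Here two of the top three items are relevant, so precision@3 is 2/3, and three of the top five give precision@5 of 0.6.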

Evaluation Metrics – Rating Dev
• Rating Dev
\[ ratingdev_u = \frac{\sum_{r \in R} (itemrating_r - avgrating_u)}{|R|} \]
Example ratings of one user:
Movie   Rating
m1      5
m2      3
m3      3
m4      1
m5      1
Legend: relevant movies, irrelevant movies
N-hop subgraph: m1, m2
DSG: m2, m3
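The example ratings on this slide can be worked through numerically. The sketch rests on assumptions: R is read as the set of recommended items, the user's average is taken over the five rated movies shown, and the two lists are read as the items obtained with the n-hop subgraph and with the DSG.

```python
# Ratings from the example table on the slide.
RATINGS = {"m1": 5, "m2": 3, "m3": 3, "m4": 1, "m5": 1}

def rating_dev(recommended, ratings):
    """Mean difference between the ratings of recommended items and the user's average rating."""
    avg = sum(ratings.values()) / len(ratings)
    return sum(ratings[m] - avg for m in recommended) / len(recommended)

n_hop_dev = rating_dev(["m1", "m2"], RATINGS)  # items listed under the n-hop subgraph
dsg_dev = rating_dev(["m2", "m3"], RATINGS)    # items listed under the DSG
```

The user's average rating is 2.6, so the two lists give deviations of about 1.4 and 0.4. A larger value means the recommended items are rated further above the user's average.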

Evaluation Metrics – AvgDev
Movie domain
      2-hop                       3-hop
n     Baseline   DSG2(15,15)      Baseline   DSG3(15,25,15)
5     0.8222     0.823            0.807      0.823
10    0.814      0.816            0.806      0.815
15    0.810      0.811            0.806      0.811
20    0.806      0.807            0.805      0.806

Book domain
      2-hop                       3-hop
n     Baseline   DSG2(15,15)      Baseline   DSG3(15,25,15)
1     0.592      0.584            0.533      0.558
2     0.599      0.604            0.571      0.579
3     0.601      0.614            0.569      0.579
4     0.606      0.617            0.595      0.595
5     0.610      0.620            0.596      0.6

Evaluation Metrics – Run Time Performance
          Movie                                        Book
          n-hop expansion   DSG (Path)   DSG (Type)    n-hop expansion   DSG (Path)   DSG (Type)
2-hop     72 s              5 s          11.2 s        10.15 min         1.3 min      1.4 min
3-hop     2 h 35 min        76 s         3.2 min       7 h 40 min        15.2 min     27 min
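The speedup implied by these run times can be checked with a few lines of arithmetic. This is only a sketch: the dictionary layout and the helper name `speedup` are hypothetical, and the values are the run times from the table converted to seconds.

```python
# Run times from the table, converted to seconds.
RUNTIMES = {
    ("Movie", "2-hop"): {"n_hop": 72, "path": 5, "type": 11.2},
    ("Movie", "3-hop"): {"n_hop": 2 * 3600 + 35 * 60, "path": 76, "type": 3.2 * 60},
    ("Book", "2-hop"): {"n_hop": 10.15 * 60, "path": 1.3 * 60, "type": 1.4 * 60},
    ("Book", "3-hop"): {"n_hop": 7 * 3600 + 40 * 60, "path": 15.2 * 60, "type": 27 * 60},
}

def speedup(domain, hops, method):
    """How many times faster the DSG run is than the n-hop expansion run."""
    row = RUNTIMES[(domain, hops)]
    return round(row["n_hop"] / row[method], 1)

movie_2hop_path = speedup("Movie", "2-hop", "path")
movie_3hop_path = speedup("Movie", "3-hop", "path")
book_3hop_path = speedup("Book", "3-hop", "path")
```

For the path-based DSG this gives roughly 14.4x on the Movie 2-hop graph, 122.4x on the Movie 3-hop graph and 30.3x on the Book 3-hop graph.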

Conclusion
• Propose an approach to extract a domain-specific subgraph from a
large, cross-domain KG
• Treat non-taxonomical relationships as first-class objects
• The approach reduced the graph size by more than 80%, which
led to a tenfold decrease in the computation time of the
recommendation algorithm
• Accuracy of the algorithm was not compromised; the results were
instead more accurate

Thank You!
http://knoesis.wright.edu/people/sarasi
sarasi@knoesis.org
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA

Harnessing Relationships for Domain-specific Subgraph
Extraction: A Recommendation Use Case
Presented at the IEEE International Conference on Big Data (IEEE BigData 2016)
Washington, D.C., USA, 5th – 8th December, 2016
Amit Sheth
amit@knoesis.org
Pavan Kapanipathi
kapanipa@us.ibm.com
Sarasi Lalithsena
sarasi@knoesis.org
