Insights from Knowledge Graphs
Anirudh Prabhu,
Keck Deep Time Data Infrastructure Team
and the Deep Carbon Observatory Data Science Team
@Anirudh_14
What are insights?
How do we gain insights?
Reasoners
Visual Analytics
Network Science Approach
Reasoners
ONTOLOGY
Rules Engine – Apache Jena Example
[hurricane-half-hourly:
(?candidate dd:candidateEvent ?event),
(?event rdf:type dd:Hurricane),
(?candidate dd:candidateVariable ?variable),
(?variable dd:timeInterval ?timeInterval),
equal(?timeInterval, <http://darkdata.tw.rpi.edu/data/time-interval/half-hourly>),
makeSkolem(?assertion, dd:Hurricane, ?timeInterval)
->
(?candidate dd:compatibilityAssertion ?assertion),
(?assertion rdf:type dd:CompatibilityAssertion),
(?assertion dd:compatibilityValue dd:strong_compatibility),
(?assertion dd:assertionConfidence "0.5"^^xsd:double),
(?assertion dd:basisForAssertion <urn:rule/time_interval/hurricane-half-hourly>)
]
The ‘half-hourly’ time interval is best for hurricanes and tropical storms.
Antecedent: contains information about the phenomenon and the temporal resolution.
Consequent: contains the compatibility assertion information.
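As a rough illustration of how the antecedent and consequent fit together, the sketch below replays the rule's logic over plain Python tuples. It is not how Jena evaluates rules. The function name, the `skolem:` naming scheme, the shortened `dd:half-hourly` identifier, and the sample triples are all assumptions made for illustration.

```python
# Hypothetical stand-in for the half-hourly time-interval resource in the rule.
HALF_HOURLY = "dd:half-hourly"


def apply_hurricane_half_hourly(triples):
    """Return the triples the rule would add for every matching candidate."""
    facts = set(triples)
    inferred = set()
    for candidate, predicate, event in facts:
        # Antecedent, part 1: the candidate's event is a hurricane.
        if predicate != "dd:candidateEvent":
            continue
        if (event, "rdf:type", "dd:Hurricane") not in facts:
            continue
        for subject, predicate2, variable in facts:
            # Antecedent, part 2: one of its variables is half-hourly.
            if subject != candidate or predicate2 != "dd:candidateVariable":
                continue
            if (variable, "dd:timeInterval", HALF_HOURLY) not in facts:
                continue
            # Like makeSkolem, derive the node name from the arguments, so
            # the same inputs always give the same assertion node.
            assertion = "skolem:" + "|".join(["dd:Hurricane", HALF_HOURLY])
            # Consequent: attach the compatibility assertion.
            inferred.update({
                (candidate, "dd:compatibilityAssertion", assertion),
                (assertion, "rdf:type", "dd:CompatibilityAssertion"),
                (assertion, "dd:compatibilityValue", "dd:strong_compatibility"),
                (assertion, "dd:assertionConfidence", 0.5),
                (assertion, "dd:basisForAssertion",
                 "urn:rule/time_interval/hurricane-half-hourly"),
            })
    return inferred


# Made-up sample data: one candidate pairing a hurricane with a
# half-hourly variable.
sample = [
    ("ex:c1", "dd:candidateEvent", "ex:storm1"),
    ("ex:storm1", "rdf:type", "dd:Hurricane"),
    ("ex:c1", "dd:candidateVariable", "ex:precip"),
    ("ex:precip", "dd:timeInterval", HALF_HOURLY),
]
new_facts = apply_hurricane_half_hourly(sample)
```

A candidate whose variable has any other time interval fails the antecedent, so the function adds nothing for it.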
Visual
Analytics
Visual Analytics
◦ D3js/Visjs
◦ VOWL
◦ iGraph/VisNetwork
What is encoded vs. what is seen
Encoded | Seen/Inferred/Calculated
Nodes | Patterns in the network geometry
Edges | Sub-communities formed in the network
Layout (mostly force-directed) | Important hubs in the network
Additional parameters for nodes (optional) | Additional metrics that explain the complexity of the environment (assortativity, betweenness, centrality, etc.)
• Comparing how different networks change through time also helps in understanding the given environment.
VOWL
D3js
◦ JavaScript Library for Visualizing Data.
◦ Create force-directed network layout.
◦ Example
◦ https://bl.ocks.org/steveharoz/8c3e2524079a8c440df60c1ab72b5d03
iGraph
◦ R package for creating static graphs.
◦ Covers most of the functions required for creating, analyzing, and interpreting networks.
◦ Graph objects can be easily converted to the different data structures required for other exploration.
[Figure: element network with nodes Pb, U, P, Al, As, Cu, Ca, K, Na, C, S, Si, V, Ba, Fe, Mg, Mo, Se]
visNetwork
◦ R package built on the vis.js JavaScript library.
◦ Data structures are easier to handle in R than in JavaScript.
◦ The data objects from the network can be used directly for further analysis.
Coexisting Animal Families through last 542 million years
Animal Family Networks
Ediacaran Assemblage Networks
Extinction event at 560 Ma?
Drew Muscente (science hypothesis): “Nama and White Sea fauna are different facies, whereas a mass extinction occurred after the Avalonian.”
Network
Science
Approach
Libraries/Packages
Igraph
ggnetwork
Network
SNA
visNetwork
D3js
Threejs
ngraph
Data Structure
• Symmetric adjacency matrix
• Rows and column names represent mineral species
• Values represent co-occurrence of 2 minerals
Node List and Properties
Adjacency Matrix
Data Structures (contd.)
• Nodes
• Links
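The step from a symmetric adjacency matrix to node and link lists can be sketched in a few lines. This is a minimal sketch, not the code used for the talk: the function name, the `source`/`target`/`value` field names, and the three-mineral matrix with its counts are invented for illustration.

```python
def matrix_to_lists(names, matrix):
    """Split a symmetric co-occurrence matrix into node and link lists."""
    nodes = [{"id": name} for name in names]
    links = []
    for i in range(len(names)):
        # Read only the upper triangle: the matrix is symmetric, so each
        # pair would otherwise appear twice. The diagonal is skipped.
        for j in range(i + 1, len(names)):
            count = matrix[i][j]
            if count > 0:
                links.append(
                    {"source": names[i], "target": names[j], "value": count}
                )
    return nodes, links


# Invented example: rows and columns are mineral species, values are
# co-occurrence counts.
names = ["quartz", "calcite", "pyrite"]
matrix = [
    [0, 12, 5],
    [12, 0, 0],
    [5, 0, 0],
]
nodes, links = matrix_to_lists(names, matrix)
```

Node properties (for example composition or age) would be added as extra keys on each node dictionary.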
Network
Metrics
Comparing
Global
Metrics
Assortativity (Homophily)
◦ Network equivalent of the Pearson correlation coefficient
◦ Values between 1 and -1
◦ 1 = similarity favors connections
◦ 0 = non-assortative
◦ -1 = opposites attract
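One common variant, degree assortativity, can be sketched as the Pearson correlation between the degrees at the two ends of every link. The cited study may measure similarity on a different node attribute, so treat the function below, including its name and the two test networks, as an illustrative assumption.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each link."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every undirected link in both directions so the result does
    # not depend on how each link happens to be written.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return float("nan")  # every node has the same degree: undefined
    return cov / var


# A star: the hub links only to low-degree leaves ("opposites attract").
star = [(0, 1), (0, 2), (0, 3), (0, 4)]

# A triangle plus a separate 4-clique: every link joins equal-degree nodes.
cliques = [("a", "b"), ("b", "c"), ("a", "c"),
           (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

r_star = degree_assortativity(star)
r_cliques = degree_assortativity(cliques)
```

The star sits at the disassortative extreme and the two separate cliques at the assortative extreme.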
• Muscente AD, Prabhu A, Zhong H, Eleish A, Meyer M, Fox P, Hazen R, and Knoll A (2017) The network paleoecology of mass extinctions. PNAS.
Community
Detection
◦ Finding communities in a network
◦ Insight into the nature of the nodes
◦ Patterns of the evolution of the network
◦ Relationships between the subgroups
Walktrap algorithm
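Walktrap groups nodes by comparing where short random walks starting from them tend to end up: nodes in the same community tend to reach the same places. The sketch below computes only that random-walk distance, not the full agglomerative merging. The function names, the default of two steps, and the two-triangle test network are assumptions for illustration, and the code expects every node to have at least one neighbor.

```python
from math import sqrt


def walk_probabilities(adj, steps):
    """Row i: probability that a `steps`-step random walk from node i ends at each node."""
    nodes = sorted(adj)
    index = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    # One-step transition matrix: move to a uniformly chosen neighbor.
    step = [[0.0] * n for _ in range(n)]
    for v in nodes:
        for w in adj[v]:
            step[index[v]][index[w]] = 1.0 / len(adj[v])
    # Start from the identity matrix and multiply by `step` once per step.
    probs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        probs = [
            [sum(probs[i][k] * step[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return nodes, index, probs


def walk_distance(adj, a, b, steps=2):
    """Degree-weighted distance between the walk distributions of a and b."""
    nodes, index, probs = walk_probabilities(adj, steps)
    pa, pb = probs[index[a]], probs[index[b]]
    return sqrt(sum(
        (pa[k] - pb[k]) ** 2 / len(adj[nodes[k]]) for k in range(len(nodes))
    ))


# Two triangles (0-1-2 and 3-4-5) joined by the single link 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
same_triangle = walk_distance(adj, 0, 1)
across_bridge = walk_distance(adj, 0, 5)
```

Nodes 0 and 1 come out closer to each other than nodes 0 and 5, which is the signal the algorithm uses when deciding which groups to merge first.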
Example: Mineral Co-occurrence
Morrison SM, Liu C, Eleish A, Prabhu A, Li C, Ralph J, Downs RT, Golden JJ, Fox P, Hummer DR, Meyer MB, and Hazen RM (2017) Network analysis of mineralogical systems. American Mineralogist 102
• Groups correspond to paragenetic mode.
• Paragenetic mode: formation conditions.
• How and when the minerals were formed.
Example: Evolving Networks
Moore, E. K., Hao, J., Prabhu, A., Zhong, H., Jelen, B. I., Meyer, M., ... & Falkowski, P. G. (2018). Geological and Chemical Factors that Impacted the Biological Utilization of Cobalt in the Archean Eon. Journal of Geophysical Research: Biogeosciences.
Simple Examples
◦ https://jupyter.deepcarbon.net/user/anirudhprabhu/notebooks/Code/RFM_Network.ipynb
◦ https://deeptime.tw.rpi.edu/jupyter/user/6d32485f-bcb8-473e-99fe-66ce2f2a4e44/notebooks/U/U_minerals_deposit_types.ipynb
Thank You
Questions?
Metrics: Local
Degree is the number of links connected to a given node.
[Figure: example networks with nodes labeled by their metric values]
Betweenness is a measure of the number of geodesic paths that pass through a given node.
Distance is the length of the geodesic (shortest path) between any two nodes.
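The three local metrics can be sketched with the standard library alone. This is a minimal sketch for an unweighted, undirected network stored as a dictionary of neighbor lists; the function names and the four-node path used as an example are assumptions. Betweenness uses Brandes' algorithm and is left unnormalized.

```python
from collections import deque


def degree(adj, node):
    """Number of links connected to `node`."""
    return len(adj[node])


def distances_from(adj, source):
    """Geodesic distance from `source` to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def betweenness(adj):
    """Unnormalized betweenness of every node (Brandes' algorithm)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        paths = {v: 0 for v in adj}  # number of shortest paths from s
        paths[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, passing path shares to predecessors.
        share = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share[v] += paths[v] / paths[w] * (1 + share[w])
            if w != s:
                score[w] += share[w]
    # Each unordered pair was counted once from each end.
    return {v: total / 2 for v, total in score.items()}


# Example: a path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
path_betweenness = betweenness(path)
path_distances = distances_from(path, "a")
```

On the path, the two inner nodes each lie on two shortest paths between other nodes, while the end nodes lie on none.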
Metrics: Global
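Density, D, is the number of links divided by the number of possible links:
$D = \frac{2L}{N(N-1)}$
where L is the number of links and N is the number of nodes.
[Figure: example networks with D = 0.33 (low density), D = 0.66, and D = 1 (high density)]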
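As a quick worked check of the density formula, the sketch below evaluates it for three-node networks with one, two, and three links. The function name and the choice of three-node examples are assumptions for illustration.

```python
def density(num_nodes, num_links):
    """Links present divided by links possible in a simple undirected network."""
    if num_nodes < 2:
        raise ValueError("density needs at least two nodes")
    possible = num_nodes * (num_nodes - 1) / 2
    return num_links / possible


# Three nodes can hold at most three links.
one_link = density(3, 1)     # a single link plus an isolated node
two_links = density(3, 2)    # a path through all three nodes
three_links = density(3, 3)  # a triangle, every possible link present
```

Dividing by N(N-1)/2 is the same as the 2L / (N(N-1)) form of the formula.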
Metrics: Global
Diameter: largest geodesic distance in a network (the
shortest path between the two most separated nodes)
Mean Distance: average “degree of separation” in a
network
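Both global distance metrics follow from the shortest-path lengths between all pairs of nodes. The sketch below assumes a connected, unweighted, undirected network stored as a dictionary of neighbor lists; the function names and the four-node path are illustrative assumptions.

```python
from collections import deque


def distances_from(adj, source):
    """Geodesic distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter_and_mean_distance(adj):
    """Largest and average geodesic distance over all pairs of nodes."""
    nodes = list(adj)
    lengths = []
    for i, a in enumerate(nodes):
        dist = distances_from(adj, a)
        for b in nodes[i + 1:]:
            if b not in dist:
                raise ValueError("network is not connected")
            lengths.append(dist[b])
    return max(lengths), sum(lengths) / len(lengths)


# Example: a path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
diameter, mean_distance = diameter_and_mean_distance(path)
```

For the path, the two end nodes are the most separated pair, so they set the diameter.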
Metrics: Global
Centralization:
A measure of how central a network’s “most central” node is relative to how central all the other nodes are.
• Degree centralization: number of links to each node
  ◦ Are there many highly interconnected nodes?
• Betweenness centralization: number of shortest paths through each node
  ◦ Are there a few key “broker” nodes?
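One standard way to turn this idea into a number is Freeman's degree centralization: sum how far each node's degree falls short of the largest degree, then divide by the largest value that sum can take, which a star network reaches. The slides do not say which formula was used, so the choice of Freeman's definition, the function name, and the example networks are assumptions.

```python
def degree_centralization(adj):
    """Freeman degree centralization of a simple undirected network (0 to 1)."""
    n = len(adj)
    if n < 3:
        raise ValueError("centralization needs at least three nodes")
    degrees = [len(neighbors) for neighbors in adj.values()]
    top = max(degrees)
    # A star network maximizes the numerator at (n - 1) * (n - 2).
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


# A star: one hub holds every link.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

# A ring: every node has the same degree.
ring = {0: [4, 1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}

star_score = degree_centralization(star)
ring_score = degree_centralization(ring)
```

Betweenness centralization follows the same pattern, with betweenness scores in place of degrees and a different normalizing constant.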



Editor's Notes

  • #3: https://towardsdatascience.com/knowledge-graphs-and-machine-learning-3939b504c7bc
  • #6: To generate these candidates, we developed individual rulesets that use compatibility assertions to describe how well two entities work together. A candidate describes a combination of a service, an event, a physical feature, and one or two data fields. Rules are used to make compatibility assertions about the candidates. Each pair of compatibility assertion value (one of five values) and confidence metric (ranging from 0 to 1) is associated with a single candidate. When the rules are run, we get all of the compatibility assertions for a candidate. Another set of rules looks at associations between services, variables, events, etc., and makes a compatibility assertion about the relevance of event-related information and visualization services. We then rank the candidates by plugging all the assertions and candidates into our scoring algorithm.
  • #7: This slide shows an example of a rule, which states that half-hourly time intervals are ideal for analyzing hurricane and tropical storm data.
  • #11: The image is hyperlinked to the web version of VOWL.
  • #13: With the Animal Family Fossil Network, we see more pronounced extinction events. This may help identify previously unknown extinction events. When combined with other analytics methods, we can also quantify these extinction events.
  • #14: Click the image for the performance hyperlink, and use it to highlight how sub-communities can be seen in network layouts.
  • #15: These types of analysis can also be done on larger scales! Here is