NOUS: Construction and
Querying of Dynamic
Knowledge Graphs
SUTANAY CHOUDHURY (1), KHUSHBU AGARWAL (1), SUMIT PUROHIT (1), BAICHUAN
ZHANG (2), MEG PIRRUNG (1), WILLIAM SMITH (1), MATHEW THOMAS (1)
(1) Pacific Northwest National Laboratory, Richland WA
(2) Purdue University, Indianapolis IN
April 22, 2017 2
Outline
Introduction
Knowledge Graphs
Challenges
Motivation (Use Cases)
Domain Specific Querying of Web
Question-answering for Climate Science
Technical Approach
NLP, Entity Disambiguation, Relation Learning
Frequent Pattern Mining, Question Answering
What’s unique
Results
Status and Future Work
Knowledge Graphs: Why do we care?
A collection of facts about people, places, things and relationships between them,
in a given context
I am a scientist. Read papers for me.
I am an analyst drowning in data. Help me find interesting events
I am a doctor, show me latest breakthrough to consider for this patient.
KG: Construction and Analytical
Challenges
Construction
No Base KB
Natural Language Processing is inherently noisy
Unseen entities and relationships
How do we determine their class?
Should we include every new relation and entity in KB?
Analysis
Identifying target set of user questions and ontology needed to
answer them.
Mapping user questions to graph analytical tasks
Executing analytical tasks at scale
Pattern discovery: What’s emerging and what’s fading?
Question Answering:
Entity based Querying (What, Who, When, Where)
Hypothesis Generation (Why)
Use Case : Domain-specific Querying of
the Web
Input: multi-month web crawls obtained using domain expert suggestions
Analytic questions:
Find top manufacturers and model names
Find components and features
Who is popular?
What are new releases?
How are companies and components related?
How are countries and products related?
Question-answering for Climate Science
DOE’s ARM program wants to make focused investments on
experimental campaigns
Monitor scientific literature to find which campaigns or instruments or
data products are being cited
Our target problems:
"What are the papers written on __aerosols__?"
"What datasets are used for __aerosols__ publications?"
"What primary measurements are represented in the subset of aerosol
publications?"
"What instruments are represented in the aerosol publications?"
"What sites are most represented in the aerosol publications?"
NOUS Motivation
Tasks involved in building KB are not domain specific
Most questions can be mapped to a set of common graph analytical
tasks.
Tell me about X
Tell me about X in context of Y
What are recent trends about X
How are X and Y related
Why did X do Y
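The question templates above can be routed to graph tasks with a small pattern table. This is a minimal sketch, not the NOUS implementation: the task names (`entity_summary`, `path_search`, and so on) and the `route` function are hypothetical, since the slide lists the templates but does not name the tasks behind them.

```python
import re

# Hypothetical task names: the slide gives the question templates only.
# Order matters: the more specific "in context of" template comes first.
TEMPLATES = [
    (re.compile(r"^tell me about (?P<x>.+?) in (?:the )?context of (?P<y>.+)$",
                re.IGNORECASE), "contextual_entity_summary"),
    (re.compile(r"^tell me about (?P<x>.+)$", re.IGNORECASE), "entity_summary"),
    (re.compile(r"^what are (?:the )?recent trends about (?P<x>.+)$",
                re.IGNORECASE), "trend_discovery"),
    (re.compile(r"^how are (?P<x>.+?) and (?P<y>.+?) related$",
                re.IGNORECASE), "path_search"),
    # Naive: assumes X is a single word followed by a one-word verb.
    (re.compile(r"^why did (?P<x>.+?) (?P<verb>\w+) (?P<y>.+)$",
                re.IGNORECASE), "explanation"),
]


def route(question):
    """Return (task_name, slots) for the first template that matches."""
    text = question.strip().rstrip("?.!").strip()
    for pattern, task in TEMPLATES:
        match = pattern.match(text)
        if match:
            return task, match.groupdict()
    return "unsupported", {}


if __name__ == "__main__":
    for q in ["Tell me about drones",
              "How are Ford and Palladium related?",
              "Why did Ford buy Palladium?"]:
        print(q, "->", route(q))
```

A production system would need real entity recognition for the X and Y slots; the regular expressions here only illustrate the idea that a handful of templates covers many questions.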
NOUS Workflow
Input: Base KB + Streaming Data
Knowledge Graph Model
• Natural language processing
• Collective entity linking
• Distant supervision based relationship discovery
Knowledge Graph Verification
• Bayesian Personalized Ranking-based link prediction to assign confidence
Knowledge Graph Tasks
• Continuous pattern discovery
• Question answering
Platform: Apache Spark
Protected Information | Proprietary Information
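The verification step can be illustrated with a toy Bayesian Personalized Ranking (BPR) model. This is a sketch under stated assumptions, not the NOUS code: each (subject, predicate) pair is treated as the BPR "user" and each object as the "item", scores are plain dot products, and the sigmoid of the score is used as a stand-in confidence. The functions `train_bpr` and `confidence`, and all hyperparameters, are hypothetical.

```python
import math
import random


def train_bpr(observed, objects, dim=4, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit factor vectors so observed objects outrank unobserved ones."""
    rng = random.Random(seed)
    contexts = sorted({(s, p) for s, p, _ in observed})
    ctx_vec = {c: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for c in contexts}
    obj_vec = {o: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for o in objects}
    known = set(observed)
    for _ in range(epochs):
        for s, p, pos in observed:
            for neg in objects:
                if (s, p, neg) in known:
                    continue  # only unobserved objects act as negatives
                u, vi, vj = ctx_vec[(s, p)], obj_vec[pos], obj_vec[neg]
                margin = sum(a * (b - c) for a, b, c in zip(u, vi, vj))
                g = 1.0 / (1.0 + math.exp(margin))  # equals 1 - sigmoid(margin)
                # Gradient ascent on ln sigmoid(margin) with L2 regularization.
                for k in range(dim):
                    uk, ik, jk = u[k], vi[k], vj[k]
                    u[k] += lr * (g * (ik - jk) - reg * uk)
                    vi[k] += lr * (g * uk - reg * ik)
                    vj[k] += lr * (-g * uk - reg * jk)
    return ctx_vec, obj_vec


def confidence(model, triple):
    """Assumed mapping from ranking score to a (0, 1) confidence value."""
    ctx_vec, obj_vec = model
    s, p, o = triple
    score = sum(a * b for a, b in zip(ctx_vec[(s, p)], obj_vec[o]))
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    # Made-up toy facts, for illustration only.
    facts = [("Ford", "makes", "cars"), ("Apple", "makes", "phones")]
    model = train_bpr(facts, ["cars", "phones"])
    print(confidence(model, ("Ford", "makes", "cars")))
    print(confidence(model, ("Ford", "makes", "phones")))
```

BPR optimizes a ranking, so the sigmoid values are useful for ordering candidate triples rather than as calibrated probabilities.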
Triple Extraction from Natural Language
Extracted triples (subject | predicate | object):
the United States government | crack | an iPhone that belonged to a gunman in the San Bernardino
Apple | repair | the particular iPhone hole that the government hacked.
Federal officials | specify | the procedure used to open the iPhone
Federal officials | deny | to specify the procedure used to open the iPhone.
Jay Kaplan, chief executive of the tech securAppley company Synack and a former National Security Agency analyst. | say | Apple has to earn the trust of Apples customers,”
the F.B.I. | crack | Mr. Farook’s
LegbaCore, which previously found and fixed flaws for Apple. | find | flaws for Apple.
LegbaCore, which previously found and fixed flaws for Apple. | fix | flaws for Apple.
The challenge: turning this into a high quality representation
Use existing tools
Stanford Core NLP
Open IE
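One way to start turning raw extractor output into a cleaner representation is to normalize each triple and drop those whose parts are implausibly long. The sketch below is an assumption for illustration, not the NOUS pipeline: the `Triple` type, the `normalize` and `keep` helpers, and the eight-token threshold are all hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def _clean(text):
    """Collapse whitespace and trim stray punctuation and quote marks."""
    return " ".join(text.split()).strip(" .,\"\u201c\u201d")


def normalize(triple):
    """Return a tidied copy of the triple with a lowercased predicate."""
    return Triple(_clean(triple.subject),
                  _clean(triple.predicate).lower(),
                  _clean(triple.obj))


def keep(triple, max_tokens=8):
    """Heuristic filter: every part must be non-empty and reasonably short."""
    return all(0 < len(part.split()) <= max_tokens for part in triple)


if __name__ == "__main__":
    raw = [
        Triple("Federal officials", "specify",
               "the procedure used to open the iPhone"),
        Triple("LegbaCore, which previously found and fixed flaws for Apple.",
               "find", "flaws for Apple."),
    ]
    for t in map(normalize, raw):
        print(keep(t), t)
```

A length filter is crude: it discards the LegbaCore triple above even though a better fix would be to shorten the subject to the entity name, which is what entity linking addresses.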
Entity Disambiguation and Relation
Learning
Implements Collective Entity Linking (Han et al., "Collective Entity
Linking in Web Text: A Graph-Based Method", SIGIR 2011)
Key idea: search the graph for matches to all the terms in the input text,
build a mesh of the terms and their related terms, and pick the most
densely connected combination
Out-of-the-box performance is low (upper 60% range); combined with
domain-specific rules it reaches 85-90%+
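The "most densely connected combination" idea can be sketched with a brute-force search over candidate assignments. This is a simplification for illustration, not the graph-based method of Han et al. and not the NOUS code: the function `link_collectively`, the candidate lists, and the relatedness edges are all hypothetical.

```python
from itertools import combinations, product


def link_collectively(candidates, related):
    """Pick one entity per mention so the chosen entities share the most edges.

    candidates: {mention: [candidate entity ids]}
    related:    set of frozenset({entity_a, entity_b}) relatedness edges
    """
    mentions = list(candidates)
    best, best_score = None, -1
    for combo in product(*(candidates[m] for m in mentions)):
        # Density here is simply the number of related pairs in the combination.
        score = sum(1 for a, b in combinations(combo, 2)
                    if frozenset((a, b)) in related)
        if score > best_score:
            best, best_score = combo, score
    return dict(zip(mentions, best)), best_score


if __name__ == "__main__":
    # Toy candidates and edges, made up for this example.
    candidates = {
        "Apple": ["Apple_(fruit)", "Apple_Inc"],
        "iPhone": ["iPhone"],
        "F.B.I.": ["FBI"],
    }
    related = {frozenset(("Apple_Inc", "iPhone")), frozenset(("FBI", "iPhone"))}
    print(link_collectively(candidates, related))
```

Exhaustive search grows exponentially with the number of ambiguous mentions, so a practical system needs pruning or an approximate method over the same graph.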
Relation Learning
Implements Distant Supervision
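Distant supervision labels training sentences automatically: if a sentence mentions two entities that the base KB already connects, the sentence is assumed to express that relation. The sketch below shows only this labeling step under that assumption; `distant_labels` and the sample sentences are hypothetical, and the KB fact is borrowed from the SIGKDD example later in the deck.

```python
def distant_labels(sentences, kb):
    """Create weakly labeled examples from sentences and a base KB.

    sentences: list of (text, [entity mentions found in the text])
    kb:        {(entity_1, entity_2): relation}
    """
    examples = []
    for text, entities in sentences:
        for e1 in entities:
            for e2 in entities:
                relation = kb.get((e1, e2))
                if relation is not None:
                    # Assumption: co-occurrence implies the KB relation holds here.
                    examples.append((text, e1, e2, relation))
    return examples


if __name__ == "__main__":
    kb = {("SIGKDD", "SFO"): "has-location"}
    sentences = [
        ("SIGKDD will be held in SFO this August.", ["SIGKDD", "SFO"]),
        ("Ford reported quarterly earnings.", ["Ford"]),
    ]
    for example in distant_labels(sentences, kb):
        print(example)
```

The labels are noisy by construction, since a sentence can mention both entities without stating the relation; a classifier trained on them has to tolerate that noise.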
Now we have a Graph!
A Different Approach to Querying
Let’s not demand users learn SQL or SPARQL
Think in plain English, and we will transparently translate queries in the background
Task 1: Finding Patterns from Data
Stream
A Pattern Growth approach : VLDB 2017 [Under Review]
Task 1: Finding Patterns from Data
Stream (Contd.)
We discover behavioral patterns of drone-related entities (WSJ, 2010-2015)
[Figure: patterns early in the period vs. towards the end]
Patterns help us understand the shift in a domain and discover new trends
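The notion of patterns that emerge and fade over a stream can be sketched with a sliding window of pattern counts. This is a deliberately reduced stand-in, not the pattern growth algorithm the deck refers to: it counts only single-edge patterns of the form (subject type, predicate, object type), and the class `WindowedPatternCounter`, its parameters, and the sample patterns are hypothetical.

```python
from collections import Counter, deque


class WindowedPatternCounter:
    """Track which patterns are frequent within the most recent time window."""

    def __init__(self, window, min_support):
        self.window = window            # window length in timestamp units
        self.min_support = min_support  # minimum count to call a pattern frequent
        self.edges = deque()            # (timestamp, pattern) in arrival order
        self.counts = Counter()

    def add(self, timestamp, pattern):
        """Insert one observation and evict everything older than the window."""
        self.edges.append((timestamp, pattern))
        self.counts[pattern] += 1
        while self.edges and self.edges[0][0] <= timestamp - self.window:
            _, old = self.edges.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def frequent(self):
        return {p for p, c in self.counts.items() if c >= self.min_support}


if __name__ == "__main__":
    # Made-up type-level patterns, for illustration only.
    early = ("military", "deploys", "drone")
    late = ("company", "delivers-with", "drone")
    counter = WindowedPatternCounter(window=3, min_support=2)
    counter.add(2010, early)
    counter.add(2011, early)
    before = counter.frequent()
    counter.add(2014, late)
    counter.add(2015, late)
    after = counter.frequent()
    print("fading:", before - after)
    print("emerging:", after - before)
```

Comparing the frequent sets from two points in time gives the fading and emerging patterns; multi-edge patterns would require growing candidates from these single edges.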
Task 2: Question Answering for
Relationship Explanations
Our goal is to learn “patterns of explanations” by walking the graph
structure
Given a new question and the knowledge of such patterns, we can
generalize and answer questions about unseen entities
Examples:
Why would Ford buy Palladium?
Ford is-a automotive company, automotive company has-part catalytic converters,
catalytic converters has-material Palladium
Answer pattern: A is-a company B, B has-part C, C has-dependency D
Why would Sutanay visit SFO on August 15?
Sutanay has-interest Machine-Learning, SIGKDD related-to Machine-Learning,
SIGKDD has-location SFO, SIGKDD has-start-date August 13
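Walking the graph for an explanation can be sketched as a breadth-first search that returns the chain of triples between two entities. This is a minimal illustration using the Ford example above, not the NOUS method: `explain` is a hypothetical helper, and it returns the shortest directed path rather than ranking candidate paths by coherence.

```python
from collections import deque


def explain(triples, source, target, max_hops=4):
    """Return the shortest chain of triples from source to target, or None."""
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        if len(path) == max_hops:
            continue  # do not extend paths beyond the hop limit
        for p, o in outgoing.get(node, []):
            if o not in seen:
                seen.add(o)
                queue.append((o, path + [(node, p, o)]))
    return None


if __name__ == "__main__":
    triples = [
        ("Ford", "is-a", "automotive company"),
        ("automotive company", "has-part", "catalytic converters"),
        ("catalytic converters", "has-material", "Palladium"),
    ]
    path = explain(triples, "Ford", "Palladium")
    print(path)
    # The predicate sequence is the reusable "pattern of explanation".
    print([p for _, p, _ in path])
```

Keeping only the predicate sequence of a found path gives a pattern that can be matched against paths for entities the system has not seen before.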
Results – Explanatory Paths using
coherence
Use Case Results: Domain-specific
Querying of the Web
Input Size: 2 million+ webpages
Graph construction: 64-node cluster, each node with 16 cores
Graph Analytics: 16-node Spark/Hadoop cluster, each with 16 cores
Picking up real events by Patterns
What’s Phenomenal Here?
We are data scientists, not drone experts
We processed nearly two million web crawls with minimal custom
code and built the Knowledge Graph
Analyzed trends, connected dots across multiple sources and presented
a hypothesis:
Is tracking autonomous drones and their use important?
Going from the data analysis to coming up with the question took less
than 1.5 hours
Key Takeaways
Open Source KB Construction and Querying Pipeline
https://github.com/streaming-graphs/NOUS
Version 1.0:
Support for extracting standard entities, relation extraction via distant
supervision,
Advanced trending and explanatory questions
The ability to answer queries where the answer is embedded across multiple
data sources
All algorithms implemented on top of Apache Spark and Scala
Status and Future Work
We are committed to developing an open source community
A MySQL for Knowledge Graphs!
Preferred method of payment :)
Add test code (in Scala or Python)
Break the code AND show an important class of problem
Version 2.0:
Improving KB quality,
Information maintenance over time,
Initial support for human-computer interaction.
Incorporate algorithms from the latest advances in NLP and entity disambiguation (ED)
Questions?
