The Role of Patterns in the Era of Large Language Models

Yunyao Li
PAN-DL@EMNLP’23 | Adobe | December, 2023
Initial Learnings from Constructing, Growing and Serving Large
Knowledge Graphs*
* Work done at IBM Research and Apple
yunyaol@adobe.com
@yunyao_li

Knowledge Bases
Image Source: https://www.csee.umbc.edu/courses/graduate/691/fall22/kg/

Example: Financial Content Knowledge Base
Financial
Reports
Ontology
[VLDB’2017] Creation and Interaction with Large-scale Domain-Specific Knowledge Bases.
XML
Knowledge
Extraction
Overall Architecture: A Simplified View
Linking
Fusion
KG Construction
Transforming
>31,000 companies
439 industries
~170,000 insiders
~100 million financial metrics
~22,000 industry KPIs
Financial Content KB
KG Services
QA
APIs

Example: Saga
Structured
Knowledge
Sources
Real-time
Sources
Ontology
Unstructured
Knowledge
Sources
Linking
Fusion
KG Construction KG
Knowledge
Extraction
KG Services
QA
Semantic Annotation
… …
Embedding Services
[SIGMOD’23] Growing and Serving Large Open-domain Knowledge Graphs.
[SIGMOD’22] Saga: A Platform for Continuous Construction and Serving of Knowledge at Scale
Transforming
Overall Architecture: A Simplified View

Key Components
of KG Construction, Growth, and Services
KG
QA Linking
Embedding … ….
Extraction
Integration
Inference Introspection
Services
Construction & Growth

[Figure: the same person described by two sources]
Source A: PERSON with name “Connor McDavid”, dob “97/01/13”, place of birth → CITY with name “Richmond Hill”
Source B: HOCKEY_PLAYER with name “Connor McDavid”, bday “Jan 13”, goals “43”
[Figure: result after Linking and Fusion]
ID1: types PERSON and HOCKEY_PLAYER, name “Connor McDavid”, dob “January 13, 1997”, place of birth → CITY with name “Richmond Hill”, goals “43”

Entity Normalization & Variant Generation
Learning: Structured Representations
Capture Entity Semantic Structure
[COLING’2018] Exploiting Structure in Representation of Named Entities using Active Learning.
[ICDE’2018] LUSTRE: An Interactive System for Entity Structured Representation and Variant Generation.
Generated normalizers for Watson Discovery
[AAAI’2020] PARTNER: Human-in-the-Loop Entity Name Understanding with Deep Learning.
[EMNLP’2020] Learning Structured Representations of Entity Names using Active Learning and Weak Supervision.
Pattern-Based: Synthesizing Normalization and Variant Generation Functions
“Bank of America N.A.” “Bank of America National Association”
“97/01/13” “January 13, 1997”
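
The two example pairs above can be reproduced with a small hand-written normalizer. This is only a sketch: the suffix table, the function names, and the assumed yy/mm/dd field order are illustrative choices, not the synthesized functions described in the cited papers.

```python
from datetime import datetime

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# Hypothetical suffix table; a real system would learn or curate many more.
SUFFIX_EXPANSIONS = {"N.A.": "National Association", "Corp.": "Corporation"}


def normalize_org_name(name: str) -> str:
    """Expand a trailing abbreviated suffix into its long form."""
    for short, long_form in SUFFIX_EXPANSIONS.items():
        if name.endswith(" " + short):
            return name[: -len(short)] + long_form
    return name


def normalize_date(value: str) -> str:
    """Rewrite a yy/mm/dd string (assumed order) as 'Month D, YYYY'."""
    parsed = datetime.strptime(value, "%y/%m/%d")
    return f"{MONTHS[parsed.month - 1]} {parsed.day}, {parsed.year}"


org = normalize_org_name("Bank of America N.A.")
dob = normalize_date("97/01/13")
```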

Graph Completion via Ontology Inference
KG → Ontology Inference Rules → Updated KG
A has_mother B → B has_child A
A has_father B → B has_child A
A has_spouse B → B has_spouse A
A contains B → B is_part_of A
A has_child B ∧ A has_child C → B has_sibling C ∧ C has_sibling B
… …
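
A minimal sketch of how such rules could be applied by forward chaining until no new facts appear. The people and facts are hypothetical, and the guard that a person is not their own sibling is an added assumption that the slide does not state.

```python
INVERSE = {"has_mother": "has_child", "has_father": "has_child",
           "contains": "is_part_of"}
SYMMETRIC = {"has_spouse"}


def infer(facts):
    """Apply the rules repeatedly until the KG stops growing."""
    kg = set(facts)
    while True:
        new = set()
        children = {}
        for s, p, o in kg:
            if p in INVERSE:
                new.add((o, INVERSE[p], s))
            if p in SYMMETRIC:
                new.add((o, p, s))
            if p == "has_child":
                children.setdefault(s, set()).add(o)
        # A has_child B and A has_child C imply B and C are siblings.
        for kids in children.values():
            for b in kids:
                for c in kids:
                    if b != c:  # assumption: nobody is their own sibling
                        new.add((b, "has_sibling", c))
        if new <= kg:
            return kg
        kg |= new


# Hypothetical input facts.
updated = infer({
    ("Ann", "has_mother", "Mary"),
    ("Bob", "has_mother", "Mary"),
    ("Mary", "has_spouse", "Tom"),
})
```

The sibling facts only appear on the second pass, after the inverse rule has produced the has_child facts, which is why the loop runs to a fixpoint.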

Example Inference
Who’s Kylian Mbappé’s mother?
Source: https://en.wikipedia.org/wiki/Kylian_Mbappé
No information
about his mother

Example Inference
Who’s Kylian Mbappé’s mother?
Source: https://www.wikidata.org/wiki/Q45094361
A has_child B ∧ A is_a female → B has_mother A
Fayza Lamri has_child Kylian Mbappé
Fayza Lamri is_a female
Kylian Mbappé has_mother Fayza Lamri
Infer high-quality facts
at scale

Fact Editing for LLM
Ontology-Guided Evaluation
Source: Evaluating the Ripple Effects of Knowledge Editing in Language Models https://arxiv.org/pdf/2307.12976.pdf

Scale Fact Collection
Missing / stale facts
Missing
Facts
Query
Synthesizer
QA System
candidate facts
Baseline
New
Facts

Query-by-Committee
Missing
Facts
Query
Synthesizer
QA System
candidate facts
New
Facts
QA System
Q1
QA System
… …
… …
…
Qn
QbC
Selector
AnswerSet1
AnswerSetn
[EMNLP-DaSH’2022] Improving Human Annotation Effectiveness for Fact Collection by Identifying the Most Relevant Answers
Success Rate
fact collection
25%
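
One way to picture the QbC Selector is as a vote over the answer sets returned for the synthesized queries Q1 … Qn. The sketch below is an assumption about how such a selector might work, not the method of the cited paper; the `min_agreement` threshold and the sample answers are hypothetical.

```python
from collections import Counter


def qbc_select(answer_sets, min_agreement=0.5):
    """Keep answers returned for at least min_agreement of the queries."""
    votes = Counter()
    for answers in answer_sets:
        votes.update(set(answers))  # one vote per query, even if repeated
    threshold = min_agreement * len(answer_sets)
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [answer for answer, count in ranked if count >= threshold]


# Hypothetical answer sets for three paraphrases of the same question.
selected = qbc_select([
    ["Darmstadt", "Hanover"],
    ["Darmstadt"],
    ["Darmstadt", "Munich"],
])
```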

Open Domain Knowledge Extraction
[SIGMOD’23] Growing and Serving Large Open-domain Knowledge Graphs.
Throughput vs.
manual fact collection
>100x
Missing
Facts
Query
Synthesizer
Web Search
candidate facts w/ lower-confidence
New
Facts
Knowledge
Extractor
Fact
Corroboration

Extraction: Pattern vs. LLM
* All details simplified for presentation
If entity.type = “Person” And If tuple.key = “Height”
Return height = extract(tuple.value, “\d?.\d+\s*m”)
You are an accurate information extraction system responsible to find answers to a set of questions solely from a given passage.
For example
Now please work on the following task:
Questions: height
Passage:
Title: José Varela
Infobox properties:
{“Full name": "José Carlos Moreira Varela”
“Date of birth”: “15 September 1997 (age 26)”
“Place of birth”: “Praia, Cape Verde”
“Height”: “1.68 m (5 ft 6 in)”
… …}
Key Value
Full name José Carlos Moreira Varela
Date of birth 15 September 1997 (age 26)
Place of birth Praia, Cape Verde
Height 1.68 m (5 ft 6 in)
… …
Key-Value Pair Extractor Height Extractor
Height = 1.68 m
Prompt
Pattern-based Extractors
Height = 1.68 m
LLM
LLM-based Extractor
Demonstrate Example
InfoBox
Content
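
The pattern-based height rule can be written as a few lines of Python. This is a sketch under assumptions: the function name and the dictionary representation of the infobox are illustrative, and the decimal point is escaped here, which the simplified slide pattern may not do.

```python
import re
from typing import Optional

# Optional leading digit, decimal point, digits, optional spaces, then "m".
HEIGHT_PATTERN = re.compile(r"\d?\.\d+\s*m")


def extract_height(entity_type: str, infobox: dict) -> Optional[str]:
    """Return the metric height from an infobox, or None if absent."""
    if entity_type != "Person":
        return None
    value = infobox.get("Height")
    if value is None:
        return None  # the rule does not fire without a "Height" key
    match = HEIGHT_PATTERN.search(value)
    return match.group(0) if match else None


varela = {
    "Full name": "José Carlos Moreira Varela",
    "Date of birth": "15 September 1997 (age 26)",
    "Place of birth": "Praia, Cape Verde",
    "Height": "1.68 m (5 ft 6 in)",
}
height = extract_height("Person", varela)
```

When the infobox has no "Height" key, the function returns None rather than guessing a value.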

* All details simplified for presentation purposes
If entity.type = “Person” And If tuple.key = “Height”
Return height = extract(tuple.value, “\d?.\d+\s*m”)
Key Value
Born 5 September 1808, Calcutta
…
Died 30 May 1869 (aged 60) ..
Political Party Liberal Party.
Spouse Annie Henrietta Templer …
… …
Key-Value Pair Extractor Height Extractor
Height = null
Height = 1.80 m
LLM
LLM-based Extractors
hallucination
You are an accurate information extraction system responsible to find answers to a set of questions solely from a given passage.
For example
Questions: height
Passage:
Title: Sir Arthur William Buller
Infobox properties:
{“Born": “5 September 1808”
“Calcutta, British India”
… …}
Demonstrate Example
InfoBox
Content
Prompt
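
One cheap guard against this kind of hallucination is to accept an LLM answer only if its text actually occurs in the passage. The check below is a crude, hypothetical post-filter and is not something the talk describes; the serialized passages are assumptions built from the infobox rows shown on the slides.

```python
from typing import Optional


def keep_if_grounded(answer: Optional[str], passage: str) -> Optional[str]:
    """Return the answer only if it appears verbatim in the passage."""
    if not answer:
        return None

    def normalize(text: str) -> str:
        # Lowercase and collapse whitespace before comparing.
        return " ".join(text.lower().split())

    return answer if normalize(answer) in normalize(passage) else None


buller_passage = ("Born: 5 September 1808, Calcutta, British India. "
                  "Died: 30 May 1869 (aged 60).")
varela_passage = "Height: 1.68 m (5 ft 6 in)"

rejected = keep_if_grounded("1.80 m", buller_passage)
accepted = keep_if_grounded("1.68 m", varela_passage)
```

A substring test cannot catch paraphrased or reformatted answers, so it is only a first line of defense.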

* All details simplified for presentation purposes
If tuple.key = “Spouses”
Return spouse = extract(tuple.value, PersonNameRegex), start time = extract(tuple.value, StartTimeRegex), end time = extract(tuple.value, EndTimeRegex)
Key Value
Born Jacques Haussmann, …
Died October 31, 1958 (aged 86) …
Citizenship American
Education Clifton College
… …
Key-Value Pair Extractor Spouse Extractor
Spouse = Zita Johann
Start time = 1929
End time = 1933
Spouse = Joan Courtney
Start time = 1952
End time = 1988
LLM-based Extractors
You are an accurate information extraction system responsible to find answers to a set of questions solely from a given passage.
For example
Questions: spouse
Passage:
Title: John Houseman
Infobox properties:
{“Born”: “Jacques Haussmann”
“September 22, 1902”
… …}
Prompt
LLM
Demonstrate Example
InfoBox
Content
Spouse = Zita Johann
Start time = 1929
End time = 1933
Incomplete

A Side-by-Side Comparison
Pattern-based LLM-based
Throughput
Quality of Results
Simple Cases
Complex Cases
Development
Effort
Simple Cases
Complex Cases
High Low
High High
High
Medium
Medium
Medium
High
Low

Opportunity to Get the Best of Both Worlds
Source: Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes https://arxiv.org/pdf/2304.09433.pdf
A recent example
Additional reading: Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples! https://arxiv.org/abs/2303.08559

Multilingual Coverage of KG
[Figure: example KG whose nodes are tagged with the languages of their names (EN, ES, IT, DE)]
[Stacked bar chart, 0% to 100%, two values per language]
AR DE ES FR IT JA KO RU ZH
36 40 63 36 34 21 24 27 55
64 60 37 64 66 79 76 73 45
Coverage of entity names (Wikidata)
Major gap exists
[EMNLP’23] Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs

Multilingual Knowledge Graph Enrichment
[Figure: node language tags (EN, ES, IT, DE) before and after enrichment]
Before: Existing KG
After: Multilingually-enriched KG (via M-NTA)
Increasing multilingual coverage of locale-specific facts.

M-NTA | Multi-source Naturalization, Translation, and Alignment
Leverages complementary knowledge across locales and tools
Naturalization
triple-to-text
KG
Machine Translation
Web Search
LLMs
Alignment
text-to-triple
Ensemblement
Triple Selection
⟨Apple, is_a, fruit of the apple tree⟩
⟨Apple, is_a, American multinational technology company⟩
…
Apple is a fruit of the apple tree
Apple is an American multinational technology company
…
リンゴはリンゴの木の実です
りんごはりんごの木の実です
…
⟨Apple, is_a, fruit of the apple tree⟩
リンゴはリンゴの木の実です
りんごはりんごの木の実です
…
リンゴ (6), りんご (4), 果実 (1)
リンゴ (6), りんご (4)
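
The ensembling and selection step can be pictured as counting how often each candidate name is proposed across sources and keeping the well-supported ones. This sketch assumes a simple vote threshold (`min_votes`, hypothetical); the actual M-NTA selection criterion is not given on the slide.

```python
from collections import Counter


def select_by_votes(candidates, min_votes=2):
    """Count candidate strings and keep those with enough support."""
    votes = Counter(candidates)
    return [(text, n) for text, n in votes.most_common() if n >= min_votes]


# Candidate Japanese names for "Apple", one entry per source output,
# arranged to match the counts shown on the slide (6, 4, 1).
candidates = ["リンゴ"] * 6 + ["りんご"] * 4 + ["果実"]
selected_names = select_by_votes(candidates)
```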

Improve Question Answering
Reduce the number of unanswerable queries
DE ES FR ZH JA
+12.1%
+14.4%
+13.4%
+26.9%
+18.1%
MKQA Dataset 2
Dec. 9. Poster Session 4
Daniel Lee
Simone Conia

Introspection
Constraint Violation Detection
KG → Ontology Constraints → Updated KG
Soft constraints (violations are potential errors):
|date of birth| ≤ 1
|date of death| ≤ 1
…
Hard constraints (violations are errors):
date of birth ≤ date of death
…

Introspection
Source: https://www.wikidata.org/wiki/Q455611
Data issues: format + missing qualifier

Introspection
Source: https://en.wikipedia.org/wiki/Plato
Date of birth:
• 428/427
• 424/423 BC
Date of Death:
348 BC
Extracted facts
Potential error: two dates of birth
Actual error: extracted date of birth is later than date of death (428/427 vs 348 BC)
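
The soft and hard constraints from the Introspection slides can be sketched as a small checker. Years are modeled as signed integers (negative for BC), which is an assumption made here so that the Plato example fits; the extracted "428/427" is read as year 428 because its BC qualifier is missing.

```python
def check_dates(facts):
    """Return (potential_errors, errors) for one entity's date facts."""
    potential, errors = [], []
    # Soft constraints: at most one value per predicate.
    for predicate in ("date of birth", "date of death"):
        if len(facts.get(predicate, [])) > 1:
            potential.append(f"|{predicate}| > 1")
    # Hard constraint: birth must not come after death.
    for born in facts.get("date of birth", []):
        for died in facts.get("date of death", []):
            if born > died:
                errors.append(f"date of birth {born} > date of death {died}")
    return potential, errors


plato = {"date of birth": [428, -424], "date of death": [-348]}
potential_errors, actual_errors = check_dates(plato)
```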

[EMNLP’23] FLEEK: Factual Error Detection and Correction with Evidence Retrieved from External Knowledge
FLEEK
Factual Error Detection and Correction with Evidence Retrieved from External Knowledge

FLEEK | Demo
Dec. 8. Poster Session 1
Farima Fatahi Bayat
Kun Qian

FLEEK
Factual Error Detection and Correction with Evidence Retrieved from External Knowledge
Input Text
Fact Extraction
text-to-triple
Question Generation
triple-to-question
Verification
Revision
Final Correction
Evidence Retrieval
[EMNLP’23] FLEEK: Factual Error Detection and Correction with Evidence Retrieved from External Knowledge

Conversational KG QA with LLM-generated Dialogs

Configurable Attributes
User Experience Level
Voice Interaction
Search Interaction
Metadata Level
Popularity Scores - Long Tail Entities
Timestamps
Conversation Level
Topic Exploration
Extend to Related Entities & Neighbors

Voice Assistant Questions
More well-formed questions,
with a small mix of queries
Disfluencies - yes
Deixis - yes
Web Search Queries
Often short queries, mimic
search engine interactions but
with follow-ups
Disfluencies - no
Deixis - yes
Typos - yes

Voice Assistant Questions
Disfluencies & Deixis
Question: Hmm, which languages does Karl Wolff use
Answer: German
Question: Could you please, um, inform me about his military branch
Answer: Waffen-SS
Question: Do you know which wars he was a part of
Answer: ['Italian campaign', 'World War II', 'World War I']
Question: Do you know his military ranks
Answer: ['Obergruppenführer', 'general']
Question: Do you know his date of birth
Answer: +1900-05-13
Question: Where was he born
Answer: Darmstadt
Question: Can you, uh, tell me when this military person died
Answer: +1984-07-15

Voice Assistant Questions with Related Entities
Question: Do you know any languages that Karl Wolff speaks
Answer: German
Question: Which military branch is he a part of
Answer: Waffen-SS
Question: Could you please, um, inform me about the wars he was involved in
Answer: ['Italian campaign', 'World War II', 'World War I']
Question: What about Sepp Dietrich
Answer: World War I
Question: Can you tell me, um, Karl's military rank
Answer: ['Obergruppenführer', 'general']
Question: How about Sep
Answer: SS-Oberst-Gruppenführer
Question: Can you, uh, tell me the birthplace of Karl
Answer: Darmstadt
Question: Ermm, what about Sepp
Answer: Hawangen
Primary Entity Related Entity

Web Search Queries — Short & Keyword-esque
Question: Karl Wolff country of citizenship
Answer: Germany
Question: wars involving him
Question: Also for Sepp Dietrich
Answer: World War I
Question: Karl place of birth
Answer: Darmstadt
Question: Answer for Sepp
Answer: Hawangen
Question: Karl died in
Answer: Rosenheim
Question: For Sepp
Answer: Ludwigsburg
Question: Karl military rank

Web Search Queries + Typos
Question: Kerl Wilff contry of citizenship
Answer: Germany
Question: wars involvng him
Question: Also fr Sepp Dietrich
Answer: World War I
Question: KJarl place of birth
Answer: Darmstadt
Question: Answer for Sep
Answer: Hawangen
Question: Karl died in
Answer: Rosenheim
Question: For Sepp
Answer: Ludwigsburg
Question: Karl mlitary rsnk

Dataset Statistics - Exhaustive & Evergrowing
Dataset | # Entities (# Conversations) | # Facts | # Questions Per Fact | # Unique Types | # Unique Predicates
General Set | 29M | 196M | 12 (Web) + 12 (Voice) | 274 | 1252
Related Entities Set | 210K | 6.1M | 24 [+ 30 (RE Follow-Up)] | 95 | 265

Evaluation - Effectiveness of LLMs on these conversations
General Subset
Model | Question Type Experience | Accuracy
GPT-3.5 | Voice Assistant | 25.9
GPT-4 | Voice Assistant | 32.4
GPT-3.5 | Web Search | 28.6
GPT-4 | Web Search | 35.7
Related Entity Subset
Model | Question Type Experience | Accuracy
GPT-3.5 | Voice Assistant | 37.7
GPT-4 | Voice Assistant | 44.4
GPT-3.5 | Web Search | 38.7
GPT-4 | Web Search | 46.7

Direct Triple Retrieval
Triple Retrieval
Direct Retrieval without Entity Linking
Triple Index
Query
• [S1, R1, O1]
• [S2, R2, O2]
• ……….
• [S100, R100, O100]
LLM
(Prompting for
Answer
Generation)
Answer
You are a question answering agent.
You will always provide short concise answers.
Based on the following evidence:
Fact 1: …..
Fact 2: …..
…
Fact N: …..
Answer the question using only the
evidence above:
Query
Query → BERT → h_q
Triple → BERT → h_t
$\mathrm{sim}(q, t) = h_q^{\top} h_t$
Can only work well for simple questions!
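
A toy version of direct triple retrieval: score each indexed triple by the dot product between the query embedding and the triple embedding, keep the top k, and assemble the evidence prompt. The three-dimensional vectors are hand-made stand-ins for BERT outputs, and the "subject | relation | object" fact serialization is an assumption.

```python
def dot(u, v):
    """Inner product, i.e. sim(q, t) = h_q^T h_t."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(query_vec, triple_index, k=2):
    """Return the k triples whose embeddings score highest."""
    ranked = sorted(triple_index, key=lambda item: dot(query_vec, item[1]),
                    reverse=True)
    return [triple for triple, _ in ranked[:k]]


def build_prompt(query, triples):
    """Fill the answer-generation prompt with the retrieved facts."""
    facts = "\n".join(f"Fact {i}: {s} | {r} | {o}"
                      for i, (s, r, o) in enumerate(triples, 1))
    return ("You are a question answering agent.\n"
            "You will always provide short concise answers.\n"
            "Based on the following evidence:\n"
            f"{facts}\n"
            "Answer the question using only the evidence above:\n"
            f"{query}")


# Toy index: (triple, embedding) pairs with made-up embeddings.
index = [
    (("Karl Wolff", "place of birth", "Darmstadt"), [0.9, 0.1, 0.0]),
    (("Karl Wolff", "military branch", "Waffen-SS"), [0.1, 0.8, 0.1]),
    (("Sepp Dietrich", "place of birth", "Hawangen"), [0.6, 0.0, 0.4]),
]
top = retrieve([1.0, 0.0, 0.2], index, k=2)
prompt = build_prompt("Where was Karl Wolff born", top)
```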

Subgraph + Triple Retrieval (Ours)
• We consider two types of subgraphs:
• Cliques: Subgraph containing predicates of an entity.
• 2-hop subgraphs: Subgraph containing predicates of one and two-hop entities together
Original Graph
Cliques 2-hop Subgraphs
Triple Selection
Direct Retrieval without Entity Linking
Subgraph
Index
Query
• [S1, R1, O1]
• [S2, R2, O2]
• …
• [S100, R100, O100]
LLM
(Prompting for
Answer
Generation)
Answer
Subgraph Retrieval
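
A sketch of how the two subgraph types might be built from a triple list before indexing. It reads "clique" as all triples whose subject is the entity, and "2-hop subgraph" as those triples plus the triples of each neighboring entity; both readings are assumptions, since the slide does not define the construction precisely.

```python
from collections import defaultdict


def build_subgraphs(triples):
    """Return (cliques, two_hop) dictionaries keyed by subject entity."""
    by_subject = defaultdict(list)
    for triple in triples:
        by_subject[triple[0]].append(triple)

    cliques = {entity: list(own) for entity, own in by_subject.items()}
    two_hop = {}
    for entity, own in cliques.items():
        # Neighbors are objects that are themselves subjects in the KG.
        neighbors = {o for _, _, o in own if o in cliques and o != entity}
        extended = list(own)
        for neighbor in sorted(neighbors):
            extended.extend(cliques[neighbor])
        two_hop[entity] = extended
    return cliques, two_hop


kg = [
    ("Nirvana", "drummer", "Dave Grohl"),
    ("Dave Grohl", "place of birth", "Warren, Ohio"),
]
cliques, two_hop = build_subgraphs(kg)
```

With the 2-hop subgraph of "Nirvana" retrieved as one unit, both facts needed for a question such as "The drummer for Nirvana was born in what city?" arrive together.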

Evaluation
Public Benchmark
System | Accuracy
System A | 15.4
Direct Triple Retrieval | 53.9
Subgraph + Triple Retrieval (Ours) | 56.3
Internal Benchmark
System | Dataset 1 | Dataset 2
System A | 17.2 | 21.1
Ours | 24.1 | 26.3

Wins
Overcoming Intent Detection Errors
Query: how old was Ronald Reagan when he was
inaugurated president
System A Answer
Ronald Reagan died June 5,
2004 at age 93 in X.
Our Answer
Ronald Reagan was 69 years old
when he was inaugurated president.
Query: what movies were Bill Cosby and Sidney Poitier in?
System A Answer
<empty>
Our Answer
Bill Cosby and Sidney Poitier have been
in several movies together, including:
Uptown Saturday Night , Let's Do It
Again
Query: who is the female lead in the movie music man?
System A Answer
<empty>
Our Answer
The female lead in the movie Music
Man is Shirley Jones.
Handling Multi-Hop Queries
Query: The drummer for Nirvana was born in what city?
System A Answer
<empty>
Our Answer
The drummer for Nirvana, Dave Grohl,
was born in Warren, Ohio.

Entity Linking
Virginia Heffernan was born in Hanover
Theo Lingen was born in Hanover
Hanover, New
Hampshire
Hanover,
Germany

Online Entity Linking
Enabling better question answering

Offline Entity Linking
Enable Better Ranking and Search
“They bought tickets for Beyoncé; she’s performing at T-Mobile Park”
KG:123
KG Info
KG:345
KG Info
0.914
0.312
Entity Importance
Entity Importance
Entity Embedding Index
related entities (approximate
nearest neighbor search)
"entity_name": "Beyonce Knowles",
"entity_types": [
"artist",
"human",
"writer"],
…
"entity_name": "T-Mobile Park",
"entity_types": [
"stadium",
"POI",
"location"],
…

Custom Configuration
Specify what types should be included and what should not be present
Improve linking quality
Include:
City, Natural Place, Landmark, National Park, …
Exclude:
Company, Hospitals, Person, …
Example:
For “weather” use cases

Custom Tag Configuration
Example Use Case: Weather
Weather in Obama
Source: duckduckgo.com
Obama [Person]: won’t be considered with the configuration
Obama [City]
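
The include/exclude configuration can be sketched as a filter over linking candidates. The type lists come from the slide; the candidate representation and the function name are hypothetical.

```python
WEATHER_CONFIG = {
    "include": {"City", "Natural Place", "Landmark", "National Park"},
    "exclude": {"Company", "Hospitals", "Person"},
}


def filter_candidates(candidates, config):
    """Keep candidates with an included type and no excluded type."""
    kept = []
    for name, types in candidates:
        types = set(types)
        if types & config["exclude"]:
            continue
        if types & config["include"]:
            kept.append(name)
    return kept


# Candidates for the mention "Obama" in the query "Weather in Obama".
candidates = [("Obama [Person]", {"Person"}), ("Obama [City]", {"City"})]
linked = filter_candidates(candidates, WEATHER_CONFIG)
```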

Fact Ranking & Related Entities
Embed entities / relations / queries in embedding space
Query processing = nearest neighbor search
Lady Gaga, occupation, ?

Related Entities
LLM
Entity Descriptions
Embeddings
Search
Query Logs
Entity Co-occurrence
Reranking Related Entities KV Store
KG

Example Use Case
Fact Ranking and Related Entities
Lady Gaga
Song
Album
Related Entities
Movie
Shadow …
Dance Telephone
The Frame Artpop Chromatica …
Adrian
Grande
Beyoncé Bradley
Cooper
…
House of
Gucci
A Star
Is Born
Sin City: A
Dame to Kill
For
…
Fact Ranking: Lady Gaga is first a musician then an actress

Relatedness:
Based on KG +
query log

Key Components
of KG Construction, Growth, and Services
KG
QA Linking
Embedding … ….
Extraction
Integration
Inference Introspection
Construction & Growth Services

LLMs vs. KGs
Source: Shirui Pan, et al. Unifying Large Language
Models and Knowledge Graphs: A Roadmap
https://arxiv.org/abs/2306.08302

Thanks!
IBM (including interns):
Shivakumar Vaithyanathan
Sriram Raghavan
Rajasekar Krishnamurthy
Lucian Popa
Ron Fagin
Fred Reiss
Laura Chiticariu
Mauricio Hernández
Eser Kandogan
Huaiyu Zhu
Kun Qian
Dakuo Wang
Maeda Hanafi
Many amazing collaborators and interns …
Apple (including interns):
Ihab Ilyas
Theodoros Rekatsinas
Umar Farooq Minhas
Ali Mousavi
Jeffrey Pound
Anil Pacaci
Hongyu Ren
Kun Qian
Fei Wu
Simone Conia
Sha (Zoey) Li
Azadeh Nikfarjam
Yisi Sang
Saloni Potdar
Farima Fatahi Bayat … …
Universities:
Azza Abouzied (NYU Abu Dhabi)
H. V. Jagadish (U. of Michigan)
Fei Xia (U. of Washington)
Kevin Chen-Chuan Chang (UIUC)
ChengXiang Zhai (UIUC)
Domenico Lembo (Sapienza University of Rome)
Dragomir R. Radev (Yale)
Jonathan K. Kummerfeld (U. of Michigan)
Toby Li (U. of Notre Dame)
Rishabh Iyer (UT Dallas)
Eduard C. Dragut (Temple Univ.) … ….
Douglas Burdick
Alan Akbik
Nancy Wang
Prithiviraj Sen
Marina Danilevsky
Poornima Chozhiyath Raman
Sudarshan Rangarajan
Ramiya Venkatachalam
Kiran Kate
Chenguang Wang
Ishan Jindal
Yiwei Yang
Nikita Bhutani … ….

© 2023 Adobe. All Rights Reserved. Adobe Confidential.
Unleashing
Creativity
Adobe
Creative Cloud
Accelerating Document
Productivity
Adobe
Document Cloud
Powering Digital
Businesses
Adobe
Experience Cloud

Adobe Experience Cloud: breadth of integrated applications
Marketing Planning & Workflow
Marketing system of record to
connect, collaborate and execute
the workflows required for
personalization at scale and
content supply chain
Adobe Experience Platform
Open, cloud-native platform transforming behavioral and transactional data into unified customer
profiles that update in real time and use AI-driven insights to help deliver the right experiences
across every channel
Customer Journeys
Real-time, omni-channel customer
and account-based journey
orchestration & campaign
execution
Content & Commerce
Content management and
commerce solutions for
personalized, multi-channel
experiences
Data Insights & Audiences
Omni-channel experience insights &
intelligence, including first-party data
management & activation for known
& unknown audiences
