Start Making Sense
Be F.A.I.R
Nordic HealthData – how do we utilize metadata in health practice?
INSTITUTIONEN FÖR TILLÄMPAD INFORMATIONSTEKNOLOGI (Department of Applied Information Technology) | www.ait.gu.se
Moore humor anno 1965
Reality today 2020
http://orangecone.com/archives/2010/04/smart_things_ch_2.html
It was 20+ years ago today…
Dr Pål Lindström
Pål, founded: LocusMedicus Int.
• A Healthcare Community of Practice
• Knowledge Networking automated and augmented by AI
Challenges
Landfill
Nordic health data   metadata
DATA
FAIR
Organising
Decipher
Lingua Franca
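\[ (x + a)^n = \sum_{k=0}^{n} \binom{n}{k} x^k a^{n-k} \]
\[ f(x) = a_0 + \sum_{n=1}^{\infty} \left( a_n \cos\frac{n\pi x}{L} + b_n \sin\frac{n\pi x}{L} \right) \]
\[ a^2 + b^2 = c^2 \]
\[ (1 + x)^n = 1 + \frac{nx}{1!} + \frac{n(n-1)x^2}{2!} + \cdots \]
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \]
\[ \sin\alpha \pm \sin\beta = 2 \sin\tfrac{1}{2}(\alpha \pm \beta) \cos\tfrac{1}{2}(\alpha \mp \beta) \]
\[ E = mc^2 \]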
Alphabet Soup
Vocabularies
http://flic.kr/p/zCyMp
Standards
Organising Principles
http://flic.kr/p/kRqh42
SKOS
DC
W3C
schema.org
Linked Data
RDF
HL7/FHIR
ISA2
MeSH, UMLS, SNOMED CT, ICD-11
INFORMATION ARCHITECTURE AS AN ORGANIZING
DISCIPLINE
the discipline of organising
1. What Is Being Organised?
2. Why Is It Being Organised?
3. How Much Is It Being Organised?
4. When Is It Being Organised?
5. Who (or What) Is Organising It?
6. Where Is It Organised?
Robert Glushko
We Organise
● Libraries, markets, museums, zoos, vineyards
● Different types of data and documents
● Personal information and artifacts
● People
An “Organising System”
Organising Books by Content
Photo by Jeffrey Beall (http://www.flickr.com/photos/denverjeffrey/304220561), Creative Commons CC BY-ND 2.0
Recall:
The “Organising System”
A collection of resources intentionally arranged
to enable some set of interactions
“Information Architecture is designing an abstract and effective organisation of information
and then exposing that organisation to facilitate navigation and information use”
Intentional arrangement
Interaction support
Robert Glushko
Defining “Information Architecture” as
an Organising Discipline
Standing on the
shoulders of Giants
No AI without IA
Newton, Shannon, Bayes, Turing, Al-Kindi, Ada, Wolfram
The Document Type Spectrum – A
Continuum Between
Documents and Data
Tools for a smart platform
Data Findability:
How Knowledge Graphs can
support FAIR data
Google works for us as
consumers, yes?
Weekend search
Frustrated by search at work?
Handy patient summary?
Latest research?
Applicable guidelines?
Is your data designed to be found/understood?
Metadata:
what is it? what’s inside it?
what are the ingredients?
Context:
how old is it?
when and where was it made?
what is needed to access the contents?
Structure:
does it need to be cooked/processed?
how long does it need to be cooked for?
Data connections:
will it be like anything I’ve had before?
will it go with the other items I have?
will it be something others like?
Data meaning / comprehension (aka Findability)
Google & its Knowledge Graph
Bridging the gap: Data meaning v User intent
Situation for most organisations
Google
Google’s Knowledge Graph 2010
(based on Semantic Web-based technologies, incl. RDF)
A Knowledge Graph: an imagined representation
Nodes = entities, concepts, phrases
Lines/edges = relationships
KG’s underlying tech = Semantic Web-based Technologies
- all World Wide Web Consortium (W3C) standards
Base model = RDF (Resource Description Framework)
RDF triples / predicates
Subject → predicate → Object
Subject → predicate → Value
Patient X → hasSymptom → Loss of energy
Patient X → hasTemperature → 38.0 °C
Every concept and relationship has its
own identifier (URI) – less ambiguity
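The two triples above can be sketched in plain Python without any RDF library. This is only an illustration: the `https://example.org/` namespace and every identifier under it are hypothetical placeholders, not terms from a published vocabulary.

```python
from typing import NamedTuple

# Hypothetical namespace; a real graph would reuse published vocabularies.
EX = "https://example.org/"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # either another URI (an object) or a literal value


triples = [
    Triple(EX + "patient/X", EX + "hasSymptom", EX + "symptom/loss-of-energy"),
    Triple(EX + "patient/X", EX + "hasTemperature", "38.0 °C"),
]


def objects_of(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


symptoms = objects_of(triples, EX + "patient/X", EX + "hasSymptom")
```

Because the symptom is a URI rather than a free-text string, two sources that both point at it are talking about the same concept, which is the "less ambiguity" point on the slide.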
PageRank – known Webpage connections
Pre 2010
Keyword matching
PageRank
50% space given to organic
search results
Connected Data – known relationships
Match data meaning with query intent
Findable in FAIR
+ Context awareness
(Location, Time of day, Device etc)
+ UX becomes more intuitive to reflect
possible query/intent
Search Results and Discovery
Information Boxes (auto publishing)
Q & As
Searches related to “your query”
10% space given to organic search results
Google’s Knowledge Graph, 2010
Encoded knowledge
Machine readable
Human readable
Entity extraction
Known entities + known
relationships with other entities
(connected data)
People Organisations Places Events Products Services
meaning +
aboutness
known entities: Watch (product), sold where, by whom, price etc
Q & A
Information box
Online sellers
Organic search results
Ads
Retail sellers
Related searches
Location,
Device,
Time of Day
30% of websites now help Google understand the meaning of their pages
by adding Schema.org markup (RDF-based) and microformats.
This markup is used to populate Q&As, information boxes, rich snippets, etc.
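As a rough illustration of what such markup looks like, the sketch below builds a Schema.org description of a product and wraps it for embedding in a web page. JSON-LD is assumed here as the serialisation (the slide does not name one), and all product values are invented for the example.

```python
import json

# Illustrative product data; every value here is made up.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Watch",
    "offers": {
        "@type": "Offer",
        "price": "199.00",
        "priceCurrency": "SEK",
        "seller": {"@type": "Organization", "name": "Example Retailer"},
    },
}


def to_jsonld_script(data):
    """Wrap a Schema.org description in the script tag used for JSON-LD."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


snippet = to_jsonld_script(product)
```

The entity type, seller and price are stated explicitly, so a search engine does not have to guess them from the page text.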
known entities: Virus, infectious agent, diseases caused, related symptoms
no Ads
Request for feedback
Top stories
Help & information
Safety tips
Organic search results
Trusted sources
Organisations can use KGs to make
data more findable too
(web-based applications)
Make data smarter by Semantic Annotation
Tabular data
Metadata “B”
Info model Z
RDBMSs
Metadata “A”
Info model Y
Archived data
Documents
RDF
1. Data connectors – different data sources and formats
2. NLP (NER) & ML: processing & enrichment pipeline
3. Standardized metadata
4. Data semantically annotated (consistently) with concepts +
relationships from the KG
5. Semantic layer over all data sources: Interoperable (FAIR)
Elastic
KG
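A minimal sketch of steps 1, 2 and 4, assuming a tiny hand-made knowledge graph. A plain dictionary lookup stands in for the NLP/NER step, and the concept URIs, labels and source texts are all hypothetical.

```python
import re

# Hypothetical mini knowledge graph: concept URI -> labels (preferred + synonyms).
KG_LABELS = {
    "https://example.org/concept/hypertension": ["hypertension", "high blood pressure"],
    "https://example.org/concept/fatigue": ["fatigue", "loss of energy"],
}


def annotate(text, kg_labels=KG_LABELS):
    """Return the sorted concept URIs whose labels occur in `text`.

    A dictionary lookup stands in for NER here; a real pipeline would
    use a trained entity recogniser.
    """
    found = set()
    lowered = text.lower()
    for uri, labels in kg_labels.items():
        for label in labels:
            # Word boundaries avoid matching inside longer words.
            if re.search(r"\b" + re.escape(label) + r"\b", lowered):
                found.add(uri)
                break
    return sorted(found)


# Two sources in different formats, normalised to text before annotation.
sources = {
    "note.txt": "Patient reports loss of energy since Monday.",
    "row-17": "dx: High blood pressure",
}
index = {doc_id: annotate(text) for doc_id, text in sources.items()}
```

Both sources end up described with the same set of concept URIs, whatever their original format or wording, which is what the semantic layer in step 5 relies on.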
Make search smarter: analyse user behaviour
Tabular data
Metadata “B”
Info model Z
RDBMSs
Metadata “A”
Info model Y
PDFs
Documents
RDF
1. Search & click log analysis
2. ML: Learning to Rank
3. Detect new concepts for the KG
Kibana
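Steps 1 and 3 can be sketched with simple counting. This is not a Learning to Rank model: the click counts are only the kind of signal such a model could be trained on, and the log format and rows are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical log rows: (query, clicked document id, or None if nothing was clicked).
log = [
    ("hypertension guideline", "doc-1"),
    ("hypertension guideline", "doc-1"),
    ("hypertension guideline", "doc-2"),
    ("long covid", None),
    ("long covid", None),
]


def click_counts(rows):
    """Count clicks per query and document; usable as a ranking feature."""
    counts = defaultdict(Counter)
    for query, doc in rows:
        if doc is not None:
            counts[query][doc] += 1
    return counts


def queries_without_clicks(rows):
    """Queries that never led to a click: candidate concepts missing from the KG."""
    clicked = {query for query, doc in rows if doc is not None}
    return sorted({query for query, _ in rows if query not in clicked})


clicks = click_counts(log)
top_doc = clicks["hypertension guideline"].most_common(1)[0][0]
candidates = queries_without_clicks(log)
```

Queries that repeatedly go unanswered are reviewed by a person before anything is added to the knowledge graph.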
How do you make a Knowledge Graph?
2-D elements of a 3-D Knowledge Graph
Ontology:
Main domain class relationships
“Business rules”
Information model
Polyhierarchical taxonomy / thesaurus:
1. Synonyms, acronyms, abbreviations
2. Language from different perspectives (not just keyword match)
3. Use standard terminologies (pick & mix)
4. Different spoken languages
5. Cultural accessibility & interoperability (FAIR)
6. UX typing = autocomplete
7. UX navigation: polyhierarchical
Example relationship: hasSymptom
Each concept and relationship has a unique URI –
helps prevent ambiguity when searching:
Cold (common cold)
Cold (temperature, feeling)
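A small sketch of how the taxonomy side might be modelled, assuming a SKOS-like shape (preferred labels per language, alternative labels, several broader concepts). The URIs and labels are hypothetical, and the `suggest` function is only a naive prefix match to show the two meanings of "cold" kept apart.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str                                        # unique identifier (hypothetical)
    pref_label: dict                                # language code -> preferred label
    alt_labels: list = field(default_factory=list)  # synonyms, abbreviations
    broader: list = field(default_factory=list)     # several parents = polyhierarchy


concepts = [
    Concept(
        "https://example.org/c/common-cold",
        {"en": "Cold (common cold)", "sv": "Förkylning"},
        ["cold", "common cold"],
        ["https://example.org/c/infections", "https://example.org/c/respiratory"],
    ),
    Concept(
        "https://example.org/c/cold-sensation",
        {"en": "Cold (temperature, feeling)", "sv": "Kyla"},
        ["cold", "feeling cold"],
        ["https://example.org/c/symptoms"],
    ),
]


def suggest(typed, lang="en"):
    """Autocomplete: (label, URI) for concepts with a label starting with `typed`."""
    typed = typed.lower()
    hits = []
    for c in concepts:
        labels = [c.pref_label[lang]] + c.alt_labels
        if any(label.lower().startswith(typed) for label in labels):
            hits.append((c.pref_label[lang], c.uri))
    return hits


options = suggest("cold")
```

Typing "cold" offers both concepts with their disambiguating labels, and the search then runs on the chosen URI rather than on the ambiguous word.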
Semantic Annotation: Content v Datasets
• Content (unstructured text) holds more concepts that can be linked to the knowledge graph
  • Exploit 80% of unstructured text in EHRs
  • Connect & index various sources through one search interface: see the latest research and guidelines for the treatment of a disease or condition
  • Patient summary
• Datasets/databases have fewer concepts and often unclear field names (reference codes or shortened text)
  • Harder to find data by keyword matching alone
  • Knowledge graphs can add higher-level meaning to datasets if more than just the metadata is indexed
  • E.g. “Hypertension”: all datasets with a systolic blood pressure over 160 mmHg
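The hypertension example can be sketched as a rule that tags a dataset with a concept based on its values rather than its field names. The threshold is the one from the slide and is not clinical guidance; the dataset names, the `sbp` field and the concept URI are hypothetical.

```python
# Threshold taken from the slide's example; not clinical guidance.
SYSTOLIC_THRESHOLD_MMHG = 160
HYPERTENSION = "https://example.org/concept/hypertension"  # hypothetical URI

# Hypothetical datasets with terse field names, as is common in practice.
datasets = {
    "clinic_a.csv": [{"pid": 1, "sbp": 172}, {"pid": 2, "sbp": 118}],
    "clinic_b.csv": [{"pid": 7, "sbp": 121}],
}


def concepts_for(rows, field="sbp"):
    """Tag a dataset with the concept if any row exceeds the threshold."""
    if any(row.get(field, 0) > SYSTOLIC_THRESHOLD_MMHG for row in rows):
        return [HYPERTENSION]
    return []


annotations = {name: concepts_for(rows) for name, rows in datasets.items()}
matches = [name for name, tags in annotations.items() if HYPERTENSION in tags]
```

A keyword search for "hypertension" would find neither file, since the word appears nowhere in them; the concept annotation is what makes `clinic_a.csv` findable.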
Different types of metadata
Concrete → Abstract, fuzzy
Extracted system data (extracted from system): file size, date modified, etc.
Extracted measurement value data (extracted from text): e.g. weight (kg), blood pressure
Specific identifier data (extracted from text): e.g. email, mobile number
Named real-world entities: e.g. people names, organisations
Types & categories: e.g. roles, disciplines
Events & actions: e.g. birth date, appointment, conference
Characteristics & attributes: e.g. colour, size
Entity relationships, connections: e.g. disease symptoms, causes
Topics / subjects: e.g. Paediatrics, Surgery
Sentiment, opinion: e.g. positive, negative
Knowledge graphs can help with more complex searches & navigation
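The concrete end of this spectrum (measurement values and identifiers in text) can often be pulled out with pattern matching, as in the sketch below. The patterns are deliberately simplified assumptions; real clinical text needs far more careful handling, and the more abstract metadata types need NLP plus a knowledge graph rather than regular expressions.

```python
import re

# Simplified patterns for illustration only.
PATTERNS = {
    "weight_kg": re.compile(r"(\d+(?:\.\d+)?)\s*kg\b"),
    "blood_pressure": re.compile(r"\b(\d{2,3})/(\d{2,3})\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def extract(text):
    """Return the first match of each pattern found in `text`."""
    found = {}
    weight = PATTERNS["weight_kg"].search(text)
    if weight:
        found["weight_kg"] = float(weight.group(1))
    bp = PATTERNS["blood_pressure"].search(text)
    if bp:
        found["blood_pressure"] = (int(bp.group(1)), int(bp.group(2)))
    email = PATTERNS["email"].search(text)
    if email:
        found["email"] = email.group(0)
    return found


note = "Weight 82.5 kg, BP 150/95. Contact: nurse@example.org"
metadata = extract(note)
```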
Creating a DCAT metadata catalogue
(RDF-based and a W3C Recommendation)
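A minimal sketch of one catalogue entry, written as JSON-LD from Python. The `dcat:` and `dct:` terms are real DCAT and Dublin Core Terms properties, while every URL, title and keyword is an invented example value.

```python
import json

# Namespace prefixes for the DCAT and Dublin Core Terms vocabularies.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Example values only; the identifiers under example.org are hypothetical.
dataset = {
    "@context": CONTEXT,
    "@id": "https://example.org/dataset/bp-readings",
    "@type": "dcat:Dataset",
    "dct:title": "Blood pressure readings",
    "dct:description": "Example description of the dataset.",
    "dcat:keyword": ["hypertension", "blood pressure"],
    "dcat:distribution": {
        "@type": "dcat:Distribution",
        "dcat:downloadURL": {"@id": "https://example.org/files/bp-readings.csv"},
        "dcat:mediaType": {"@id": "https://www.iana.org/assignments/media-types/text/csv"},
    },
}

# The catalogue lists its datasets by reference.
catalog = {
    "@context": CONTEXT,
    "@id": "https://example.org/catalog",
    "@type": "dcat:Catalog",
    "dcat:dataset": [{"@id": dataset["@id"]}],
}

serialised = json.dumps([catalog, dataset], indent=2)
```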
Adding metadata for datasets (with RDF)
Save/Upload pipeline
System metadata
Final_final.csv
Metadata form using metadata standards in the background, e.g. Dublin Core
Description
Processing & indexing pipeline
Lineage, versioning
Upload
Annotations, categorisation, etc.
Usage notation
Form populated: automatically, semi-automatically, manually, or a combination
User rating
Auto-populated by referencing the KG
Let users determine dataset usefulness
Reusable (FAIR)
Comments
User can modify before uploading
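One way the form population could work is sketched below: system metadata is read from the file, suggestions (for example from the KG) are layered on top, and the user's own edits win. The field names are loosely modelled on Dublin Core elements, and the function names and merge order are assumptions made for illustration.

```python
import os
import tempfile
from datetime import datetime, timezone


def system_metadata(path):
    """Collect metadata the system can read without asking the user."""
    info = os.stat(path)
    return {
        "title": os.path.splitext(os.path.basename(path))[0],
        "format": os.path.splitext(path)[1].lstrip(".").lower(),
        "extent_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }


def build_form(path, suggested=None, user_input=None):
    """Merge the three sources; later sources override earlier ones."""
    form = {"description": "", "subject": []}  # Dublin Core-style field names
    form.update(system_metadata(path))         # automatic
    form.update(suggested or {})               # semi-automatic, e.g. from the KG
    form.update(user_input or {})              # manual edits before upload
    return form


# Usage with a throwaway file named as on the slide.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Final_final.csv")
    with open(path, "w") as fh:
        fh.write("pid,sbp\n1,172\n")
    form = build_form(
        path,
        suggested={"subject": ["hypertension"]},
        user_input={"title": "Blood pressure readings"},
    )
```

The unhelpful file name is replaced by the title the user typed, while the format, size and modification date are filled in without any typing at all.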
Buy versus Build
• Buy
  • Quick (initially)
  • Lock in?
  • Development?
  • New ideas, processes, sources?
  • Application centric
• Build
  • Agile build
  • Data centric, easier application innovation
  • In control: data capture to end user consumption
  • Behaviour & usage measurement / policing
  • KGs used across
  • Integrate also Data Sensitivity & Access control
  • Flexible for new data interoperability changes, e.g. convert to RDF data
  • AI (ML & NLP) ready
Knowledge Graphs
AI
Thank you
Nordic health data   metadata
fredric.landqvist@findwise.com
https://www.linkedin.com/in/fredriclandqvist/
Peter.voisey@findwise.com
https://www.linkedin.com/in/petervoisey/

