[Figure: Linking Open Data cloud diagram, showing interlinked datasets such as DBpedia, Bio2RDF, Freebase, YAGO, GeoNames, RKB Explorer and StatusNet.]
Functional Manipulation
of Large Data Graphs
David Hyland-Wood
david.wood@ephox.com
@prototypo
1 June 2016
Functional manipulations of large data graphs 20160601
[Diagram: Something, linked by "a relationship" to Something else.]
[Diagram: UQ "is a" University; UQ has label "The University of Queensland"; UQ has affiliation "Group of 8", which "is a" University.]
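The statements on this slide can be read as subject-predicate-object triples. As a minimal sketch (the `Triple` case class and the short names such as `UQ` and `isA` are illustrative stand-ins, not real RDF IRIs or any library's API), they might be modelled in plain Scala like this:

```scala
// Illustrative only: a triple is a (subject, predicate, object) statement.
final case class Triple(subject: String, predicate: String, obj: String)

object UqTriples {
  // Statements about UQ drawn on the slide, using short stand-in names.
  val triples: List[Triple] = List(
    Triple("UQ", "isA", "University"),
    Triple("UQ", "label", "The University of Queensland"),
    Triple("UQ", "affiliation", "Group of 8")
  )

  // All objects attached to a subject by a given predicate.
  def objectsOf(subject: String, predicate: String): List[String] =
    triples.collect {
      case Triple(`subject`, `predicate`, o) => o
    }

  def main(args: Array[String]): Unit =
    println(objectsOf("UQ", "label"))
}
```

Real RDF would use full IRIs for subjects and predicates and would distinguish literals from resources; the strings here only keep the sketch short.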
We’ve Seen This Before
08 Oct 2007
The RDF Data Model
Standard serialisation formats:
• Turtle, TriG, N-Triples, N-Quads (the Turtle family of RDF formats)
• JSON-LD
• RDFa
• RDF/XML
Possibly lossy alternatives:
• CSV
• OData
• etc.
https://en.wikipedia.org/wiki/University_of_Queensland
# HTML
$ curl http://dbpedia.org/page/University_of_Queensland
# RDF in XML (Yuck!)
$ curl http://dbpedia.org/data/University_of_Queensland
# Many formats, e.g. sane RDF, OData, Microdata, JSON…
$ curl http://dbpedia.org/data/University_of_Queensland.n3 > University_of_Queensland.n3
[Diagram: UQ, with label "The University of Queensland", affiliation "Group of 8", number of undergraduate students 34228, and number of students 48771.]
# G8 universities ordered by the number of students
# at each university.
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?name ?students ?undergrads
where {
  ?s dbo:affiliation <http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)> .
  ?s rdfs:label ?name .
  OPTIONAL { ?s dbo:numberOfStudents ?students }
  OPTIONAL { ?s dbo:numberOfUndergraduateStudents ?undergrads }
  FILTER ( lang(?name) = "en" )
} ORDER BY DESC(?students)
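To make the shape of this query concrete, here is a rough plain-Scala sketch of the same pattern evaluated over an in-memory list of triples. It is not how a SPARQL engine works internally, and the `Triple` type, the shortened predicate names and the `g8Report` helper are all hypothetical. The UQ figures are the ones shown on the earlier slide; ANU is deliberately left without counts to show the effect of the OPTIONAL clauses. The language filter is omitted because the sample labels carry no language tag.

```scala
object G8Query {
  final case class Triple(s: String, p: String, o: String)

  // Hypothetical sample data with shortened names in place of full IRIs.
  val data: List[Triple] = List(
    Triple("UQ", "affiliation", "Go8"),
    Triple("UQ", "label", "The University of Queensland"),
    Triple("UQ", "numberOfStudents", "48771"),
    Triple("UQ", "numberOfUndergraduateStudents", "34228"),
    Triple("ANU", "affiliation", "Go8"),
    Triple("ANU", "label", "Australian National University")
  )

  // First object for (s, p), if any: plays the role of an OPTIONAL match.
  def lookup(s: String, p: String): Option[String] =
    data.collectFirst { case Triple(`s`, `p`, o) => o }

  // (name, students, undergrads) rows, most students first. Rows with no
  // student count sort last, as unbound values do under ORDER BY DESC.
  def g8Report: List[(String, Option[Int], Option[Int])] = {
    val members = data.collect { case Triple(s, "affiliation", "Go8") => s }
    val rows = members.flatMap { s =>
      lookup(s, "label").map { name =>
        (
          name,
          lookup(s, "numberOfStudents").map(_.toInt),
          lookup(s, "numberOfUndergraduateStudents").map(_.toInt)
        )
      }
    }
    rows.sortBy { case (_, students, _) => -students.getOrElse(-1) }
  }

  def main(args: Array[String]): Unit =
    g8Report.foreach(println)
}
```

The required patterns (affiliation and label) drop a university when they fail to match, while the two optional lookups simply yield `None`, which mirrors the difference between a plain triple pattern and an OPTIONAL block.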
OpenStreetMap
Wikimedia Commons
DBpedia
US EPA RCRA
US EPA FRS
ABT Associates
[Diagram: Group of Eight graph. Nodes UQ, ANU, Monash, UMelbourne, UNSW, USydney and UAdelaide each have a label (The University of Queensland, Australian National University, Monash University, University of Melbourne, University of NSW, University of Sydney, University of Adelaide) and a memberOf edge to Go8 (label: Group of 8). Affiliation edges connect some of the universities.]
[Diagram: UQ and ANU shown with their labels and with affiliation edges to Monash, UMelbourne, UNSW, USydney and UAdelaide.]
Graphs in Scala
import org.apache.spark.graphx.Graph

// vertexRDD and edgeRDD are built elsewhere (not shown on this slide).
val graph: Graph[String, String] =
  Graph(vertexRDD, edgeRDD)

// Create a subgraph that keeps only the edges carrying
// an "affiliation" property.
val affiliationRelatedSubgraph =
  graph.subgraph(t => t.attr ==
    "http://dbpedia.org/ontology/affiliation")

// Find connected components of affiliationRelatedSubgraph.
val ccGraph =
  affiliationRelatedSubgraph.connectedComponents()
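For intuition about what these two GraphX calls do, here is a small Spark-free sketch in plain Scala. It assumes a toy edge list with made-up vertex ids and is not the GraphX implementation: it filters edges by label (the role of `subgraph`) and then propagates the smallest vertex id along edges until nothing changes, so each vertex ends up labelled with the lowest id in its component, which is also how `connectedComponents` identifies components. Unlike GraphX, the sketch only tracks vertices that still have an edge after filtering.

```scala
object ComponentsSketch {
  final case class Edge(src: Long, dst: Long, attr: String)

  // Keep only edges carrying the given label (the role of `subgraph`).
  def edgesWith(label: String, edges: List[Edge]): List[Edge] =
    edges.filter(_.attr == label)

  // Map each vertex touched by an edge to the lowest id in its component.
  def components(edges: List[Edge]): Map[Long, Long] = {
    val vertices = edges.flatMap(e => List(e.src, e.dst)).distinct
    var labels = vertices.map(v => v -> v).toMap
    var changed = true
    while (changed) {
      changed = false
      for (e <- edges) {
        val low = math.min(labels(e.src), labels(e.dst))
        if (labels(e.src) != low || labels(e.dst) != low) {
          labels = labels + (e.src -> low) + (e.dst -> low)
          changed = true
        }
      }
    }
    labels
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical toy graph: ids and edges are made up for illustration.
    val edges = List(
      Edge(1L, 2L, "affiliation"),
      Edge(2L, 3L, "affiliation"),
      Edge(4L, 5L, "affiliation"),
      Edge(3L, 4L, "memberOf") // dropped by the filter
    )
    val cc = components(edgesWith("affiliation", edges))
    // Group vertices by component id before reporting.
    cc.groupBy(_._2).foreach { case (id, members) =>
      println(s"$id: ${members.keys.toList.sorted.mkString(", ")}")
    }
  }
}
```

In GraphX, `subgraph` keeps every vertex by default, so vertices with no affiliation edge come back as single-member components; that is why a report over the components would want to skip lists of size one.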
Graphs in Scala
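import org.apache.spark.graphx.VertexId
import scala.collection.mutable.{HashMap, ListBuffer}

// Create a hashmap of componentLists, keyed by the vertex that
// identifies each component. (Declaration assumed; the slide omits it.)
val componentLists = HashMap[VertexId, ListBuffer[VertexId]]()
affiliationRelatedSubgraph.vertices
  .leftJoin(ccGraph.vertices) {
    case (id, u, comp) => comp.get
  }
  .collect() // bring results to the driver before filling the local map
  .foreach { case (id, startingNode) =>
    if (!componentLists.contains(startingNode)) {
      componentLists(startingNode) = new ListBuffer[VertexId]
    }
    componentLists(startingNode) += id
  }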
Graphs in Scala
// Output a report on the connected components.
println("------ connected components in related triples ------\n")
for ((component, componentList) <- componentLists) {
  if (componentList.size > 1) {
    for (c <- componentList) {
      println(labelMap(c))
    }
    println("--------------------------")
  }
}
------ connected components in related triples ------
Australian National University
University of Sydney
University of Adelaide
University of New South Wales
--------------------------
The University of Queensland
University of Melbourne
Monash University
--------------------------
Resources
• Slides: http://w3id.org/people/prototypo/talks/UQ-DKE-20160601/slides
• Code: http://w3id.org/people/prototypo/talks/UQ-DKE-20160601/code
Resources
• Callimachus: http://callimachusproject.org
• Apache Spark: http://spark.apache.org
• GraphX Programming Guide: http://spark.apache.org/docs/latest/graphx-programming-guide.html
Attributions
• Linking Open Data cloud diagram by Richard Cyganiak and Anja Jentzsch, used under a CC license: http://lod-cloud.net/
This work is Copyright © 2015 David Hyland-Wood
It is licensed under the Creative Commons Attribution 3.0 Unported License

Full details at: http://creativecommons.org/licenses/by/3.0/
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work
Under the following conditions:
Attribution. You must attribute the work in the manner specified by the
author or licensor (but not in any way that suggests that they endorse
you or your use of the work).
Share Alike. If you alter, transform, or build upon this work, you may
distribute the resulting work only under the same or similar license to this
one.
