HARE: A Hybrid SPARQL Engine to Enhance
Query Answers via Crowdsourcing
Maribel Acosta, Elena Simperl, Fabian Flöck, Maria-Esther Vidal!
[Title graphic: the triple pattern dbr:Bad_Hair, dbp:producer, ?x]
  
Motivation (1)

Due to the semi-structured nature of RDF, incomplete values cannot be easily detected.
  
Motivation (2)

Retrieve movies that have producers and have been filmed in New York City by Universal Pictures.

SELECT DISTINCT ?movie WHERE {
  ?movie rdf:type schema.org:Movie .
  ?movie dbp:producer ?producer .
  ?movie dct:subject dbc:Universal_Pictures_film .
  ?movie dct:subject dbc:Films_shot_in_New_York_City .
}

Result: 39 movies (DBpedia v. 2015-04).
  
Motivation (2)

(Same SPARQL query as above.) The complete answer contains 46 movies: there are 7 movies without producers in DBpedia (v. 2015-04).
Motivation

Movies (shot in NYC by Universal Pictures) with no producers in DBpedia (v. 2015-04):

dbr:Legal_Eagles, dbr:Wanderlust, dbr:Barney's_Version_(film), dbr:Non_Stop_(film), dbr:The_Wolf_of_Wall_Street_(2013_film), dbr:Broadway_Love, dbr:Trainwreck_(film)

(Yet, for example, Leonardo DiCaprio is a producer of The Wolf of Wall Street.)

All images licensed under Fair use via Wikipedia.
Problem Definition

Given an RDF data set D and a SPARQL query Q against D, consider D* the virtual data set that contains all the data that should be in D.

P1) Identifying portions of Q that yield missing values
P2) Resolving missing values

Example:

[[(?movie, dbp:producer, ?producer)]]D ⊂ [[(?movie, dbp:producer, ?producer)]]D*

μ = {movie → dbr:The_Wolf_of_Wall_Street_(2013_film), producer → dbr:Leonardo_DiCaprio}

μ ∉ [[(?movie, dbp:producer, ?producer)]]D   (does not belong to DBpedia)
μ ∈ [[(?movie, dbp:producer, ?producer)]]D*  (should belong to DBpedia)
OUR APPROACH: HARE
  
HARE

•  A hybrid machine/human SPARQL query engine that enhances the size of query answers.
•  Based on a novel RDF completeness model, HARE implements query optimization and execution techniques:
   P1) Identifying portions of queries that yield missing values.
•  HARE resorts to microtask crowdsourcing:
   P2) Resolving missing values.
  
HARE Architecture

[Architecture diagram. Input: a SPARQL query Q and a threshold τ; RDF data comes from an RDF data set in the LOD Cloud. The Query Optimizer produces the query plan; the Query Engine executes it, guided by the RDF Completeness Model, and emits triple patterns to crowdsource; the Microtask Manager turns them into tasks for the crowd; the aggregated human input is stored in the crowd knowledge bases CKB+, CKB–, CKB~, and bindings from the crowd flow back to the engine. Output: results for Q.]
  
  
RDF Completeness Model (1)

Movies have producers (e.g. dbr:The_Interpreter).

[Graph excerpt: dbr:The_Interpreter, dbr:Tower_Heist, and dbr:Bad_Hair have rdf:type schema.org:Movie. dbr:The_Interpreter has the dbp:producer values dbr:Eric_Fellner, dbr:Tim_Bevan, and dbr:Kevin_Misher; the dbp:producer values of dbr:Bad_Hair are unknown (?).]
  
RDF Completeness Model (2)

①  Predicate multiplicity of an RDF resource
The number of distinct objects that a resource has for a given predicate.

MD(dbr:The_Interpreter | dbp:producer) = 3
(dbr:The_Interpreter has three dbp:producer values: dbr:Eric_Fellner, dbr:Tim_Bevan, dbr:Kevin_Misher.)
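To make the measure concrete, a minimal Python sketch over triples held as (subject, predicate, object) tuples; the function name md and the toy data are ours, not HARE's code (the dbr:Legal_Eagles objects are hypothetical placeholders, chosen only so that MD = 2 as the deck reports):

def md(triples, subject, predicate):
    """Predicate multiplicity MD(s | p): the number of distinct
    objects that `subject` has for `predicate` in the data set."""
    return len({o for (s, p, o) in triples if s == subject and p == predicate})

triples = [
    ("dbr:The_Interpreter", "dbp:producer", "dbr:Eric_Fellner"),
    ("dbr:The_Interpreter", "dbp:producer", "dbr:Tim_Bevan"),
    ("dbr:The_Interpreter", "dbp:producer", "dbr:Kevin_Misher"),
    ("dbr:Legal_Eagles", "dbp:producer", "_:producer1"),  # hypothetical
    ("dbr:Legal_Eagles", "dbp:producer", "_:producer2"),  # hypothetical
]

print(md(triples, "dbr:The_Interpreter", "dbp:producer"))  # 3
print(md(triples, "dbr:Bad_Hair", "dbp:producer"))         # 0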
  
RDF Completeness Model (3)

②  Aggregated predicate multiplicity of a class
Given a predicate, the median of the predicate multiplicities over all the resources that belong to the class.

AMD(schema.org:Movie | dbp:producer) = 3

MD(dbr:The_Interpreter | dbp:producer) = 3
MD(dbr:Legal_Eagles | dbp:producer) = 2
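Continuing the sketch, the aggregated multiplicity is the median of the per-resource multiplicities over the members of a class (amd is our name; it reuses md and triples from above):

from statistics import median

def amd(triples, class_members, predicate):
    """Aggregated predicate multiplicity AMD(C | p): the median of
    MD(s | p) over all resources s that belong to class C."""
    return median(md(triples, s, predicate) for s in class_members)

movies = ["dbr:The_Interpreter", "dbr:Legal_Eagles"]
print(amd(triples, movies, "dbp:producer"))
# 2.5 on this toy excerpt; over all DBpedia movies the deck reports AMD = 3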
  
RDF Completeness Model (4)

③  Completeness of an RDF resource (with respect to a predicate)
Given a predicate, the completeness of an RDF resource is the ratio of its predicate multiplicity ① to the aggregated predicate multiplicity ② of the classes it belongs to.

CompD(dbr:The_Interpreter | dbp:producer) = 3/3 = 1.00
CompD(dbr:Legal_Eagles | dbp:producer) = 2/3 ≈ 0.67
CompD(dbr:Bad_Hair | dbp:producer) = 0/3 = 0.00
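Putting the two measures together, a sketch of the completeness ratio; comp_d is our name, and capping the ratio at 1.0 (for resources above the class median) is our assumption:

def comp_d(triples, subject, class_members, predicate):
    """Completeness CompD(s | p) = MD(s | p) / AMD(C | p)."""
    denom = amd(triples, class_members, predicate)
    # Cap at 1.0 (our assumption) and treat an empty class median as complete.
    return min(1.0, md(triples, subject, predicate) / denom) if denom else 1.0

print(comp_d(triples, "dbr:Bad_Hair", movies, "dbp:producer"))  # 0.0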
  
  
Crowd Knowledge

•  The knowledge collected from the crowd is captured in three knowledge bases:

   CKB = (CKB+, CKB–, CKB~)

•  CKB+, CKB–, and CKB~ are fuzzy sets over RDF data, composed of 4-tuples of the form:

   (subject, predicate, object, membership_degree)

   where (subject, predicate, object) is an RDF triple.
  
Crowd Knowledge
Types of Crowd Knowledge Bases

CKB+  “Brian Grazer is a producer of Tower Heist.”
      (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)

CKB–  “Tower Heist does not have a producer.”
      (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)

CKB~  “I am not sure if Bad Hair has a producer.”
      (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)
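A minimal sketch of how these fuzzy sets could be represented, assuming each base maps an RDF triple to its membership degree; the CKB class and its field names are ours:

from dataclasses import dataclass, field

@dataclass
class CKB:
    """Crowd knowledge: three fuzzy sets mapping an RDF triple
    (subject, predicate, object) to a membership degree in [0, 1]."""
    positive: dict = field(default_factory=dict)   # CKB+
    negative: dict = field(default_factory=dict)   # CKB-
    uncertain: dict = field(default_factory=dict)  # CKB~

ckb = CKB()
ckb.positive[("dbr:Tower_Heist", "dbp:producer", "dbr:Brian_Grazer")] = 0.9
ckb.negative[("dbr:Tower_Heist", "dbp:producer", "_:o1")] = 0.05
ckb.uncertain[("dbr:Bad_Hair", "dbp:producer", "_:o2")] = 0.78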
  
(The CKB+ and CKB– entries about dbr:Tower_Heist form a contradiction; the CKB~ entry about dbr:Bad_Hair expresses uncertainty.)
  
Crowd Knowledge
Measuring Contradiction

•  Contradiction occurs when triples with the same subject and predicate belong to both CKB+ and CKB–.
•  It is measured as one minus the absolute difference of the corresponding membership degrees:

   CKB+: (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)
   CKB–: (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)

   Contradiction(dbr:Tower_Heist | dbp:producer) = 1 – |0.9 – 0.05| = 0.15

•  Contradiction values close to 0.0 indicate high consensus.
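A sketch of the measure as worked in the example, reusing the CKB instance above; aggregating multiple degrees per side with max, and returning 0.0 when one side is empty, are our assumptions:

def contradiction(ckb, subject, predicate):
    """1 - |CKB+ degree - CKB- degree| for entries sharing the given
    subject and predicate; values close to 0.0 indicate consensus."""
    pos = [d for (s, p, _), d in ckb.positive.items() if (s, p) == (subject, predicate)]
    neg = [d for (s, p, _), d in ckb.negative.items() if (s, p) == (subject, predicate)]
    if not pos or not neg:
        return 0.0  # assumption: no contradiction unless both sides have entries
    return 1.0 - abs(max(pos) - max(neg))

print(contradiction(ckb, "dbr:Tower_Heist", "dbp:producer"))  # ≈ 0.15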
  
Crowd Knowledge
Measuring Uncertainty

•  When a triple belongs to CKB~, the value of the triple object is unknown or uncertain.
•  Uncertainty is measured as the average of the corresponding membership degrees in CKB~:

   CKB~: (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)

   Uncertainty(dbr:Bad_Hair | dbp:producer) = avg({0.78}) = 0.78

•  Uncertainty values close to 1.0 indicate that the crowd is unknowledgeable about the fact to be vetted.
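The companion sketch for uncertainty, under the same representation assumptions:

from statistics import mean

def uncertainty(ckb, subject, predicate):
    """Average CKB~ membership degree for the subject/predicate pair;
    values close to 1.0 mean the crowd does not know the fact."""
    degrees = [d for (s, p, _), d in ckb.uncertain.items() if (s, p) == (subject, predicate)]
    return mean(degrees) if degrees else 0.0

print(uncertainty(ckb, "dbr:Bad_Hair", "dbp:producer"))     # 0.78
print(uncertainty(ckb, "dbr:Tower_Heist", "dbp:producer"))  # 0.0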
  
  
Query Optimizer (1)

•  Heuristic-based optimizer that decomposes the BGPs of a SPARQL query into two subsets:
   –  SQD: triple patterns executed against the data set D,
   –  SQCROWD: triple patterns to be crowdsourced.
  
Query Optimizer (2)

•  Given a SPARQL query Q:
   –  Triple patterns in Q with variables in both the subject and object positions are added to SQCROWD.
   –  The rest of the triple patterns in Q are added to SQD.

SELECT DISTINCT ?movie WHERE {
  ?movie rdf:type schema.org:Movie .                        # t1 -> SQD
  ?movie dbp:producer ?producer .                           # t2 -> SQCROWD
  ?movie dct:subject dbc:Universal_Pictures_film .          # t3 -> SQD
  ?movie dct:subject dbc:Films_shot_in_New_York_City .      # t4 -> SQD
}
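A minimal sketch of the decomposition rule, with triple patterns as (s, p, o) strings and any term starting with '?' treated as a variable; decompose is our name:

def decompose(bgp):
    """Split a BGP into (SQD, SQCROWD): patterns whose subject AND object
    are both variables are crowdsourced; the rest go against the data set."""
    is_var = lambda term: term.startswith("?")
    sq_d, sq_crowd = [], []
    for s, p, o in bgp:
        (sq_crowd if is_var(s) and is_var(o) else sq_d).append((s, p, o))
    return sq_d, sq_crowd

bgp = [
    ("?movie", "rdf:type", "schema.org:Movie"),                    # t1
    ("?movie", "dbp:producer", "?producer"),                       # t2
    ("?movie", "dct:subject", "dbc:Universal_Pictures_film"),      # t3
    ("?movie", "dct:subject", "dbc:Films_shot_in_New_York_City"),  # t4
]
sq_d, sq_crowd = decompose(bgp)  # sq_crowd holds only t2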
  
Query Optimizer (3)

•  The optimizer builds a query plan TQ for query Q.
•  Triple patterns from SQD are grouped into star-shaped sub-queries in a bushy tree [Vidal et al.].
•  Triple patterns in SQCROWD are added to the plan TQ in a left-linear fashion.

[Plan diagram: the star-shaped sub-query over t1, t3, t4 (SQD) is joined first; t2 (SQCROWD) is joined on top in a left-linear fashion.]
  
Query Engine (1)

•  Executes the query plan TQ.
•  Sub-queries that are part of SQD are executed against the data set, producing a set of mappings Ω:

   Ω = {{movie → dbr:Tower_Heist}, {movie → dbr:Legal_Eagles}, …}

•  For each mapping contained in Ω, the engine instantiates the triple patterns in SQCROWD, as in the sketch below.
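A sketch of the instantiation step referenced above (instantiate is our name; mappings are plain dicts from variable names to resources):

def instantiate(pattern, mu):
    """Apply a mapping mu to a triple pattern, replacing bound variables."""
    subst = lambda term: mu.get(term[1:], term) if term.startswith("?") else term
    return tuple(subst(term) for term in pattern)

print(instantiate(("?movie", "dbp:producer", "?producer"),
                  {"movie": "dbr:Tower_Heist"}))
# ('dbr:Tower_Heist', 'dbp:producer', '?producer')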
  
Query Engine (2)
Example of an Iteration

•  The engine processes {movie → dbr:Tower_Heist}.
•  Following the running example:

   CompD(dbr:Tower_Heist | dbp:producer) = 1/3 = 0.33
   Contradiction(dbr:Tower_Heist | dbp:producer) = 0.15
   Uncertainty(dbr:Tower_Heist | dbp:producer) = 0.0

   CKB+: (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)
   CKB–: (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)
   CKB~: (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)
Query Engine (3)
Example of an Iteration

•  The algorithm computes the probability of crowdsourcing the triple pattern (dbr:Tower_Heist, dbp:producer, ?producer):

   P(CROWD | μ(s), p) = α (1 – 0.33) + (1 – α) min{0.15, 1 – 0.0} = 0.41

   The first term captures the estimated incompleteness; the second, the crowd reliability.

•  α is a score weight between 0.0 and 1.0 (0.5 in the example).
•  If P(CROWD | μ(s), p) is greater than a user threshold τ, the algorithm crowdsources the triple pattern (μ(s), p, o).
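A sketch of the decision rule, combining the measures above; the default alpha is the example's value and tau is a hypothetical threshold, not a prescribed setting:

def p_crowd(comp, contr, uncert, alpha=0.5):
    """P(CROWD | mu(s), p): alpha weighs the estimated incompleteness
    (1 - CompD) against the crowd reliability term."""
    return alpha * (1.0 - comp) + (1.0 - alpha) * min(contr, 1.0 - uncert)

p = p_crowd(comp=1/3, contr=0.15, uncert=0.0)  # ≈ 0.41
tau = 0.3                                      # hypothetical user threshold
should_crowdsource = p > tau                   # True: crowdsource the pattern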
  
Query Engine (4)

•  The engine combines mappings obtained from the data set D and mappings from the crowd stored in CKB+.
•  The query evaluation terminates when all the sub-queries are executed.

The HARE query engine does not increase the time complexity of executing a SPARQL query. (Theorem 1)
  
  
Microtask Manager (1)

•  Receives triple patterns to crowdsource, for example:

   (dbr:Tower_Heist, dbp:producer, ?p)

•  Creates human tasks.
•  Submits tasks to the crowdsourcing platform.
  
Microtask Manager (2)

HARE exploits the semantics encoded in RDF resources to render tasks, using triples such as:

(dbr:Tower_Heist, rdfs:label, …)
(dbp:producer, rdfs:label, …)
(dbr:Tower_Heist, foaf:depiction, …)
(dbr:Tower_Heist, dbo:abstract, …)
(dbr:Tower_Heist, foaf:primaryTopic, …)
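A hedged sketch of how such a task payload could be assembled; build_task, the describe lookup, and the answer wording are our assumptions, not HARE's actual interface code:

def build_task(triple_pattern, describe):
    """Render a human task for an instantiated pattern (s, p, ?o),
    using labels, depiction, and abstract of the resources involved."""
    s, p, _ = triple_pattern
    return {
        "question": f"Who is the {describe(p, 'rdfs:label')} of "
                    f"{describe(s, 'rdfs:label')}?",
        "image": describe(s, "foaf:depiction"),
        "context": describe(s, "dbo:abstract"),
        # Options mirroring CKB+/CKB-/CKB~; exact wording is our assumption.
        "answers": ["Provide a value", "Does not exist", "I don't know"],
    }

lookup = {
    ("dbr:Tower_Heist", "rdfs:label"): "Tower Heist",
    ("dbp:producer", "rdfs:label"): "producer",
}
describe = lambda resource, prop: lookup.get((resource, prop), "")
task = build_task(("dbr:Tower_Heist", "dbp:producer", "?p"), describe)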
  
Microtask Manager (3)

[Screenshot of a microtask; the aggregated human input populates CKB+, CKB–, and CKB~.]
EXPERIMENTAL STUDY
  
Experimental Set-Up

•  Benchmark: 50 queries against DBpedia (v. 2014).
   –  Ten queries in each of five knowledge domains: History, Life Sciences, Movies, Music, and Sports.
•  Implementation details:
   –  HARE is implemented in Python 2.7.6.
   –  CrowdFlower is used as the crowdsourcing platform.
•  Crowdsourcing configuration:
   –  Four different RDF triples per task, US$ 0.07 per task.
   –  At least three judgments were collected per task.
•  Total RDF triple patterns crowdsourced: 502
•  Total answers collected from the crowd: 1,609
  
Results: Size of Query Answer (1)

Metric: number of answers when queries are executed.

[Bar charts: data set answers vs. crowd answers per query (Q1–Q10). Answer size enhancement per domain: Sports 1.25 – 2.00, Music 1.50 – 2.00, Life Sciences 1.08 – 1.92.]

HARE identifies sub-queries that produce incomplete answers.
Crowdsourcing is a feasible solution to resolve missing values.
Results: Size of Query Answer (2)

Metric: number of answers when queries are executed.

[Bar charts: data set answers vs. crowd answers per query (Q1–Q10). Answer size enhancement per domain: Movies 1.05 – 3.13, History 1.10 – 1.89.]

HARE identifies sub-queries that produce incomplete answers.
Crowdsourcing is a feasible solution to resolve missing values.
Results: Crowd Response Time (1)

Metric: elapsed time since the first task is submitted until the last answer is retrieved.

[Line plots: judgments completed (%) over time (min) per query (Q1–Q10). At the 12th minute: Sports 77%, Music 82%, Life Sciences 97%.]

At the 12th minute after the first task is submitted, the crowd produces at least 75% of the answers.
  
Results: Crowd Response Time (2)

Metric: elapsed time since the first task is submitted until the last answer is retrieved.

[Line plots: judgments completed (%) over time (min) per query (Q1–Q10). At the 12th minute: Movies 98%, History 75%.]

At the 12th minute after the first task is submitted, the crowd produces at least 75% of the answers.
Results: Quality of Crowd Answers

Metric: a true positive is a mapping that belongs to the query answer.

Recall
       Sports  Music  Life Sciences  Movies  History
Q1     1.00    1.00   0.67           0.88    1.00
Q2     1.00    1.00   1.00           0.96    1.00
Q3     1.00    1.00   0.89           0.79    0.67
Q4     0.55    0.67   1.00           1.00    0.96
Q5     0.86    0.67   1.00           1.00    0.95
Q6     0.69    0.83   1.00           1.00    0.96
Q7     1.00    0.63   0.71           1.00    0.57
Q8     1.00    0.67   0.88           0.94    0.72
Q9     0.46    0.73   1.00           1.00    0.64
Q10    0.92    0.49   1.00           1.00    0.95
Avg    0.85    0.77   0.91           0.96    0.84

Precision
       Sports  Music  Life Sciences  Movies  History
Q1     1.00    1.00   1.00           0.47    1.00
Q2     1.00    0.29   1.00           1.00    1.00
Q3     1.00    1.00   1.00           1.00    1.00
Q4     0.83    1.00   1.00           1.00    1.00
Q5     1.00    0.86   1.00           1.00    1.00
Q6     1.00    1.00   1.00           1.00    0.96
Q7     1.00    1.00   1.00           1.00    0.84
Q8     1.00    1.00   1.00           1.00    0.78
Q9     1.00    1.00   1.00           1.00    0.92
Q10    1.00    1.00   1.00           1.00    0.98
Avg    0.98    0.91   1.00           0.95    0.95

The crowd exhibits heterogeneous performance within domains.
This supports the importance of HARE's triple-based approach.
  
RELATED WORK
  
Summary of Related Work
Human/computer query processing architectures

How crowdsourcing is specified:
•  Manual specification:
   –  CrowdDB [Franklin et al.]: tables, columns
   –  Deco [Park and Widom]: rules
   –  Qurk [Marcus et al.]: microtask I/O
•  Automatically: HARE

HARE relies on the RDF graph and the crowd knowledge to decide when to resort to crowdsourcing.
  
Summary of Related Work
Crowdsourcing in other contexts of Data Management (SPARQL- or RDF-based)

•  HARE (SPARQL query processing): resorts to crowdsourcing to complete missing values in RDF data sets.
•  OASSIS [Amsterdamer et al.] (recommendation system): mines crowdsourced patterns specified in a SPARQL-like language.
•  KATARA [Chu et al.] (tabular data cleansing): compares tabular data against RDF data sets via crowdsourced mappings.
  
CONCLUSIONS & FUTURE WORK
  
Conclusions
•  HARE: hybrid query engine against RDF data sets.
•  Supports microtasks to enhance query answers on-the-fly.
•  Experimental results confirmed:
   –  Size of query answer: enhanced by up to 3.13 times.
   –  Crowd response time: up to 98% of judgments completed by the 12th minute.
   –  Accuracy: 0.84 – 0.96.

Future work
•  Study further approaches to capture crowd reliability.
•  Consider other quality dimensions on the knowledge collected from the crowd.
  
References
•  [Amsterdamer et al.] Y. Amsterdamer, S. B. Davidson, T. Milo, S. Novgorodov, and A. Somech. OASSIS: query driven crowd mining. In SIGMOD, pages 589–600, 2014.
•  [Chu et al.] X. Chu, J. Morcos, I. F. Ilyas, M. Ouzzani, P. Papotti, N. Tang, and Y. Ye. KATARA: A data cleaning system powered by knowledge bases and crowdsourcing. In SIGMOD, pages 1247–1261, 2015.
•  [Franklin et al.] M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin. CrowdDB: answering queries with crowdsourcing. In SIGMOD, pages 61–72, 2011.
•  [Marcus et al.] A. Marcus, D. R. Karger, S. Madden, R. Miller, and S. Oh. Counting with the crowd. PVLDB, 6(2):109–120, 2012.
•  [Park and Widom] H. Park and J. Widom. Query optimization over crowdsourced data. PVLDB, 6(10):781–792, 2013.
•  [Vidal et al.] M. E. Vidal, E. Ruckhaus, T. Lampo, A. Martínez, J. Sierra, and A. Polleres. Efficiently joining group patterns in SPARQL queries. In ESWC, pages 228–242, 2010.
  
HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing
Maribel Acosta, Elena Simperl, Fabian Flöck, Maria-Esther Vidal

[Closing slide: HARE architecture diagram.]
