September 2017
Source: LOD-Cloud (http://lod-cloud.net/)
Figure: In-memory iterative processing (input read from Disk; Iteration 1, Iteration 2, …, Iteration n pass intermediate datasets kept in cluster memory; Output)
SANSA ISWC 2017 Talk
| | “Big Data” Processing (Spark/Flink) | Semantic Technology Stack |
|---|---|---|
| Data Integration | Manual pre-processing | Partially automated, standardised |
| Modelling | Simple (often flat feature vectors) | Expressive |
| Support for data exchange | Limited (heterogeneous formats with limited schema information) | Yes (RDF & OWL W3C standards) |
| Business value | Direct | Indirect |
| Horizontally scalable | Yes | No |
Idea: combine advantages of both worlds
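// Load an N-Triples file into a distributed triple RDD
val graph: TripleRDD = NTripleReader.load(spark, uri)
// Triple pattern: all triples whose predicate is dbo:influenced (ANY = wildcard)
graph.find(ANY, URI("http://dbpedia.org/ontology/influenced"), ANY)
// Property usage statistics (distribution of predicates)
val rdf_stats_prop_dist = PropertyUsage(graph, spark).PostProc()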
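To make the triple-pattern call and the property statistics concrete, here is a minimal local sketch in plain Scala, assuming a simple in-memory triple model. `TriplePatternSketch`, `Node`, `URI`, `Literal`, `Triple`, `find` and `propertyUsage` are hypothetical stand-ins, not SANSA's classes. On Spark the same logic would roughly be a `filter` over an `RDD` and, for the statistics, a count per predicate (for example with `reduceByKey`).

```scala
// Hypothetical in-memory triple model for illustration; not the SANSA API.
object TriplePatternSketch {
  sealed trait Node
  case object ANY extends Node // wildcard; only meaningful inside a pattern
  final case class URI(iri: String) extends Node
  final case class Literal(lexical: String) extends Node

  final case class Triple(s: Node, p: Node, o: Node)

  // A pattern position matches if it is the wildcard or equals the node.
  private def matches(pattern: Node, n: Node): Boolean =
    pattern == ANY || pattern == n

  // Return all triples matching the (s, p, o) pattern.
  def find(graph: Seq[Triple], s: Node, p: Node, o: Node): Seq[Triple] =
    graph.filter(t => matches(s, t.s) && matches(p, t.p) && matches(o, t.o))

  // Property usage: number of triples per predicate.
  def propertyUsage(graph: Seq[Triple]): Map[Node, Int] =
    graph.groupBy(_.p).map { case (pred, ts) => pred -> ts.size }

  def main(args: Array[String]): Unit = {
    val influenced = URI("http://dbpedia.org/ontology/influenced")
    val label = URI("http://www.w3.org/2000/01/rdf-schema#label")
    val graph = Seq(
      Triple(URI("http://dbpedia.org/resource/Plato"), influenced, URI("http://dbpedia.org/resource/Aristotle")),
      Triple(URI("http://dbpedia.org/resource/Aristotle"), influenced, URI("http://dbpedia.org/resource/Averroes")),
      Triple(URI("http://dbpedia.org/resource/Plato"), label, Literal("Plato"))
    )
    find(graph, ANY, influenced, ANY).foreach(println)
    println(propertyUsage(graph))
  }
}
```

Modelling the wildcard as a `Node` lets the call site mirror the slide's `find(ANY, URI(...), ANY)` form.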
// Build an RDD of OWL axioms from a Manchester syntax file
val rdd = ManchesterSyntaxOWLAxiomsRDDBuilder.build(spark, "file.owl")
// get all subclass-of axioms
val sco = rdd.filter(_.isInstanceOf[OWLSubClassOfAxiom])
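The `isInstanceOf` filter above keeps only subclass axioms but leaves the element type as a generic axiom. A sketch of an alternative, assuming hypothetical case classes in place of the OWL API types: `collect` with a type pattern filters and narrows the type in one step. Spark's `RDD` also has a `collect` overload that takes a partial function, so the same pattern should carry over to an axiom RDD. `AxiomFilterSketch`, `subClassAxioms` and `directSuperClasses` are illustrative names only.

```scala
// Hypothetical stand-ins for OWL axiom types; not the OWL API classes.
object AxiomFilterSketch {
  sealed trait Axiom
  final case class SubClassOf(sub: String, sup: String) extends Axiom
  final case class DisjointClasses(a: String, b: String) extends Axiom
  final case class ClassAssertion(cls: String, individual: String) extends Axiom

  // Filter and narrow in one step: the result is typed Seq[SubClassOf],
  // unlike filter(_.isInstanceOf[...]), which keeps the element type Axiom.
  def subClassAxioms(axioms: Seq[Axiom]): Seq[SubClassOf] =
    axioms.collect { case a: SubClassOf => a }

  // Example follow-up: direct superclasses of each class.
  def directSuperClasses(axioms: Seq[Axiom]): Map[String, Set[String]] =
    subClassAxioms(axioms).groupBy(_.sub).map { case (c, as) => c -> as.map(_.sup).toSet }

  def main(args: Array[String]): Unit = {
    val axioms = Seq(
      SubClassOf("Cat", "Mammal"),
      DisjointClasses("Cat", "Dog"),
      SubClassOf("Mammal", "Animal"),
      ClassAssertion("Cat", "tom")
    )
    subClassAxioms(axioms).foreach(println)
    println(directSuperClasses(axioms))
  }
}
```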
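// Load the N-Triples input and partition the graph
val graphRdd = NTripleReader.load(spark, input)
val partitions = RdfPartitionUtilsSpark.partitionGraph(graphRdd)
// SPARQL-to-SQL rewriter (Sparqlify) over the partitions
val rewriter = SparqlifyUtils.createSparqlSqlRewriter(spark, partitions)
// Query execution factory: SPARQL queries are rewritten and executed on Spark
val qef = new QueryExecutionFactorySparqlifySpark(spark, rewriter)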
Figure: SANSA Engine query flow (RDF Layer: Data Ingestion, Partitioning; Query Layer: Sparqlifying, Views; Distributed Data Structures; Results)
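One common way to answer SPARQL over partitioned RDF with a SQL engine is predicate-based (vertical) partitioning: each predicate becomes a two-column subject/object table, and a basic graph pattern becomes a join between those tables. The sketch below assumes that layout and uses plain Scala collections instead of Spark; `VerticalPartitionSketch`, `partitionByPredicate` and `joinPath` are hypothetical names, and the exact partitioning scheme used by `RdfPartitionUtilsSpark` may differ.

```scala
// Hypothetical sketch of predicate-based (vertical) partitioning and a
// join-based evaluation of a two-pattern SPARQL BGP; not SANSA code.
object VerticalPartitionSketch {
  final case class Triple(s: String, p: String, o: String)

  // One (subject, object) table per predicate.
  def partitionByPredicate(triples: Seq[Triple]): Map[String, Seq[(String, String)]] =
    triples.groupBy(_.p).map { case (p, ts) => p -> ts.map(t => (t.s, t.o)) }

  // Evaluates  ?x <p1> ?y . ?y <p2> ?z  as a join of table p1 with table p2 on ?y,
  // roughly: SELECT a.s, a.o, b.o FROM p1 a JOIN p2 b ON a.o = b.s
  def joinPath(tables: Map[String, Seq[(String, String)]], p1: String, p2: String): Seq[(String, String, String)] = {
    val left = tables.getOrElse(p1, Seq.empty)
    // Index the right table by subject so each left row finds its join partners directly.
    val rightBySubject = tables.getOrElse(p2, Seq.empty).groupBy(_._1)
    for {
      (x, y) <- left
      (_, z) <- rightBySubject.getOrElse(y, Seq.empty)
    } yield (x, y, z)
  }

  def main(args: Array[String]): Unit = {
    val triples = Seq(
      Triple(":alice", ":knows", ":bob"),
      Triple(":bob", ":knows", ":carol"),
      Triple(":bob", ":livesIn", ":Bonn"),
      Triple(":carol", ":livesIn", ":Leipzig")
    )
    val tables = partitionByPredicate(triples)
    joinPath(tables, ":knows", ":livesIn").foreach(println)
  }
}
```

With a fixed predicate in each pattern, only the two relevant tables are touched, which is the main appeal of this layout for SQL rewriting.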
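// Load RDF data from disk
val graph = RDFGraphLoader.loadFromDisk(spark, uri)
// Forward-chaining reasoner using the OWL Horst rule set
val reasoner = new ForwardRuleReasonerOWLHorst(spark.sparkContext)
// Compute the inferred graph
val inferredGraph = reasoner.apply(graph)
// Write the inferred graph back to disk
RDFGraphWriter.writeToDisk(inferredGraph, output)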
Figure: RDFS rule dependency graph (simplified)
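Forward-chaining materialisation derives new triples from rules until nothing new appears. A minimal sketch of that idea for two RDFS rules, rdfs9 (type inheritance via subClassOf) and rdfs11 (subClassOf transitivity), on a local `Set` of triples. Unlike the rule scheduling suggested by the dependency graph, this naive version simply re-applies every rule until a fixpoint is reached; it is an illustration under those assumptions, not the OWL Horst rule set behind `ForwardRuleReasonerOWLHorst`. `RdfsForwardChainingSketch` and `materialize` are hypothetical names.

```scala
// Hypothetical naive forward-chaining sketch for two RDFS rules; not SANSA's reasoner.
object RdfsForwardChainingSketch {
  final case class Triple(s: String, p: String, o: String)

  val RdfType = "rdf:type"
  val SubClassOf = "rdfs:subClassOf"

  // rdfs11: (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
  private def rdfs11(g: Set[Triple]): Set[Triple] = {
    val sco = g.filter(_.p == SubClassOf)
    for {
      t1 <- sco
      t2 <- sco if t1.o == t2.s
    } yield Triple(t1.s, SubClassOf, t2.o)
  }

  // rdfs9: (x rdf:type A), (A subClassOf B)  =>  (x rdf:type B)
  private def rdfs9(g: Set[Triple]): Set[Triple] = {
    val sco = g.filter(_.p == SubClassOf)
    for {
      t <- g if t.p == RdfType
      s <- sco if s.s == t.o
    } yield Triple(t.s, RdfType, s.o)
  }

  // Re-apply all rules until no new triple is derived (naive fixpoint).
  @annotation.tailrec
  def materialize(g: Set[Triple]): Set[Triple] = {
    val next = g ++ rdfs11(g) ++ rdfs9(g)
    if (next.size == g.size) g else materialize(next)
  }

  def main(args: Array[String]): Unit = {
    val base = Set(
      Triple(":Cat", SubClassOf, ":Mammal"),
      Triple(":Mammal", SubClassOf, ":Animal"),
      Triple(":tom", RdfType, ":Cat")
    )
    // Print only the newly inferred triples.
    (materialize(base) -- base).foreach(println)
  }
}
```

A dependency graph like the one in the figure lets a reasoner skip rules whose inputs cannot have changed, which this naive loop does not attempt.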
Visit our demo at 6pm!
Web: http://sansa-stack.net
Twitter: @SANSA_Stack
GitHub: https://github.com/SANSA-Stack
Mail: sansa-stack@googlemail.com