Authors: Ilias Tachmazidis, Grigoris Antoniou, Giorgos Flouris, Spyros Kotoulas
Partially funded by PlanetData
   Huge data sets are coming from
    ◦ the Web, government authorities, scientific
      databases, sensors and more
   Defeasible logic
    ◦ is suitable for encoding commonsense knowledge
      and reasoning
    ◦ avoids triviality of inference due to low-quality data
   Defeasible logic has low complexity
    ◦ The consequences of a defeasible theory D can be
      computed in O(N) time, where N is the number of
      symbols in D
   Reasoning is performed in the presence of
    defeasible rules
   Existing implementations of defeasible logic
    perform in-memory reasoning, which does not
    scale to huge data sets
   Solution: scalability/parallelization using the
    MapReduce framework
   Our approach is restricted to
    single-argument reasoning
   A multi-argument implementation has been
    accepted at ECAI 2012 (to appear)
   Ilias Tachmazidis, Grigoris Antoniou, Giorgos
    Flouris, Spyros Kotoulas, and Lee
    McCluskey, ‘Large-scale Parallel Stratified
    Defeasible Reasoning’, in ECAI, (2012)
   Facts
    ◦ e.g. bird(eagle)
   Strict Rules
    ◦ e.g. bird(X) → animal(X)
   Defeasible Rules
    ◦ e.g. bird(X) ⇒ flies(X)
   Priority Relation (acyclic relation on the set of
    rules)
    ◦ e.g. r: bird(X) ⇒ flies(X)
           r’: brokenWing(X) ⇒ ¬ flies(X)
           r’ > r
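
The ingredients above (facts, strict rules, defeasible rules and an acyclic priority relation) can be written down as plain data. The following is a minimal sketch only: the `Rule` class, its field names and the `is_acyclic` helper are hypothetical names chosen for illustration and are not taken from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str      # rule label, e.g. "r"
    body: str      # predicate in the rule body, e.g. "bird"
    head: str      # predicate in the rule head, e.g. "flies"
    negated: bool  # True if the head is negated
    strict: bool   # True for strict rules, False for defeasible rules

# The two defeasible rules from the priority example.
RULES = [
    Rule("r", "bird", "flies", negated=False, strict=False),
    Rule("r'", "brokenWing", "flies", negated=True, strict=False),
]

# Priority relation as a set of (superior, inferior) rule names: r' > r.
PRIORITY = {("r'", "r")}

def is_acyclic(priority):
    """Return True if the priority relation contains no cycle (DFS check)."""
    graph = {}
    for superior, inferior in priority:
        graph.setdefault(superior, set()).add(inferior)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return True
        if node in visiting:
            return False  # reached a node on the current path: cycle
        visiting.add(node)
        ok = all(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        if ok:
            done.add(node)
        return ok

    return all(visit(node) for node in list(graph))
```

Checking acyclicity up front is a cheap way to reject a malformed priority relation before any reasoning starts.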
   Inspired by similar primitives in LISP and
    other functional languages
   Operates exclusively on <key, value> pairs
   Input and Output types of a MapReduce job:
    ◦   Input : <k1, v1>
    ◦   Map(k1,v1) → list(k2,v2)
    ◦   Reduce(k2, list (v2)) → list(k3,v3)
    ◦   Output : list(k3,v3)
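
To make the type signatures above concrete, here is a small single-process sketch of the map, group and reduce steps. It is not Hadoop code: `run_mapreduce`, `count_map` and `count_reduce` are hypothetical helper names, and the example job (counting facts per predicate) is chosen only to illustrate the interface.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Apply map_fn to every <k1, v1>, group by k2, then apply reduce_fn."""
    groups = defaultdict(list)
    for k1, v1 in records:
        # Map(k1, v1) -> list(k2, v2)
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    output = []
    for k2 in sorted(groups):
        # Reduce(k2, list(v2)) -> list(k3, v3)
        output.extend(reduce_fn(k2, groups[k2]))
    return output

def count_map(position, fact):
    # Emit <predicate, 1> for a fact such as "bird(eagle)".
    predicate = fact.split("(", 1)[0]
    return [(predicate, 1)]

def count_reduce(predicate, ones):
    # Sum the ones emitted for this predicate.
    return [(predicate, sum(ones))]

records = [(0, "bird(eagle)"), (11, "bird(owl)"), (0, "brokenWing(owl)")]
result = run_mapreduce(records, count_map, count_reduce)
print(result)  # [('bird', 2), ('brokenWing', 1)]
```

A real framework runs the map and reduce calls on different machines; the grouping step in the middle is what the framework provides for free.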
   Provides an infrastructure that takes care of
    ◦ distribution of data
    ◦ management of fault tolerance
    ◦ results collection
   For a specific problem
    ◦ the developer writes a few routines that follow
      the general interface
   Rule decomposition
    ◦ the computation of each rule is assigned to a
      computer in the cloud
    ◦ difficult to achieve balanced work distribution
   Data decomposition
    ◦ a subset of data is assigned to each computer in
      the cloud
    ◦ provides more fine-grained partitioning
    ◦ our solution is based on data decomposition
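
One way to picture data decomposition is hash partitioning: each fact is sent to a worker chosen from its argument, so every worker holds a subset of the data. The sketch below is an assumption for illustration; the slides do not say which partitioning function is used, and `partition` and `decompose` are hypothetical names.

```python
import zlib

def partition(key, num_workers):
    """Map a key to a worker index; CRC32 keeps the result stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_workers

def decompose(facts, num_workers):
    """Split (predicate, argument) facts into one list per worker."""
    parts = [[] for _ in range(num_workers)]
    for predicate, argument in facts:
        parts[partition(argument, num_workers)].append((predicate, argument))
    return parts

facts = [
    ("bird", "eagle"), ("bird", "owl"), ("bird", "pigeon"),
    ("brokenWing", "eagle"), ("brokenWing", "owl"),
]
parts = decompose(facts, num_workers=3)
```

Because the worker is derived from the argument, all facts about the same argument value end up on the same worker, whichever worker that turns out to be.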
   Rule set:
    ◦   r1   : bird(X) → animal(X)
    ◦   r2   : bird(X) ⇒ flies(X)
    ◦   r3   : brokenWing(X) ⇒ ¬ flies(X)
    ◦   r3   > r2
   Consider bird(eagle) and brokenWing(owl) as
    facts
   Note that flies(eagle) and ¬ flies(owl) are not
    conflicting with each other!
   Reasoning is performed, in isolation, for each
    unique argument value
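
The per-argument reasoning step can be sketched as a function that receives the predicates known for one argument value and returns the conclusions for that value. This is a simplified illustration, not the authors' implementation: it makes a single pass without rule chaining, it ignores the case where a strict rule supports the opposite literal, and the names `Rule`, `SUPERIOR`, `complement` and `conclude` are hypothetical.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    name: str
    body: str     # predicate required in the rule body
    head: str     # concluded literal; a leading "¬" marks negation
    strict: bool  # True for strict rules, False for defeasible rules

RULES = [
    Rule("r1", "bird", "animal", True),
    Rule("r2", "bird", "flies", False),
    Rule("r3", "brokenWing", "¬flies", False),
]
SUPERIOR = {("r3", "r2")}  # (superior, inferior): r3 > r2

def complement(literal):
    """Return the opposite literal: flies <-> ¬flies."""
    return literal[1:] if literal.startswith("¬") else "¬" + literal

def conclude(predicates):
    """Derive conclusions for one argument value from its known predicates."""
    known = set(predicates)
    applicable = [r for r in RULES if r.body in known]
    conclusions = set()
    for rule in applicable:
        if rule.strict:
            conclusions.add(rule.head)
            continue
        attackers = [a for a in applicable if a.head == complement(rule.head)]
        supporters = [s for s in applicable if s.head == rule.head]
        # Every applicable attacker must be overridden by some supporter.
        if all(any((s.name, a.name) in SUPERIOR for s in supporters)
               for a in attackers):
            conclusions.add(rule.head)
    return conclusions
```

Since `conclude` only looks at the predicates of a single argument value, calls for different values are independent and can run on different machines.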
INPUT: facts in multiple files     MAP phase input: <position in file, fact>

File01
    bird(eagle)                    <0, bird(eagle)>
    bird(owl)                      <11, bird(owl)>
File02
    bird(pigeon)                   <0, bird(pigeon)>
    brokenWing(eagle)              <12, brokenWing(eagle)>
    brokenWing(owl)                <29, brokenWing(owl)>
MAP phase output: <argument, predicate>
    <eagle, bird>
    <owl, bird>
    <pigeon, bird>
    <eagle, brokenWing>
    <owl, brokenWing>

Grouping/Sorting

Reduce phase input: <argument, list(predicates)>
    <eagle, <bird, brokenWing>>
    <owl, <bird, brokenWing>>
    <pigeon, <bird>>
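
The map and grouping steps for this example can be sketched as follows. The map function turns each <position, fact> record into an <argument, predicate> pair, and the grouping step collects the predicates per argument. The regular expression and the names `map_fact` and `group_by_key` are assumptions made for illustration; in a MapReduce framework the grouping is done by the infrastructure rather than by user code.

```python
import re
from collections import defaultdict

# Matches single-argument facts such as "bird(eagle)".
FACT = re.compile(r"^\s*(\w+)\((\w+)\)\s*$")

def map_fact(position, fact):
    """Emit <argument, predicate>; the file position is not needed."""
    match = FACT.match(fact)
    if match is None:
        return []  # skip lines that are not single-argument facts
    predicate, argument = match.groups()
    return [(argument, predicate)]

def group_by_key(pairs):
    """Collect all values per key, with keys in sorted order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: grouped[key] for key in sorted(grouped)}

records = [
    (0, "bird(eagle)"), (11, "bird(owl)"),
    (0, "bird(pigeon)"), (12, "brokenWing(eagle)"), (29, "brokenWing(owl)"),
]
pairs = [pair for pos, fact in records for pair in map_fact(pos, fact)]
reduce_input = group_by_key(pairs)
```

Using the argument as the key is what guarantees that all predicates for one argument value reach the same reduce call.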
Reduce phase output (final output): <conclusions after reasoning>
    animal(eagle)
    ¬ flies(eagle)

    animal(owl)
    ¬ flies(owl)

    animal(pigeon)
    flies(pigeon)
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
   This work is the first to explore the feasibility
    of nonmonotonic reasoning over huge data
    sets
   We considered nonmonotonic reasoning in
    the form of defeasible logic and adapted the
    MapReduce framework for parallelization
   Our experimental results demonstrate that
    ◦ defeasible reasoning over billions of facts
      performs well
    ◦ our approach has the potential to scale to trillions
      of facts
