3rd Status report of degree project

Integrating SWI-Prolog
for semantic reasoning in Bioclipse
Samuel Lampa, 2010-04-07
Project blog: http://saml.rilspace.com
Research question

How do biochemical questions formulated as Prolog queries
compare to other solutions available in Bioclipse
in terms of speed and expressiveness?
Compared Semantic Tools

● Jena
    ● General RDF querying (via SPARQL)
● Pellet
    ● OWL-DL reasoning (via SPARQL)
    ● General querying via Jena (via SPARQL)
● SWI-Prolog
    ● Access to RDF triples (both assertion and querying) via the
      rdf( Subject, Predicate, Object ) predicate
    ● Complex wrapper/convenience predicates can be built
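
As a rough illustration of the SWI-Prolog bullets above, the following minimal sketch asserts two triples and then builds a small convenience predicate on top of rdf/3. The `ex` prefix, its URI and the resource names are hypothetical and chosen only for this example; `rdf_register_prefix/2` is the current name of the registration predicate in the semweb library.

```prolog
:- use_module(library(semweb/rdf_db)).

% Hypothetical prefix and resources, for illustration only.
:- rdf_register_prefix(ex, 'http://example.org/demo#').

% Assertion: add two triples to the in-memory RDF store.
load_demo_triples :-
    rdf_assert(ex:mol1, ex:hasSpectrum, ex:spec1),
    rdf_assert(ex:spec1, ex:hasPeak, ex:peak1).

% Querying: a convenience predicate that follows two rdf/3 hops,
% from a molecule via its spectrum to a peak.
mol_peak(Mol, Peak) :-
    rdf(Mol, ex:hasSpectrum, Spectrum),
    rdf(Spectrum, ex:hasPeak, Peak).
```

After calling `load_demo_triples`, the query `?- mol_peak(Mol, Peak).` enumerates molecule/peak pairs as full URIs.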
Use Case: NMRShiftDB

Interesting use case: querying NMRShiftDB data
● Characteristics:
    – Rather shallow RDF graph
    – Numeric (float value) interval matching
NMR Spectrum Similarity Search

What to test:
Given a spectrum, represented as a list of shift values,
find spectra with the same shifts (allowing variation within a limit).

→ “Dereferencing” spectra

[Figure: NMR spectrum, intensity plotted against shift]
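
The test described above can be sketched on plain Prolog lists, without any RDF store. The `spectrum/2` facts below are made-up sample data, and the predicate names are hypothetical; the sketch only assumes that "the same shifts" means every searched shift has a counterpart within the given limit.

```prolog
:- use_module(library(lists)).   % member/2

% Hypothetical sample data: spectrum(Id, ShiftValues).
spectrum(s1, [17.6, 18.3, 22.6]).
spectrum(s2, [17.5, 18.5, 22.4, 40.0]).
spectrum(s3, [30.1, 55.0]).

% Two shift values match if they differ by at most Limit.
shifts_close(Limit, A, B) :-
    abs(A - B) =< Limit.

% A spectrum matches if every searched shift has a near-match in it.
spectrum_matches(Limit, SearchShifts, Id) :-
    spectrum(Id, Shifts),
    forall(member(S, SearchShifts),
           ( member(C, Shifts), shifts_close(Limit, S, C) )).

% Collect the ids of all matching spectra.
similar_spectra(Limit, SearchShifts, Ids) :-
    findall(Id, spectrum_matches(Limit, SearchShifts, Id), Ids).
```

For example, `?- similar_spectra(0.3, [17.6, 18.3, 22.6], Ids).` collects the spectra whose shifts lie within 0.3 of all three searched values.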
Example Data

<http://pele.farmbio.uu.se/nmrshiftdb/?moleculeId=234>
    :hasSpectrum <http://pele.farmbio.uu.se/nmrshiftdb/?spectrumId=4735> ;
    :moleculeId "234" .
<http://pele.farmbio.uu.se/nmrshiftdb/?spectrumId=4735>
    :hasPeak <http://pele.farmbio.uu.se/nmrshiftdb/?s4735p0> ,
             <http://pele.farmbio.uu.se/nmrshiftdb/?s4735p1> ,
             <http://pele.farmbio.uu.se/nmrshiftdb/?s4735p2> .
<http://pele.farmbio.uu.se/nmrshiftdb/?s4735p0>
    :hasShift "17.6"^^xsd:decimal .
<http://pele.farmbio.uu.se/nmrshiftdb/?s4735p1>
    :hasShift "18.3"^^xsd:decimal .
<http://pele.farmbio.uu.se/nmrshiftdb/?s4735p2>
    :hasShift "22.6"^^xsd:decimal .
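
To show how such data looks from the Prolog side, here is a small sketch that asserts one peak with a typed literal and reads the shift back as a number. It assumes that the default `:` prefix of the example data is the `nmr` namespace used in the queries, and that the resource URIs are as shown; `assert_example_peak/0` and `peak_shift/2` are hypothetical names. The `xsd` prefix is predefined by the semweb library.

```prolog
:- use_module(library(semweb/rdf_db)).

% Assumption: the data's default prefix is the nmr ontology namespace.
:- rdf_register_prefix(nmr, 'http://www.nmrshiftdb.org/onto#').

% Assert one peak of spectrum 4735 with its shift as an xsd:decimal literal.
assert_example_peak :-
    rdf_assert('http://pele.farmbio.uu.se/nmrshiftdb/?spectrumId=4735',
               nmr:hasPeak,
               'http://pele.farmbio.uu.se/nmrshiftdb/?s4735p0'),
    rdf_assert('http://pele.farmbio.uu.se/nmrshiftdb/?s4735p0',
               nmr:hasShift,
               literal(type(xsd:decimal, '17.6'))).

% Read a peak's shift back, converting the lexical form to a Prolog number.
peak_shift(Peak, Shift) :-
    rdf(Peak, nmr:hasShift, literal(type(xsd:decimal, Lexical))),
    atom_number(Lexical, Shift).
```

The typed literal is stored with its lexical form as an atom, which is why a conversion step such as atom_number/2 is needed before numeric comparison.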
Prolog code

% Load the RDF store library that provides rdf/3 and rdf_register_ns/2
:- use_module(library(semweb/rdf_db)).

% Register RDF namespaces, for use in the convenience predicates at the end
:- rdf_register_ns(nmr, 'http://www.nmrshiftdb.org/onto#').
:- rdf_register_ns(xsd, 'http://www.w3.org/2001/XMLSchema#').

% Collect in 'Mols' every 'Mol' whose shift values contain near-matches
% for all of the given SearchShiftVals. MolShiftVals is existentially
% quantified (^) so that setof/3 returns one combined list.
find_mol_with_peak_vals_near( SearchShiftVals, Mols ) :-
  setof( Mol,
         MolShiftVals^( list_peak_shifts_of_mol( Mol, MolShiftVals ),                 % A Mol's shift values are collected
                        contains_list_elems_near( SearchShiftVals, MolShiftVals ) ),  % and compared against SearchShiftVals
         Mols ).

% Given a 'Mol', give its shift values in list form, in 'ListOfPeaks'
list_peak_shifts_of_mol( Mol, ListOfPeaks ) :-
  has_spectrum( Mol, Spectrum ),
  findall( ShiftVal,
           ( has_peak( Spectrum, Peak ),
             has_shift_val( Peak, ShiftVal ) ),
           ListOfPeaks ).

% Succeed if List has a near-match for each of the values in the first list.
% An empty first list matches trivially.
contains_list_elems_near( [], _ ).
contains_list_elems_near( [ElemHead|ElemTail], List ) :-
  member_close_to( ElemHead, List ),
  contains_list_elems_near( ElemTail, List ).

% Recursive construct:
% test first the end criterion (the head of the list),
member_close_to( X, [ Y | _ ] ) :-
  closeTo( X, Y ).
% but if the above doesn't succeed, recursively continue with the tail:
member_close_to( X, [ _ | Tail ] ) :-
  member_close_to( X, Tail ).

% Numerical near-match
closeTo( Val1, Val2 ) :-
  abs(Val1 - Val2) =< 0.3.

% Convenience accessor predicates
has_shift_val( Peak, ShiftVal ) :-
  rdf( Peak, nmr:hasShift, literal(type(xsd:decimal, ShiftValLiteral)) ),
  atom_number_create( ShiftValLiteral, ShiftVal ).
has_spectrum( Mol, Spectrum ) :-
  rdf( Mol, nmr:hasSpectrum, Spectrum ).
has_peak( Spectrum, Peak ) :-
  rdf( Spectrum, nmr:hasPeak, Peak ).

% Wrapper for atom_number/2, which converts atoms (string constants) to numbers.
% The wrapper handles empty atoms by converting them into a zero.
atom_number_create( Atom, Number ) :-
  (   atom_length( Atom, AtomLength ), AtomLength > 0   % IF atom is not empty
  ->  atom_number( Atom, Number )                       % THEN convert the atom to a numerical value
  ;   Number = 0                                        % ELSE use zero
  ).
SPARQL code

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX afn: <http://jena.hpl.hp.com/ARQ/function#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX nmr: <http://www.nmrshiftdb.org/onto#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s
WHERE {
  ?s nmr:hasPeak [ nmr:hasShift ?s1 ] ,
                 [ nmr:hasShift ?s2 ] ,
                 [ nmr:hasShift ?s3 ] ,
                 [ nmr:hasShift ?s4 ] ,
                 [ nmr:hasShift ?s5 ] ,
                 [ nmr:hasShift ?s6 ] ,
                 [ nmr:hasShift ?s7 ] ,
                 [ nmr:hasShift ?s8 ] ,
                 [ nmr:hasShift ?s9 ] ,
                 [ nmr:hasShift ?s10 ] ,
                 [ nmr:hasShift ?s11 ] ,
                 [ nmr:hasShift ?s12 ] ,
                 [ nmr:hasShift ?s13 ] ,
                 [ nmr:hasShift ?s14 ] ,
                 [ nmr:hasShift ?s15 ] ,
                 [ nmr:hasShift ?s16 ] .
  FILTER ( fn:abs(?s1 - 17.6) < 0.3 ) .
  FILTER ( fn:abs(?s2 - 18.3) < 0.3 ) .
  FILTER ( fn:abs(?s3 - 22.6) < 0.3 ) .
  FILTER ( fn:abs(?s4 - 26.5) < 0.3 ) .
  FILTER ( fn:abs(?s5 - 31.7) < 0.3 ) .
  FILTER ( fn:abs(?s6 - 33.5) < 0.3 ) .
  FILTER ( fn:abs(?s7 - 33.5) < 0.3 ) .
  FILTER ( fn:abs(?s8 - 41.8) < 0.3 ) .
  FILTER ( fn:abs(?s9 - 42.0) < 0.3 ) .
  FILTER ( fn:abs(?s10 - 42.2) < 0.3 ) .
  FILTER ( fn:abs(?s11 - 78.34) < 0.3 ) .
  FILTER ( fn:abs(?s12 - 140.99) < 0.3 ) .
  FILTER ( fn:abs(?s13 - 158.3) < 0.3 ) .
  FILTER ( fn:abs(?s14 - 193.4) < 0.3 ) .
  FILTER ( fn:abs(?s15 - 203.0) < 0.3 ) .
  FILTER ( fn:abs(?s16 - 0) < 0.3 ) .
}
“Expressiveness”

“Expressivity”: SPARQL vs Prolog

[Figure: side-by-side comparison, SPARQL | Prolog]
Prolog predicate taking variables

How to change “input parameters”?
● SPARQL: Modify the SPARQL query
● Prolog: Change the input parameter
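
One way to see the difference: with SPARQL, new shift values mean producing new query text, which can itself be automated. The sketch below generates a query of the shape used in the SPARQL slide from a list of shifts. The predicate name `shift_query/3` is hypothetical, the list is assumed to be non-empty, and only the two prefixes the generated query actually uses are emitted.

```prolog
:- use_module(library(lists)).   % nth1/3
:- use_module(library(pairs)).   % pairs_keys_values/3

% shift_query(+Shifts, +Limit, -Query): build SPARQL text (an atom) with
% one hasShift pattern and one FILTER per searched shift value.
shift_query(Shifts, Limit, Query) :-
    findall(Pattern-Filter,
            ( nth1(I, Shifts, Shift),
              format(atom(Pattern), '[ nmr:hasShift ?s~d ]', [I]),
              format(atom(Filter), 'FILTER ( fn:abs(?s~d - ~w) < ~w )',
                     [I, Shift, Limit]) ),
            Pairs),
    pairs_keys_values(Pairs, Patterns, Filters),
    atomic_list_concat(Patterns, ' ,\n    ', PatternText),
    atomic_list_concat(Filters, '\n  ', FilterText),
    atomic_list_concat(
        [ 'PREFIX fn: <http://www.w3.org/2005/xpath-functions#>',
          'PREFIX nmr: <http://www.nmrshiftdb.org/onto#>',
          'SELECT ?s',
          'WHERE {',
          '  ?s nmr:hasPeak',
          '    ~w .',
          '  ~w',
          '}' ],
        '\n', Template),
    format(atom(Query), Template, [PatternText, FilterText]).
```

Running `?- shift_query([17.6, 18.3], 0.3, Q), write(Q).` prints a two-peak version of the query. In the Prolog solution the same change is just a different first argument to find_mol_with_peak_vals_near/2.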
Observations

● SPARQL
    ● Fewer lines of code
    ● Easier to understand the code
● Prolog
    ● Easier to change input parameters
    ● Easier to re-use existing logic
      (call a predicate rather than cut and paste SPARQL code)
    ● Easier to change aspects of the execution logic
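
The last Prolog point can be illustrated with a sketch in which the comparison itself is passed in as a parameter, and a number of unmatched shifts is tolerated. None of these predicates appear in the slides; `within_abs/3`, `within_rel/3`, `has_near_match/3` and `matches_allowing/4` are hypothetical names, and the relative tolerance is just one possible alternative.

```prolog
:- use_module(library(apply)).   % partition/4
:- use_module(library(lists)).   % member/2

% Absolute tolerance, like closeTo/2 but with the limit as a parameter.
within_abs(Limit, A, B) :-
    abs(A - B) =< Limit.

% Hypothetical alternative: tolerance relative to the searched value.
within_rel(Fraction, A, B) :-
    abs(A - B) =< Fraction * abs(A).

% has_near_match(+Close, +Shifts, +S): some member of Shifts is close to S,
% where Close is a closure called with two extra arguments.
has_near_match(Close, Shifts, S) :-
    member(C, Shifts),
    call(Close, S, C),
    !.

% matches_allowing(+Close, +MaxMissing, +Search, +Shifts):
% at most MaxMissing searched shifts lack a near-match in Shifts.
matches_allowing(Close, MaxMissing, Search, Shifts) :-
    partition(has_near_match(Close, Shifts), Search, _Found, Missing),
    length(Missing, N),
    N =< MaxMissing.
```

Switching from `within_abs(0.3)` to `within_rel(0.01)`, or allowing one missing peak, then only changes the arguments of the call, not the search logic.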
Performance

[Figure: Prolog vs Jena vs JenaTDB vs Pellet]
[Figure: Prolog vs Jena vs JenaTDB]
Observations

● Prolog is the fastest (in-memory only)
● Jena is faster with a disk-based than with an in-memory RDF store!
● Pellet with an in-memory store is slow
● Pellet with a disk-based store is out of the question
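
The slides do not say how the timings were collected. As a minimal sketch of one way to time a Prolog goal, the following measures CPU seconds with statistics/2; `time_goal/2` is a hypothetical helper, and `demo_workload/0` is only a placeholder standing in for a real query such as find_mol_with_peak_vals_near/2.

```prolog
:- use_module(library(lists)).   % numlist/3, sum_list/2

% time_goal(:Goal, -Seconds): CPU seconds spent running Goal once.
% The result of Goal (success or failure) is ignored.
:- meta_predicate time_goal(0, -).
time_goal(Goal, Seconds) :-
    statistics(cputime, T0),
    ( call(Goal) -> true ; true ),
    statistics(cputime, T1),
    Seconds is T1 - T0.

% Placeholder workload, standing in for a real query.
demo_workload :-
    numlist(1, 200000, Ns),
    sum_list(Ns, _).
```

A call such as `?- time_goal(demo_workload, Seconds).` binds Seconds to the measured CPU time; repeating the measurement several times gives a more stable figure.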
Project plan from last

Planned final presentation: 28 April 2010 (BMC B7:101a)
Everybody is welcome!
Thank you!
Project blog: http://saml.rilspace.com
