Gabriel Dragomir




          Drupal and Apache Stanbol
          What if you could reliably do autotagging?




Wednesday, January 23, 2013
Semantic content is the key!

              Most organizations need to organize, analyze and relate
              huge amounts of unstructured, scattered textual data
              E.g., universities check theses for plagiarism
              SNSPA: we adapted the WebFerret plagiarism checker for Romanian
              http://homepages.stca.herts.ac.uk/~pdgroup/



Semantic content is the key!

              WebFerret identifies potential sources on the Internet
              and in an institutional repository
              Cons:
                    Desktop-based, no REST web services
                    Cannot detect plagiarism by translation (text copied
                    from a source in another language)




Semantic content is the key!

              Here comes Apache Stanbol
              A new approach:
                    semantic analysis of documents
                    extract citations that occur in close proximity
                    search the web for documents with a similar citation
                    structure
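A minimal sketch of comparing citation structure, assuming each document has already been reduced to the ordered list of sources it cites (that extraction step is not shown, and the source identifiers below are hypothetical). The longest common subsequence of the two citation sequences gives a simple similarity score. One reason this may help with translated text is that the cited sources stay the same when the prose around them changes language. This is an illustration of the idea, not Stanbol functionality.

```python
from typing import Hashable, Sequence


def longest_common_citation_subsequence(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest run of citations appearing in the same order in both documents."""
    # Dynamic-programming table: dp[i][j] = LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def citation_similarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of the shorter citation sequence reproduced, in order, by the other one."""
    if not a or not b:
        return 0.0
    return longest_common_citation_subsequence(a, b) / min(len(a), len(b))


# Hypothetical citation sequences (source identifiers in order of appearance).
thesis = ["smith2001", "ionescu2005", "lee2008", "popescu2010"]
candidate = ["smith2001", "doe1999", "ionescu2005", "popescu2010"]
print(citation_similarity(thesis, candidate))  # 0.75
```

Here three of the four citations ("smith2001", "ionescu2005", "popescu2010") appear in the same order in both documents, so the score is 3/4.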



From IKS to Apache Stanbol
              IKS (Interactive Knowledge Stack): an EU-funded project
              for small to medium-sized CMS providers
              An open source software stack written in Java
              Goal: extract and process semantic data from
              documents
              Project undergoing incubation at the Apache Software Foundation
              http://stanbol.apache.org



Service oriented architecture
              Stanbol is designed for service-oriented integration
              RESTful web service API returning RDF or JSON/JSON-LD
              Each component exposes its own endpoint independently
              OSGi (Open Services Gateway initiative) compliant, via
              Apache Felix and Apache Sling
              Remote component management
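Because each component is a plain HTTP endpoint, a client can choose the response serialization with the standard `Accept` header (content negotiation). The sketch below only builds such requests with Python's standard library. The base URL `http://localhost:8080`, the component paths and the MIME types in `FORMATS` are assumptions about a local Stanbol installation and are not taken from the slides. Check which types your Stanbol version actually advertises.

```python
from urllib.request import Request

# Assumed MIME types for the serializations mentioned on the slides;
# verify them against your Stanbol version.
FORMATS = {
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
    "rdfjson": "application/rdf+json",
    "jsonld": "application/ld+json",
}

# Hypothetical base URL of a local Stanbol instance.
STANBOL_BASE = "http://localhost:8080"


def component_request(component: str, fmt: str, base: str = STANBOL_BASE) -> Request:
    """Build a GET request for one component endpoint, asking for the given serialization."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    url = f"{base.rstrip('/')}/{component.strip('/')}"
    return Request(url, headers={"Accept": FORMATS[fmt]}, method="GET")


req = component_request("entityhub", "turtle")
print(req.full_url, req.get_header("Accept"))
```

Keeping the format choice in a single table means the same client code works against any component, which matches the "each component exposes its own endpoint" design.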



Implementation

              OSGi layer: Apache Felix and Apache Sling
              Build environment: Apache Maven
              RDF framework: Apache Clerezza
              Triple store, reasoning engine: Apache Jena
              Indexing and semantic search: Apache Solr
              Content analysis/metadata extraction: Apache Tika
              Natural language processing: Apache OpenNLP

Architecture




Components
              Semantic layer:
                    Enhancer, EntityHub, ContentHub
                    Enhancement engines: internal and third-party
              User interfaces
              Knowledge integration
              Storage integration



Content enhancement
              Examples:
                    retrieve additional metadata for a piece of content
                    identify the language of a text
                    extract entities (persons, places, organizations)
                    create annotations linking to external sources
                    use third-party services for named entity recognition
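The result of these enhancements can be modeled as a small data structure. The sketch below is a simplified model that loosely follows the idea of Stanbol's enhancement structure: text annotations mark spans of the input, and entity annotations link those spans to external sources. The class and field names are illustrative assumptions, not Stanbol's actual RDF properties, and the confidence value is made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TextAnnotation:
    """A span of the input text that an engine found interesting (simplified)."""
    selected_text: str
    start: int                          # character offset where the mention starts
    end: int                            # character offset just after the mention
    entity_type: Optional[str] = None   # e.g. "Person", "Place", "Organization"


@dataclass
class EntityAnnotation:
    """A suggestion that a text span refers to an entity in an external source (simplified)."""
    mention: TextAnnotation
    entity_uri: str                     # link to an external source, e.g. a DBpedia resource
    label: str
    confidence: float                   # 0.0 .. 1.0


@dataclass
class EnhancementResult:
    language: Optional[str] = None      # outcome of language identification, e.g. "en"
    entities: List[EntityAnnotation] = field(default_factory=list)


text = "The Stanbol enhancer can detect famous cities such as Paris."
start = text.index("Paris")
paris = TextAnnotation("Paris", start, start + len("Paris"), "Place")
result = EnhancementResult(
    language="en",
    # Hypothetical confidence value, for illustration only.
    entities=[EntityAnnotation(paris, "http://dbpedia.org/resource/Paris", "Paris", 0.9)],
)
```

Keeping character offsets on the text annotation is what makes "tag as you type" style highlighting possible, since an editor can mark `text[start:end]` directly.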



Drupal meets Stanbol


              Drupal supports RDFa, allowing semantic annotations in
              the rendered markup
              The taxonomy system allows for complex annotation
              Fieldable taxonomy terms can store complex semantic data




User scenarios

              Assisted semantic tagging: autotagging
              Content enrichment with semantically related
              information (documents, factual data, images, etc.)
              Tag as you type: dynamic annotation of text in editors
              Fast autocomplete indexes with Apache Solr
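An autocomplete index essentially answers one question: which known labels start with what the user has typed so far? As a rough, in-memory illustration of that lookup, a sorted list plus binary search is enough. This is not how Solr implements it, and the entity labels are hypothetical.

```python
import bisect
from typing import List


class PrefixIndex:
    """Tiny in-memory autocomplete index over entity labels (illustration only, not Solr)."""

    def __init__(self, labels: List[str]) -> None:
        # Keep (lower-cased key, original label) pairs sorted so that all labels
        # sharing a prefix sit next to each other.
        self._entries = sorted((label.lower(), label) for label in set(labels))
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix: str, limit: int = 5) -> List[str]:
        """Return up to `limit` labels starting with `prefix` (case-insensitive)."""
        p = prefix.lower()
        i = bisect.bisect_left(self._keys, p)  # first key >= prefix
        out: List[str] = []
        while i < len(self._keys) and self._keys[i].startswith(p) and len(out) < limit:
            out.append(self._entries[i][1])
            i += 1
        return out


index = PrefixIndex(["Paris", "Barack Obama", "Bucharest", "Barcelona", "Parma"])
print(index.suggest("bar"))  # ['Barack Obama', 'Barcelona']
```

Each lookup costs one binary search plus the number of returned matches. That is why prefix lookups stay fast even over large vocabularies.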




Autotagging with Stanbol
              Given a piece of content, extract mentions of places,
              persons, organizations or other entities
              Named entity recognition (NER)
              OpenCalais and Zemanta provide similar functionality,
              but with limited free requests and limited languages
              Stanbol does it for free (open source, self-hosted)
              Multilingual: NER models can be trained for any language
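Once NER has produced its mentions, autotagging mostly comes down to filtering and de-duplicating them. The following is a minimal sketch under two assumptions: each mention is already available as a `(label, type, confidence)` tuple, and a confidence threshold of 0.5 is acceptable. Neither is a Stanbol default.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical shape of one NER result: (entity label, entity type, confidence).
Mention = Tuple[str, str, float]


def autotag(mentions: Iterable[Mention], min_confidence: float = 0.5) -> Dict[str, List[str]]:
    """Group confident, de-duplicated entity labels into tag lists per entity type."""
    tags: Dict[str, List[str]] = {}
    for label, entity_type, confidence in mentions:
        if confidence < min_confidence:
            continue  # drop low-confidence suggestions instead of tagging them
        bucket = tags.setdefault(entity_type, [])
        if label not in bucket:
            bucket.append(label)  # keep first-seen order, skip repeated mentions
    return tags


mentions = [
    ("Paris", "Place", 0.92),
    ("Barack Obama", "Person", 0.88),
    ("Paris", "Place", 0.75),          # repeated mention of the same city
    ("Stanbol", "Organization", 0.31),
]
print(autotag(mentions))  # {'Place': ['Paris'], 'Person': ['Barack Obama']}
```

Grouping by entity type keeps open the option of sending each type to its own Drupal vocabulary (for example places and persons), though that mapping is a design choice rather than something the slides prescribe.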



How it works
              REST service: Apache Stanbol Enhancer
              Returns JSON-LD, RDF/XML, RDF/JSON, Turtle, etc.

         curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
              --data "The Stanbol enhancer can detect famous cities such as Paris and people such as Barack Obama." \
              http://dev.iks-project.eu:8081/enhancer

              JSON-LD (JavaScript Object Notation for Linked Data):
              a simple, human-readable format for transporting
              linked data
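The curl call above can be reproduced from application code. This sketch uses only Python's standard library and sends the same request: a POST with a plain-text body and an `Accept` header that selects the serialization. The endpoint is the one shown on the slide. If that demo server is unreachable, point `ENHANCER_URL` at your own Stanbol instance. The helper names are hypothetical.

```python
from urllib.request import Request, urlopen

# Endpoint taken from the slide; replace it with your own Stanbol instance if needed.
ENHANCER_URL = "http://dev.iks-project.eu:8081/enhancer"


def enhancement_request(text: str, accept: str = "text/turtle", url: str = ENHANCER_URL) -> Request:
    """Build the same POST request as the curl example: plain text in, RDF out."""
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain", "Accept": accept},
        method="POST",
    )


def enhance(text: str, accept: str = "text/turtle", url: str = ENHANCER_URL, timeout: float = 30.0) -> str:
    """Send the text to the Enhancer and return the serialized enhancement results."""
    with urlopen(enhancement_request(text, accept, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


req = enhancement_request(
    "The Stanbol enhancer can detect famous cities such as Paris "
    "and people such as Barack Obama."
)
# print(enhance("..."))  # performs the actual HTTP call
```

Separating request construction from sending makes it possible to inspect or log the request without any network access.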



How it works

              JSON-LD support is included in Drupal 8 core
              Describes the data through a “context” data structure
              Context: maps object properties to concepts in an
              ontology
              Allows values to be coerced to a specific type (e.g. an
              IRI) or language



How it works
          {
               "@context": {
                  "name": "http://xmlns.com/foaf/0.1/name",
                  "homepage": {
                     "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
                     "@type": "@id"
                  },
                  "person": "http://xmlns.com/foaf/0.1/Person"
               },
               "@id": "http://www.barackobama.com",
               "@type": "person",
               "name": "Barack Obama",
               "homepage": "http://www.whitehouse.gov/"
          }





       FOAF: “Friend of a friend” - RDF ontology
       describing people, their relations and activities
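To make the role of the context concrete, here is a deliberately simplified expansion of the Barack Obama example above. Each short term is replaced by the IRI that the context maps it to, and the `"@type": "@id"` entry coerces the homepage string into an IRI reference. The sketch handles only a single flat node with an inline context: no nesting, arrays, `@vocab`, compact IRIs or remote contexts. For real data, a complete JSON-LD processor such as the PyLD library is the safer choice.

```python
import json
from typing import Any, Dict

FOAF_PERSON_DOC = """{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {"@id": "http://xmlns.com/foaf/0.1/workplaceHomepage", "@type": "@id"},
    "person": "http://xmlns.com/foaf/0.1/Person"
  },
  "@id": "http://www.barackobama.com",
  "@type": "person",
  "name": "Barack Obama",
  "homepage": "http://www.whitehouse.gov/"
}"""


def expand_flat(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Expand the terms of a single flat node using its inline @context (simplified sketch)."""
    context = doc.get("@context", {})

    def term_iri(term: str) -> str:
        # Unknown terms are passed through unchanged here; a real processor would drop them.
        definition = context.get(term, term)
        return definition["@id"] if isinstance(definition, dict) else definition

    expanded: Dict[str, Any] = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        if key == "@id":
            expanded["@id"] = value
        elif key == "@type":
            # Type names are terms too, so they are mapped to full IRIs.
            expanded["@type"] = [term_iri(value)]
        else:
            definition = context.get(key)
            # "@type": "@id" in the context coerces the string value into an IRI reference.
            coerce_to_iri = isinstance(definition, dict) and definition.get("@type") == "@id"
            node = {"@id": value} if coerce_to_iri else {"@value": value}
            expanded[term_iri(key)] = [node]
    return expanded


print(json.dumps(expand_flat(json.loads(FOAF_PERSON_DOC)), indent=2))
```

After expansion, `"name"` becomes `http://xmlns.com/foaf/0.1/name` with a literal value, while the homepage becomes a link (`{"@id": ...}`). That difference is exactly what the context's type coercion encodes.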
{
       "@context": {
         (...)
         "foaf": "http://xmlns.com/foaf/0.1/",
         (...)
       "@subject": [
         {
            "@subject": "http://dbpedia.org/resource/Barack_Obama",
            "@type": [
               "dbp-ont:OfficeHolder",
               "dbp-ont:Person",
               "foaf:Person",
               "owl:Thing"
            ],
            (...)
            "foaf:depiction": [
               "http://upload.wikimedia.org/wikipedia/en/e/e9/Official_portrait_of_Barack_Obama.jpg",
               "http://upload.wikimedia.org/wikipedia/en/thumb/e/e9/Official_portrait_of_Barack_Obama.jpg/200px-Official_portrait_of_Barack_Obama.jpg"
            ],
            "foaf:homepage": [
               "http://www.whitehouse.gov/",
               "http://www.barackobama.com/"
            ],
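Reading such a response takes only a few dictionary lookups. The sketch below works on a trimmed copy of the excerpt above. The excerpt uses `@subject`, a keyword from early JSON-LD drafts; the final JSON-LD 1.0 specification uses `@id` instead, with `@graph` for sets of nodes. The helper therefore accepts both forms. Property names stay in the compact `foaf:` form used by the response, and the helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Trimmed-down response shaped like the excerpt above (values copied from the slide).
RESPONSE = {
    "@context": {"foaf": "http://xmlns.com/foaf/0.1/"},
    "@subject": [
        {
            "@subject": "http://dbpedia.org/resource/Barack_Obama",
            "@type": ["dbp-ont:OfficeHolder", "dbp-ont:Person", "foaf:Person", "owl:Thing"],
            "foaf:homepage": ["http://www.whitehouse.gov/", "http://www.barackobama.com/"],
        }
    ],
}


def as_list(value: Any) -> List[Any]:
    """JSON-LD allows either a single value or an array; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_entity(response: Dict[str, Any], uri: str) -> Optional[Dict[str, Any]]:
    """Return the node describing `uri`, accepting the old "@subject" or the newer "@id"/"@graph"."""
    nodes = as_list(response.get("@subject") or response.get("@graph"))
    for node in nodes:
        if isinstance(node, dict) and (node.get("@subject") or node.get("@id")) == uri:
            return node
    return None


entity = find_entity(RESPONSE, "http://dbpedia.org/resource/Barack_Obama")
homepages = as_list(entity.get("foaf:homepage")) if entity else []
is_person = entity is not None and "foaf:Person" in as_list(entity.get("@type"))
print(homepages, is_person)
```

Normalizing every value with `as_list` avoids special cases later, because the same property may hold one value for one entity and several for another.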




How it works




                            Source: blog.iks-project.eu



How it works

              On the Drupal side, we only have to parse the response
              Map JSON-LD properties to entity fields
              Use Drupal’s native RDFa support to render semantic
              markup
              Use your imagination and build semantic content
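The two Drupal-side steps (mapping properties to fields, then emitting RDFa) can be sketched outside Drupal to show what the data looks like at each stage. The field machine names in `FIELD_MAP` and the `foaf:name` value are hypothetical. In a real site, Drupal's own RDF mappings would produce the markup; the function below writes it by hand only to show which RDFa attributes (`about`, `typeof`, `property`, `rel`) carry the semantics.

```python
from html import escape
from typing import Any, Dict, List

# Hypothetical mapping from JSON-LD properties to Drupal field machine names.
FIELD_MAP = {
    "foaf:name": "title",
    "foaf:homepage": "field_homepage",
    "foaf:depiction": "field_image",
}


def to_fields(entity: Dict[str, Any]) -> Dict[str, List[str]]:
    """Copy mapped JSON-LD properties into a field-name -> values dictionary."""
    fields: Dict[str, List[str]] = {}
    for prop, field_name in FIELD_MAP.items():
        value = entity.get(prop)
        if value is None:
            continue  # property not present in this entity
        fields[field_name] = value if isinstance(value, list) else [value]
    return fields


def render_rdfa(uri: str, name: str, homepages: List[str]) -> str:
    """Render a tag as HTML with RDFa attributes (foaf prefix declared inline, RDFa 1.1 style)."""
    links = "".join(
        f' <a rel="foaf:homepage" href="{escape(h)}">{escape(h)}</a>' for h in homepages
    )
    return (
        f'<span prefix="foaf: http://xmlns.com/foaf/0.1/" about="{escape(uri)}" typeof="foaf:Person">'
        f'<span property="foaf:name">{escape(name)}</span>{links}</span>'
    )


entity = {
    "@subject": "http://dbpedia.org/resource/Barack_Obama",
    "foaf:name": "Barack Obama",  # hypothetical; not shown in the excerpt
    "foaf:homepage": ["http://www.whitehouse.gov/"],
}
fields = to_fields(entity)
print(fields)
print(render_rdfa(entity["@subject"], fields["title"][0], fields["field_homepage"]))
```

Keeping the mapping in one table means supporting a new property is a one-line change, and escaping every value guards the rendered markup against labels containing `&` or `<`.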




Quick demo

              Semantic CMS by Evo42 communications: an early-adopter
              integration of Drupal with Stanbol
              Rene Kapusta: https://github.com/evo42/Semantic-CMS
              Drupal contributor, Aloha Editor core developer



