Nov. 23 2010 - S. Fermigier & O. Grisel, Nuxeo




Semantic ECM @ Nuxeo
   A progress report - Nov. 2010
Agenda

From ECM to Semantic ECM
Scribo & IKS
Fise & Apache Stanbol
Nuxeo Integration
Roadmap for 2011
Nuxeo: from ECM...
Nuxeo: an open source ECM vendor
Our Focus is Enterprise Content Management
ECM as a Platform for Content Applications
Open Source as Efficient Development Model
Modern architecture for 21st Century business
  “Lean, mobile, social, interoperable”

A Social Marketplace in action
  Innovation driven by a community of customers, partners, and our core developers
Nuxeo ECM - From Platform to Products

Construction, Media, Government, Life Sciences
Business Solutions: Correspondence Management, Contracts Management, Invoice Processing, Records Management
Horizontal Packages: Document Management, Digital Asset Management, Case Management Framework, Structured Document Server, Content Aggregator
Platform: Nuxeo Enterprise Platform, a complete set of components covering all aspects of ECM
Content Infrastructure: Nuxeo Core, a lightweight, scalable, embeddable content repository
Major Customers
... to Semantic ECM
Picture source: http://www.flickr.com/photos/pixelydixel/
Linked Online Data in 2007




“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
2008




“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
2009




“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
2010




“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
Good for Enterprise apps too!




Diagram source: http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/
Key Enablers
Open Data and Linked Online Data
Advances in automatic content analysis (linguistics, image processing) and machine learning
Classical logic and classical AI
Computing power (Moore’s law + MapReduce)
The technologies and data are available. Let’s put them to use!
Semantic ECM
Semantic ECM
[Content: Text, Sound, Image, Video]
Semantic ECM
[Content (Text, Sound, Image, Video) => Meaning (Metadata, Tags, Entities, Relations, Reasoning)]
Goals for Semantic ECM

Repurpose existing content
Improve search and collaboration
Make information contextual
Extract and use information from your content
Make your content smarter!
Challenges

Extract meaning from content
Enrich content with knowledge
Enhance interaction with content thanks to added meaning
Architectural Challenge




Content Stack vs. Knowledge Cake
Business value from semantic ECM

Efficiency gains: 20% to 90% (ex: in search, collaboration)
Effectiveness gains: better returns from your assets (ex: news and images from AFP)
Strategic edge: growth, value capture, new services, gaining an unfair strategic advantage (ex: vertical ontologies for CEVAs / CCAs)
SCRIBO and IKS
Scribo
Project under the French FUI program, with 9 partners and a budget of 4.7 M€
Goal: to develop algorithms and collaborative tools for extracting knowledge from unstructured documents and images
Started in 2008, finishing in Dec. 2010, with results already integrated as a Nuxeo plugin
IKS
European project under FP7, with 13 partners (6 SMEs) and an 8.5 M€ budget
Goal: create a semantic software “stack” that CMS vendors will use to add semantic features to their products
Started in Jan. 2009, will last until Dec. 2012
First tangible result: FISE, already integrated in a Nuxeo plugin
Linking Semantic Entities
 Apache Stanbol - Nuxeo integration
Demo time!

Screencast online at http://blogs.nuxeo.com/dev
How does this work?


• Open Source Semantic Engine
• HTTP Services
• For content-driven applications
• OSGi: loosely coupled components
• Analysis Engines
• Knowledge RDF vocabularies
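The bullets above describe loosely coupled analysis engines exposed as HTTP services. The Python sketch below illustrates only the chaining idea: engines can be registered and unregistered at runtime, and each one adds enhancements to a shared content item in a fixed order. It is a hypothetical analogy, not Stanbol's Java/OSGi API; `ContentItem`, `EngineRegistry`, `TokenCountEngine`, `KeywordEngine` and the `order` attribute are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """Hypothetical container: the raw text plus the enhancements added so far."""
    text: str
    enhancements: list = field(default_factory=list)


class TokenCountEngine:
    """Toy engine: records the number of whitespace-separated tokens."""
    order = 10  # lower values run first

    def enhance(self, item: ContentItem) -> None:
        item.enhancements.append({"engine": "token-count", "value": len(item.text.split())})


class KeywordEngine:
    """Toy engine: records each configured keyword that occurs in the text."""

    def __init__(self, keywords, order: int = 20):
        self.keywords = sorted(keywords)
        self.order = order

    def enhance(self, item: ContentItem) -> None:
        for kw in self.keywords:
            if kw in item.text:
                item.enhancements.append({"engine": "keyword", "value": kw})


class EngineRegistry:
    """Engines can be added or removed at runtime, loosely echoing dynamic
    service registration; the registry knows nothing about engine internals."""

    def __init__(self):
        self._engines = {}

    def register(self, name: str, engine) -> None:
        self._engines[name] = engine

    def unregister(self, name: str) -> None:
        self._engines.pop(name, None)

    def run(self, text: str) -> ContentItem:
        item = ContentItem(text)
        # Run every registered engine in ascending `order`.
        for engine in sorted(self._engines.values(), key=lambda e: e.order):
            engine.enhance(item)
        return item


if __name__ == "__main__":
    registry = EngineRegistry()
    registry.register("keywords", KeywordEngine({"Paris"}))
    registry.register("tokens", TokenCountEngine())
    print(registry.run("John Smith works at Smith Consulting in Paris.").enhancements)
```

Keeping the engines ignorant of each other and of the registry is what makes it possible to add or drop one without touching the rest.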
What is a semantic engine?

• Unstructured content => Knowledge


• Language guessing
• Topic classification (Business, Sports, Media, ...)
• Named Entities extraction and linking
• Relationships and properties extraction
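To make the list concrete, here is a deliberately naive Python sketch of two of these tasks applied to the demo sentence used later in this deck: language guessing by counting stopwords, and named-entity spotting with a dictionary (gazetteer) lookup that attaches a DBpedia ontology class. Real engines rely on statistical NLP models. The stopword lists, the `GAZETTEER` entries and the function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical stopword lists; real language guessers use far richer statistics.
STOPWORDS = {
    "en": {"the", "and", "at", "in", "of", "is", "works"},
    "fr": {"le", "la", "et", "à", "de", "est", "travaille"},
}

# Hypothetical gazetteer: surface form -> DBpedia ontology class URI.
GAZETTEER = {
    "John Smith": "http://dbpedia.org/ontology/Person",
    "Smith Consulting": "http://dbpedia.org/ontology/Organisation",
    "Paris": "http://dbpedia.org/ontology/Place",
}


@dataclass
class Enhancement:
    selected_text: str
    start: int  # character offset of the first character
    end: int    # character offset just past the last character
    entity_type: str


def guess_language(text: str) -> str:
    """Pick the language whose stopwords occur most often (ties go to the first listed)."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {lang: sum(tok in words for tok in tokens) for lang, words in STOPWORDS.items()}
    return max(scores, key=scores.get)


def spot_entities(text: str) -> list:
    """Return every exact gazetteer match, ordered by position in the text."""
    found = []
    for surface, type_uri in GAZETTEER.items():
        for match in re.finditer(re.escape(surface), text):
            found.append(Enhancement(surface, match.start(), match.end(), type_uri))
    return sorted(found, key=lambda e: e.start)


if __name__ == "__main__":
    sentence = "John Smith works at Smith Consulting in Paris."
    print(guess_language(sentence))
    for e in spot_entities(sentence):
        print(e)
```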

curl -X POST \
 -H "Accept: application/json" \
 -H "Content-type: text/plain" \
 --data "John Smith works at Smith Consulting in Paris." \
 http://fise.demo.nuxeo.com/engines

{
     "urn:enhancement-1564680b-861c-df6f-fdf9-d34a75d68dfe": {
        "http://fise.iks-project.eu/ontology/selected-text": [
           {
              "datatype": "http://www.w3.org/2001/XMLSchema#string",
              "type": "literal",
              "value": "Paris"
           }
        ],
        "http://fise.iks-project.eu/ontology/selection-context": [
           {
              "datatype": "http://www.w3.org/2001/XMLSchema#string",
              "type": "literal",
              "value": "John Smith works at Smith Consulting in Paris."
           }
        ],
        "http://purl.org/dc/terms/type": [
           {
              "type": "uri",
              "value": "http://dbpedia.org/ontology/Place"
           }
        ]
     },
    …
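The response above appears to be RDF/JSON: each enhancement URI maps predicate URIs to lists of `{type, value, datatype}` objects. Assuming that shape, and copying the predicate URIs from the excerpt, a client could summarize the text annotations with a minimal Python sketch like this one. `summarize_enhancements` is a hypothetical helper, and the sample payload is trimmed from the excerpt.

```python
import json

# Predicate URIs copied from the response excerpt above.
SELECTED_TEXT = "http://fise.iks-project.eu/ontology/selected-text"
DC_TYPE = "http://purl.org/dc/terms/type"


def summarize_enhancements(rdf_json: dict) -> list:
    """Return (subject, selected text, [dc:type URIs]) for each resource
    that carries a selected-text value; other resources are skipped."""
    rows = []
    for subject, props in rdf_json.items():
        texts = [obj["value"] for obj in props.get(SELECTED_TEXT, [])]
        if not texts:
            continue
        types = [obj["value"] for obj in props.get(DC_TYPE, []) if obj.get("type") == "uri"]
        rows.append((subject, texts[0], types))
    return rows


# Trimmed sample based on the excerpt above.
SAMPLE = """{
  "urn:enhancement-1564680b-861c-df6f-fdf9-d34a75d68dfe": {
    "http://fise.iks-project.eu/ontology/selected-text": [
      {"datatype": "http://www.w3.org/2001/XMLSchema#string", "type": "literal", "value": "Paris"}
    ],
    "http://purl.org/dc/terms/type": [
      {"type": "uri", "value": "http://dbpedia.org/ontology/Place"}
    ]
  }
}"""

if __name__ == "__main__":
    for row in summarize_enhancements(json.loads(SAMPLE)):
        print(row)
```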
Apache Stanbol = FISE + fast Linked Data local index + semantic rule engine + more?
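One ingredient listed here, a fast local index of Linked Data, can be illustrated with a toy in-memory lookup from normalized `rdfs:label` values to entity URIs. Such an index lets an engine link a spotted mention without a round trip to a remote endpoint. This is a hypothetical Python sketch, not Stanbol's actual index; the class name and the `urn:example:` URIs are invented.

```python
from collections import defaultdict

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class LocalLabelIndex:
    """Toy in-memory index: normalized rdfs:label -> set of entity URIs."""

    def __init__(self, triples):
        self._by_label = defaultdict(set)
        for subject, predicate, obj in triples:
            if predicate == RDFS_LABEL:
                self._by_label[self._normalize(obj)].add(subject)

    @staticmethod
    def _normalize(label: str) -> str:
        # Case-fold and collapse whitespace so "  paris " matches "Paris".
        return " ".join(label.casefold().split())

    def lookup(self, mention: str) -> list:
        """Return every candidate URI, sorted; more than one means the mention is ambiguous."""
        return sorted(self._by_label.get(self._normalize(mention), ()))


if __name__ == "__main__":
    index = LocalLabelIndex([
        ("http://dbpedia.org/resource/Paris", RDFS_LABEL, "Paris"),
        ("urn:example:paris-texas", RDFS_LABEL, "Paris"),  # hypothetical second candidate
        ("urn:example:other", "urn:example:someProperty", "Paris"),  # not a label: ignored
    ])
    print(index.lookup("  paris "))
```

Returning all candidates, rather than picking one, leaves disambiguation to a later step such as a rule engine or a context-aware ranker.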
Apache Stanbol / Nuxeo integration
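Apache Stanbol
[Diagram: a Nuxeo DM add-on and an LDAP directory on the local IT infrastructure (LAN); Apache Stanbol with Engine 1, Engine 2 and Engine 3; external sources DBpedia, Freebase and Geonames; numbered steps 1, 2, 3]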
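On the Nuxeo side, the add-on's role in this diagram is essentially that of an HTTP client: send the document text to the engines, read back the enhancements, and store what it needs as document metadata. The Python sketch below shows that flow under stated assumptions: documents are plain dicts with a `text` field, the endpoint defaults to the demo URL from the curl example, and `make_http_poster` and `tag_document` are hypothetical names. The actual add-on is Java code running inside Nuxeo. The HTTP call is injected so the mapping logic can be tested offline.

```python
import json
import urllib.request
from typing import Callable

SELECTED_TEXT = "http://fise.iks-project.eu/ontology/selected-text"
# Demo endpoint from the curl example; an on-premise server would have its own URL.
DEFAULT_ENGINES_URL = "http://fise.demo.nuxeo.com/engines"


def make_http_poster(url: str = DEFAULT_ENGINES_URL, timeout: float = 10.0) -> Callable[[str], dict]:
    """Return a function that POSTs plain text and decodes the JSON reply."""
    def post(text: str) -> dict:
        request = urllib.request.Request(
            url,
            data=text.encode("utf-8"),
            headers={"Accept": "application/json", "Content-type": "text/plain"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)
    return post


def tag_document(doc: dict, post: Callable[[str], dict]) -> dict:
    """Return a copy of `doc` with a sorted, de-duplicated `entities` list."""
    enhancements = post(doc["text"])
    entities = {
        obj["value"]
        for props in enhancements.values()
        for obj in props.get(SELECTED_TEXT, [])
    }
    return {**doc, "entities": sorted(entities)}
```

In production, `tag_document(doc, make_http_poster())` would call the server; in tests, any stub function that returns a dict of the same shape can stand in for the network call.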
Roadmap 2010-2011
Nuxeo DM Improvements
Automated document categorization (language, subject, geo coverage based on fixed lists)
Semantic entity detection and linking

Available as add-ons on the Nuxeo Marketplace in December!
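Categorization "based on fixed lists" can be as simple as intersecting a document's tokens with a controlled vocabulary for each category. Here is a minimal Python sketch, assuming hypothetical subject and geographic lists; a real deployment would load its own vocabularies and would probably weight the matches.

```python
import re

# Hypothetical fixed lists: category label -> trigger words (lower case).
SUBJECTS = {
    "business": {"revenue", "market", "contract", "invoice"},
    "sports": {"match", "team", "league", "goal"},
}
GEO_COVERAGE = {
    "France": {"france", "paris", "lyon"},
    "Germany": {"germany", "berlin", "munich"},
}


def categorize(text: str, subjects=SUBJECTS, geo=GEO_COVERAGE, min_hits: int = 1) -> dict:
    """Assign every category with at least `min_hits` distinct trigger words in the text."""
    tokens = set(re.findall(r"\w+", text.casefold()))

    def matches(vocabulary: dict) -> list:
        return sorted(label for label, words in vocabulary.items()
                      if len(tokens & words) >= min_hits)

    return {"subjects": matches(subjects), "coverage": matches(geo)}


if __name__ == "__main__":
    print(categorize("The team won the league match in Paris."))
```

Raising `min_hits` trades recall for precision: a single passing mention no longer tags the document.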
Nuxeo DM: Upcoming Work
Stanbol + Scribo integration
Multilingual support
Extraction of relations between entities
Topic classification and linking to external taxonomies
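As a rough illustration of relation extraction, the sketch below turns surface patterns such as "X works at Y" into subject-relation-object triples. Pattern matching is only the simplest possible technique (real extractors typically use parsers or trained models), and both the patterns and the relation names `worksFor` and `basedIn` are hypothetical.

```python
import re

# A capitalized word, optionally followed by more capitalized words ("Smith Consulting").
NAME = r"[A-Z]\w+(?: [A-Z]\w+)*"

# Hypothetical surface patterns mapped to hypothetical relation names.
PATTERNS = [
    (re.compile(rf"(?P<subj>{NAME}) works at (?P<obj>{NAME})"), "worksFor"),
    (re.compile(rf"(?P<subj>{NAME}) is based in (?P<obj>{NAME})"), "basedIn"),
]


def extract_relations(text: str) -> list:
    """Return (subject, relation, object) triples for every pattern match."""
    triples = []
    for pattern, relation in PATTERNS:
        for match in pattern.finditer(text):
            triples.append((match.group("subj"), relation, match.group("obj")))
    return triples


if __name__ == "__main__":
    print(extract_relations("John Smith works at Smith Consulting in Paris."))
```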
Nuxeo DAM

Clustering pictures by similarity
Face detection
Face recognition using contextual information
Speech-to-text integration for full-text search on audio and video files
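Clustering pictures by similarity is often approached with compact image fingerprints. The sketch below uses one common choice, an average hash: each pixel of a downscaled grayscale thumbnail becomes one bit, set when the pixel is brighter than the mean. It then groups images greedily by Hamming distance. The roadmap does not say which technique Nuxeo planned; this sketch assumes the thumbnail has already been downscaled, and `max_distance` is a hypothetical tuning parameter.

```python
def average_hash(pixels) -> int:
    """Fingerprint a small grayscale grid (list of rows of 0-255 values)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        # One bit per pixel, row by row: 1 if brighter than the mean.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def cluster_by_similarity(hashes: dict, max_distance: int = 5) -> list:
    """Greedy single pass: each image joins the first cluster whose
    representative is within max_distance bits, else starts a new cluster."""
    clusters = []  # list of (representative hash, [image names])
    for name, h in hashes.items():
        for representative, members in clusters:
            if hamming(representative, h) <= max_distance:
                members.append(name)
                break
        else:
            clusters.append((h, [name]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    fingerprints = {
        "a.jpg": average_hash([[10, 200], [10, 200]]),
        "b.jpg": average_hash([[20, 210], [15, 190]]),
        "c.jpg": average_hash([[200, 10], [200, 10]]),
    }
    print(cluster_by_similarity(fingerprints, max_distance=1))
```

The greedy pass is order-dependent and compares only against each cluster's first member; it keeps the idea visible but is not a substitute for a proper clustering algorithm on large collections.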
Nuxeo CMF / Correspondence
Document OCR and structure extraction
Scanned document categorization (ex: invoice vs. contract vs. claim...) and routing
Structured field extraction with configurable document masks
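A "document mask" can be read as a template of named zones on the page. Assuming OCR has already produced words with bounding boxes, field extraction then amounts to collecting the words whose centers fall inside each zone, in reading order. The `Box` and `Word` types and the zone coordinates are hypothetical; this sketches the idea, not the Nuxeo CMF implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (y grows downward)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> tuple:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class Word:
    """One OCR token with its bounding box."""
    text: str
    box: Box


def extract_fields(words, mask: dict) -> dict:
    """`mask` maps a field name to the zone (Box) where that field is expected."""
    fields = {}
    for name, zone in mask.items():
        inside = [w for w in words if zone.contains(*w.box.center())]
        # Reading order: top-to-bottom, then left-to-right.
        inside.sort(key=lambda w: (w.box.y0, w.box.x0))
        fields[name] = " ".join(w.text for w in inside)
    return fields


if __name__ == "__main__":
    invoice_mask = {"invoice_number": Box(0, 0, 100, 20), "total": Box(0, 80, 100, 100)}
    ocr_words = [Word("INV-42", Box(10, 5, 40, 15)), Word("Total:", Box(10, 85, 30, 95)),
                 Word("99.00", Box(40, 85, 60, 95))]
    print(extract_fields(ocr_words, invoice_mask))
```

Making the mask plain data is what keeps it configurable: a new document type needs a new set of zones, not new code.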
Questions?
More info
http://www.nuxeo.com/
http://blogs.nuxeo.com/dev
http://iks-project.eu
http://fise.demo.nuxeo.com
http://scribo.ws
http://incubator.apache.org/stanbol

Nuxeo Semantic ECM: from Scribo and Stanbol to valuable applications
