Apache ManifoldCF
Overview

● The story
● What is ManifoldCF?
● Why ManifoldCF?
● Architecture
● The 0.3-incubating version
● The 0.4-incubating version
● What's new in the 0.5-incubating version
● The book: ManifoldCF in Action
● Demo
● Resources
The story

The original ManifoldCF code base was granted by MetaCarta Inc.
to the Apache Software Foundation in December 2009.

The MetaCarta effort represented more than five years of successful
development and testing in multiple, challenging enterprise
environments.

The project is still in the Apache Incubator because its community was
not yet diverse enough, but it is now heading towards graduation.
                                 ^__^
What is ManifoldCF?
● Open Source crawler
   ○ schedule jobs to create indexes
      ■ get content from repositories
      ■ push content to search servers

● Out of the box, it is distributed as a set of J2EE web apps
   ○ REST API
   ○ Authority Service
   ○ Crawler UI

● Can be embedded in any Java application
Why ManifoldCF?
● Reliability
● Incremental
● Multi repositories
● Security model
● Monitoring
Why ManifoldCF? - Reliability

Job scheduling and configuration are stored in the database,
which maintains the state of every execution
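
As a rough illustration of why database-backed state helps reliability, the sketch below keeps a job's status in a table so it can be read back after a restart. It uses Python's built-in sqlite3 module; the table name `job_state`, the status strings, and the helper names are all assumptions made for this example, not ManifoldCF's actual schema or API.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical job-state store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_state ("
        "job_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def set_status(conn: sqlite3.Connection, job_id: str, status: str) -> None:
    """Record the latest status of a job, overwriting any earlier one."""
    conn.execute(
        "INSERT OR REPLACE INTO job_state (job_id, status) VALUES (?, ?)",
        (job_id, status),
    )
    conn.commit()


def get_status(conn: sqlite3.Connection, job_id: str) -> Optional[str]:
    """Return the stored status, or None if the job is unknown."""
    row = conn.execute(
        "SELECT status FROM job_state WHERE job_id = ?", (job_id,)
    ).fetchone()
    return row[0] if row else None
```

With a file path instead of `:memory:`, a process that restarts can call `get_status` and pick up from the last recorded state rather than starting over.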
Why ManifoldCF? - Incremental

Jobs can optionally be configured to revisit content
incrementally
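
One common way to make a revisit incremental is to remember a version marker per document and re-index only when it changes. The sketch below shows that idea under stated assumptions: the `repository` mapping, the version strings, and the function name are hypothetical, and this is not presented as ManifoldCF's own algorithm.

```python
from typing import Callable, Dict, List


def incremental_pass(
    repository: Dict[str, str],
    seen_versions: Dict[str, str],
    index: Callable[[str], None],
) -> List[str]:
    """Index only documents whose version changed since the last pass.

    repository maps a document id to its current version string
    (hypothetical input); seen_versions is the crawler's memory of
    earlier passes and is updated in place.
    """
    changed = []
    for doc_id, version in repository.items():
        if seen_versions.get(doc_id) != version:
            index(doc_id)                    # new or modified document
            seen_versions[doc_id] = version  # remember what was indexed
            changed.append(doc_id)
    return changed
```

The first pass indexes everything; later passes touch only new or modified documents. Handling deleted documents is left out of this sketch.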
Why ManifoldCF? - Multi repositories

Jobs can retrieve contents from the following repositories:
 ● CMIS-compliant
 ● Alfresco
 ● IBM FileNet
 ● EMC Documentum
 ● Microsoft SharePoint
 ● OpenText LiveLink
 ● Autonomy Meridio
 ● Memex Patriarch
 ● Windows Share/DFS
 ● Generic JDBC
 ● Generic Filesystem
 ● Generic RSS and Web
Why ManifoldCF? - Multi repositories

Jobs can ingest contents to the following search servers:
 ● ElasticSearch
 ● OpenSearchServer
 ● Apache Solr
 ● MetaCarta GTS
Why ManifoldCF? - Security model

Retrieve per-content ACLs
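
To show what per-content ACLs make possible, here is a minimal sketch of filtering search results by access tokens. The `IndexedDocument` type, the token names, and the rule used (at least one matching allow token and no matching deny token) are assumptions for illustration; how the user's tokens are obtained is out of scope here.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class IndexedDocument:
    """A document as stored in the search index, with its ACL tokens."""
    doc_id: str
    allow_tokens: FrozenSet[str] = field(default_factory=frozenset)
    deny_tokens: FrozenSet[str] = field(default_factory=frozenset)


def visible_documents(
    docs: Iterable[IndexedDocument], user_tokens: FrozenSet[str]
) -> List[str]:
    """Return ids of documents the user may see under the assumed rule."""
    return [
        d.doc_id
        for d in docs
        if (d.allow_tokens & user_tokens) and not (d.deny_tokens & user_tokens)
    ]
```

Because the tokens travel with each indexed document, the check can run at query time without going back to the source repository.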
Why ManifoldCF? - Monitoring

The Crawler UI allows you to:
 ● configure jobs and connectors
 ● monitor job execution
 ● monitor content ingestion
    ○ status reports
        ■ document status
        ■ queue status
    ○ history reports
        ■ simple history
        ■ maximum activity
        ■ maximum bandwidth
        ■ result histogram
Architecture

● Pull Agent Daemon (the core service)
   ○ Jobs (execute the ingestion tasks)
       ■ Repository Connectors (retrieve contents)
       ■ Output Connectors (ingest contents)
       ■ Authority Connectors (retrieve ACLs)
Architecture - Job

A job is a unit of ingestion work that consists of:
     ○ verbal description
     ○ repository connection
         ■ authority connection (optional)
     ○ metadata mapping
     ○ output connection (search server)
     ○ crawling model
     ○ scheduling information (on demand or time ranges)
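
The parts of a job listed above can be pictured as a single record. The sketch below is only a mental model: the class name, field names, and example values (including the crawling model string and the schedule format) are assumptions, not ManifoldCF's real job definition format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class JobDefinition:
    """Hypothetical record mirroring the job components on the slide."""
    description: str
    repository_connection: str
    output_connection: str                      # the target search server
    crawling_model: str                         # free-form label in this sketch
    authority_connection: Optional[str] = None  # optional, as on the slide
    metadata_mapping: Dict[str, str] = field(default_factory=dict)
    schedule: Optional[str] = None              # None means "run on demand"

    def runs_on_demand(self) -> bool:
        """True when no time range is configured for the job."""
        return self.schedule is None


job = JobDefinition(
    description="Index the intranet documents",
    repository_connection="cmis-intranet",
    output_connection="solr-main",
    crawling_model="incremental",
    metadata_mapping={"cmis:name": "title"},
)
```

Leaving `schedule` unset models an on-demand job; setting it to a time range models a scheduled one.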
The 0.3-incubating version

● CMIS Repository Connector
● OpenSearchServer Output Connector
● Scripting Language
● New Maven build process
● Several bug fixes
The 0.4-incubating version

● Alfresco Connector
● JDBC Connector now supports MySQL
● CMIS Connector upgraded to OpenCMIS 0.5.0
● Several bug fixes
What's new in the 0.5-incubating version

● Apache Velocity for connectors UI templates
● ElasticSearch Output Connector
● CMIS Connector upgraded to OpenCMIS 0.6.0
● Prebuilt connector support: just add jars and go!
● New Japanese localization
● Several bug fixes
The book: ManifoldCF in Action

ManifoldCF in Action
by Karl Wright
published by Manning


Karl is the original developer and the
principal committer of Apache ManifoldCF


The book is available at the following site:
http://www.manning.com/wright
DEMO
Resources


Homepage:
http://incubator.apache.org/connectors



Download page:
http://incubator.apache.org/connectors/download.html
Thank you for your attention!
