IMPLEMENTING SITE SEARCH IN CQ5/AEM
DEVANG SHAH, I-CUBED


Session Outline
• Importance of site search functionality
• CQ5 internal search workings & limitations
• Integrating CQ5 with external search engines & challenges
• Indexing patterns for integrating with external search engines
• Q&A


Site search is a core function of most websites

Browse vs. search: alternative ways for visitors to find the information they need quickly and easily

“90 percent of companies report that search is the No.1 means of navigation on their site”
-- Forrester Research

“82 percent of visitors use site search to find the information they need”
-- Juniper Research

Advances in search features allow site visitors to:
• Auto-complete and auto-correct search terms
• Build advanced queries
• Filter results by facets
• Refine search results by location, preferences, previous history, etc.

Visitors who used site search were “more likely to convert from browsers to buyers.”
-- Juniper Research
• Jackrabbit uses Lucene internally to index repository content
• Whenever content is modified, it is stored in the repository and the Lucene index is updated as well
• Index location, under <crx-quickstart>/repository:
  • repository/index
  • workspaces/crx.default/index
• Index configuration:
  • <SearchIndex> block in repository.xml & workspace.xml
  • tika-config.xml in the workspaces folder
• Changes in the new version of Jackrabbit (3.x / Oak)
• Jackrabbit
  • JCR 1.0: supports XPath & JCR-SQL (SQL1)
  • JCR 2.0: supports JCR-SQL2; XPath is deprecated in JCR 2.0, but Jackrabbit still supports it
  • Both SQL & XPath queries are translated to the same search tree
• QueryBuilder is an API for building queries for a query engine
• CQ provides several OOTB components & extensions that leverage the QueryBuilder API for full-text or predicate-based searches
• The OOTB Search component supports full-text queries and enhanced search features: similar pages, facets, pagination, etc.
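As an illustration of the query options above, here is a minimal sketch of one full-text page search written three ways: XPath, JCR-SQL2, and a QueryBuilder predicate map. The class name SiteSearchQueries, the path /content/site and the node type cq:Page are assumptions chosen for the example. The code only builds strings and a map and does not execute anything against a repository.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds one full-text page search in three forms (illustrative helper, not part of CQ). */
final class SiteSearchQueries {

    /** Wraps a value in single quotes, doubling embedded quotes (XPath and SQL2 string literals). */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** XPath form: deprecated in JCR 2.0 but still supported by Jackrabbit. */
    static String xpath(String rootPath, String term) {
        // Assumes the path segments need no ISO 9075 encoding (for example, no leading digits).
        return "/jcr:root" + rootPath
                + "//element(*, cq:Page)[jcr:contains(., " + quote(term) + ")]";
    }

    /** JCR-SQL2 form of the same search. */
    static String sql2(String rootPath, String term) {
        return "SELECT * FROM [cq:Page] AS page"
                + " WHERE ISDESCENDANTNODE(page, " + quote(rootPath) + ")"
                + " AND CONTAINS(page.*, " + quote(term) + ")";
    }

    /**
     * QueryBuilder predicate form. In CQ such a map is typically turned into a query with
     * PredicateGroup.create(map) and QueryBuilder.createQuery(group, session); those calls
     * need the CQ libraries and are left out so this sketch runs with the JDK alone.
     */
    static Map<String, String> predicates(String rootPath, String term, int limit) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("path", rootPath);
        map.put("type", "cq:Page");
        map.put("fulltext", term);
        map.put("p.limit", Integer.toString(limit));
        return map;
    }

    public static void main(String[] args) {
        System.out.println(xpath("/content/site", "annual report"));
        System.out.println(sql2("/content/site", "annual report"));
        System.out.println(predicates("/content/site", "annual report", 10));
    }
}
```

Whether text stored on child nodes of a page is matched by the XPath and SQL2 forms depends on how index aggregation is configured, so treat the node type here as a placeholder.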


Use Case: Non-CQ Content Sources
• Larger sites have more than one source of content and assets
• Non-CQ content is difficult to index

Use Case: Author vs. Visitor Search Patterns
• CQ generates one index per server
• Author and visitor search patterns and requirements are typically different

Performance & Architecture Considerations
• The arbitrary number of queries and search variations makes it difficult to use the CQ caching architecture
• The Jackrabbit layer on top of Lucene may slow down search and query performance
• Scaling of the search architecture depends on the CQ architecture

Customizations
• Using different content parsers, index tuning, etc. (mitigated in 5.6.1)
• Can I use a newer version of Lucene?
• How can I extend the Jackrabbit search implementation?


External Search Platforms
• Search providers with crawlers (examples):
  ▪ Google Search Appliance
  ▪ Microsoft FAST
• Non-crawler search providers (examples):
  ▪ Endeca
  ▪ Lucene/Solr

• Enables independent scaling of the search platform
• Supports more than one content source
• Configuration & customization of the search application are decoupled from the CQ5 application
• May provide more advanced search features (faceted search, geospatial search, personalization, etc.)


Challenges building & managing search indexes
• Building the site index: crawl, or query & inject?
• How often should the index be rebuilt?
• How to ensure that content & metadata always stay in sync between the content sources and the search index?
• With multiple data sources, how to manage duplicates, index structure and a common metadata model?

Challenges querying & building search results
• Should the search results page be hosted on the provider’s platform or within CQ?
• Does the search provider offer an extended API to query and build search results within the application?


Integration Notes:
• GSA, FAST site crawler, Endeca’s plugin for CRX indexing, Solr via open source crawlers (Nutch, etc.)
• May require a custom service that returns data (for example, for Solr or Endeca)

Pros:
• Ease of implementation
• Indexes the rendered version of the pages

Cons:
• Lag between content publishing and the index update may result in out-of-sync search results. Also, what happens to deleted content?
• Longer index crawl and build times
• Search index doesn’t have the complete set of metadata
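One possible answer to the deleted-content question is to reconcile the index against each completed crawl. The sketch below is a set difference with hypothetical names (CrawlReconciler, staleEntries). It assumes the crawl covered the whole site, so a URL that is indexed but was not crawled can be treated as removed.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Finds index entries that a complete crawl no longer saw (illustrative helper). */
final class CrawlReconciler {

    /** Returns the indexed URLs that are absent from the latest crawl, in sorted order. */
    static Set<String> staleEntries(Set<String> indexedUrls, Set<String> crawledUrls) {
        Set<String> stale = new TreeSet<>(indexedUrls);
        stale.removeAll(crawledUrls);
        return stale;
    }

    public static void main(String[] args) {
        Set<String> indexed = new TreeSet<>(Arrays.asList("/a.html", "/b.html", "/c.html"));
        Set<String> crawled = new TreeSet<>(Arrays.asList("/a.html", "/c.html"));
        // Entries returned here are candidates for deletion from the search index.
        System.out.println(staleEntries(indexed, crawled));
    }
}
```

A page that was only temporarily unreachable during the crawl would also show up as stale, so a real setup may want to confirm before deleting.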


Example – CQ / FAST connector (available via service pack)

Pros:
⁻ Search index is always in sync with the content repository
⁻ Ability to send metadata with content
⁻ Customizable data formats; allows partial indexing of a page

Cons:
⁻ Requires custom development effort
⁻ Indexes content instead of the rendered version of the pages
⁻ System performance / event handling
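A minimal sketch of the event-driven push idea, not the CQ / FAST connector itself: each content event is translated immediately into an add/update or a delete on the external index. All type names (ContentEventType, SearchIndexClient, InMemoryIndex, IndexPushHandler) are hypothetical stand-ins. A real integration would listen to the platform's replication events and call the search vendor's client.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical event types; a real system would map its own replication events onto these. */
enum ContentEventType { ACTIVATE, DEACTIVATE, DELETE }

/** Hypothetical client for an external search index (not a real vendor API). */
interface SearchIndexClient {
    void upsert(String path, Map<String, String> fields);
    void delete(String path);
}

/** In-memory stand-in for the external index so the sketch can run on its own. */
final class InMemoryIndex implements SearchIndexClient {
    final Map<String, Map<String, String>> documents = new HashMap<>();

    @Override
    public void upsert(String path, Map<String, String> fields) {
        documents.put(path, new HashMap<>(fields));
    }

    @Override
    public void delete(String path) {
        documents.remove(path);
    }
}

/** Pushes every content event to the index as soon as it arrives. */
final class IndexPushHandler {
    private final SearchIndexClient index;

    IndexPushHandler(SearchIndexClient index) {
        this.index = index;
    }

    void onEvent(ContentEventType type, String path, Map<String, String> fields) {
        switch (type) {
            case ACTIVATE:
                // Published or re-published content: send content plus metadata.
                index.upsert(path, fields);
                break;
            case DEACTIVATE:
            case DELETE:
                // Unpublished or removed content must disappear from search results.
                index.delete(path);
                break;
        }
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        IndexPushHandler handler = new IndexPushHandler(index);
        handler.onEvent(ContentEventType.ACTIVATE, "/content/site/en",
                Collections.singletonMap("title", "Home"));
        handler.onEvent(ContentEventType.DELETE, "/content/site/en",
                Collections.<String, String>emptyMap());
        System.out.println(index.documents);
    }
}
```

Because the index call happens inside the event handler, a slow or unavailable search platform directly affects event processing, which is the performance concern listed under Cons.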


Pros:
⁻ Search index is (mostly) in sync with the content repository
⁻ Ability to send metadata with content
⁻ Customizable data formats; allows partial indexing of a page
⁻ Minimal replication event processing

Cons:
⁻ Requires custom development effort
⁻ Search index may get out of sync with the content repository (but only for a shorter duration)
⁻ Indexes content instead of the rendered version of the pages
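These trade-offs match a design in which the event handler only records which path changed and a separate job later pulls the content and updates the index. The slides do not spell out the mechanism, so the sketch below is an assumption, with hypothetical names (PendingIndexQueue, IndexBatchJob) and a plain Map standing in for the search index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

/** Records changed paths at event time; duplicates collapse into one entry. */
final class PendingIndexQueue {
    private final Set<String> pending = new LinkedHashSet<>();

    /** Called from the event handler: cheap, no content access, no index call. */
    synchronized void markChanged(String path) {
        pending.add(path);
    }

    /** Called by the periodic job: returns everything queued so far and clears the queue. */
    synchronized List<String> drain() {
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }
}

/** Periodic job that pulls current content for queued paths and updates the index. */
final class IndexBatchJob {

    /** Content that can no longer be fetched is treated as deleted or moved away. */
    static void run(PendingIndexQueue queue,
                    Function<String, Optional<String>> fetchContent,
                    Map<String, String> index) {
        for (String path : queue.drain()) {
            Optional<String> content = fetchContent.apply(path);
            if (content.isPresent()) {
                index.put(path, content.get());
            } else {
                index.remove(path);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> published = new HashMap<>();
        published.put("/a", "A v2");
        Map<String, String> index = new HashMap<>();
        index.put("/b", "old B");

        PendingIndexQueue queue = new PendingIndexQueue();
        queue.markChanged("/a");
        queue.markChanged("/b");
        run(queue, path -> Optional.ofNullable(published.get(path)), index);
        System.out.println(index);
    }
}
```

The index lags by at most one job interval, which is the "mostly in sync" behaviour listed above.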


Handling initial content load & index creation
• With a content push approach, how will the initial index be generated? An initial baseline may need to be created via a site crawl or a custom service
• With a content pull approach, how will the index reflect deleted or moved site pages?


Permission-sensitive site pages & assets
• Option 1: Export ACLs to the search provider (example: CQ/FAST connector)
• Option 2: Check user permissions via CQ at run time (similar to how CQ handles delivery of content in the case of closed user groups)
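Option 2 can be sketched as a post-filter over the hits returned by the search provider. The names PermissionFilter and visibleHits are hypothetical, and the permission check is passed in as a predicate. In a JCR-backed system that predicate could delegate to javax.jcr.Session.hasPermission(path, "read"), which is left out here so the sketch runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Filters search hits by a run-time permission check (illustrative helper). */
final class PermissionFilter {

    /** Keeps only the hits the current user may read, preserving ranking order. */
    static List<String> visibleHits(List<String> rankedPaths, Predicate<String> canRead) {
        List<String> visible = new ArrayList<>();
        for (String path : rankedPaths) {
            if (canRead.test(path)) {
                visible.add(path);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList(
                "/content/public/a", "/content/members/b", "/content/public/c");
        // Stand-in rule for an anonymous visitor: the members area is not readable.
        Predicate<String> anonymousCanRead = path -> !path.startsWith("/content/members/");
        System.out.println(visibleHits(hits, anonymousCanRead));
    }
}
```

Filtering after the search has run means that total hit counts, facets and page sizes reported by the provider may no longer match what the user sees, which is one reason to consider Option 1.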



Referenced assets, content pages and promos
• Option: Query referenced pages and index them. This may cause performance (and recursive indexing) issues, though.
• Option: Selective content indexing (index parts of a page instead of the entire page)
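Selective content indexing can be as simple as an allow-list of page properties. In this sketch the class SelectiveIndexer and the property promoText are hypothetical, and the page is modeled as a plain map of property names to values.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Builds the document sent to the search index from selected page properties only. */
final class SelectiveIndexer {

    /** Copies allow-listed, non-blank properties; everything else stays out of the index. */
    static Map<String, String> toIndexDocument(Map<String, String> pageProperties,
                                               List<String> indexedFields) {
        Map<String, String> document = new LinkedHashMap<>();
        for (String field : indexedFields) {
            String value = pageProperties.get(field);
            if (value != null && !value.trim().isEmpty()) {
                document.put(field, value);
            }
        }
        return document;
    }

    public static void main(String[] args) {
        Map<String, String> page = new LinkedHashMap<>();
        page.put("jcr:title", "Product overview");
        page.put("jcr:description", "All products at a glance");
        page.put("promoText", "Limited offer"); // hypothetical promo property, left unindexed
        System.out.println(toIndexDocument(page, Arrays.asList("jcr:title", "jcr:description")));
    }
}
```

Leaving referenced promo content out of the page document avoids following references recursively at indexing time.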

