Solr
What is it?
•   Text search index (engine)
•   Open source
•   Not a search product
•   A tool that allows you to create a search
    solution
What is it like?
•   Google, Google Appliance.
•   FAST
•   Oracle Secure Enterprise Search
•   etc.
Google Appliance:
•   Sucks data in
•   Can’t really configure
•   Stuck with results
•   Bonnet is locked
Solr:
•   You need to feed data in
•   Highly configurable
•   Search results can be tuned
•   There is no bonnet
Why am I doing a talk?
•   Did a course
•   LucidWorks content
•   Presented by FindWise
•   FindWise is a search specialist that uses a
    range of search engines
Caveats
• Course was in Solr 4.1.0, we use 3.6.1 for
  APVMA
• Course focussed on search, not ingestion or
  presentation
• Java API recommended for ingestion
• ‘Browse’ interface uses Velocity templates for
  presentation, but probably isn’t good enough
  for most projects.
Where does Solr fit?
Application Architecture
Apache Tika
•   Data import handler
•   Used to be part of Lucene
•   XML
•   PDF
•   Word
•   Excel
•   etc.
ManifoldCF
•   Apache
•   Connector framework
•   Used to connect to content repositories (source)
•   SharePoint
•   Documentum
•   CMIS
•   JDBC
•   RSS
Hydra
• FindWise
• Although Solr supports validation (e.g.
  ‘required’), don’t use it for data cleanup.
• Validation failure inconvenient: whole job fails
• Feed in clean data.
• Use Hydra for cleanup.
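
The "feed in clean data" advice can be illustrated with a small pre-indexing check. This is a minimal sketch and not Hydra's API: the function name `split_clean_records`, the sample records and the field names `id` and `title` are all hypothetical. It shows the idea of setting incomplete records aside before they reach Solr, so that one bad record does not fail the whole job.

```python
def split_clean_records(records, required_fields):
    """Separate records that have every required field from those that do not.

    Hypothetical helper for illustration only; it is not part of Hydra or Solr.
    """
    clean, rejected = [], []
    for record in records:
        # A field counts as missing if the key is absent, None, or blank.
        missing = [
            name for name in required_fields
            if record.get(name) is None or str(record.get(name)).strip() == ""
        ]
        if missing:
            rejected.append((record, missing))
        else:
            clean.append(record)
    return clean, rejected


# Hypothetical sample data: only the first record is complete.
records = [
    {"id": "1", "title": "First document"},
    {"id": "2", "title": "   "},
    {"title": "No id here"},
]
clean, rejected = split_clean_records(records, ["id", "title"])
```

Only the `clean` list would then be sent for indexing; the `rejected` list records which fields were missing so the source data can be fixed.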
Apache ZooKeeper
•   Used for SolrCloud
•   Clustering and sharding
•   Solr 4.x only (not available in 3.6.1)
•   Side project for Hadoop
•   Used to manage Hadoop clusters
Inside
General Approach
• Design schema
• Prototyping
• Integration
Design Schema
• A data modelling exercise
• schema.xml
• Dynamic fields can be useful in the first pass:
  <dynamicField name="*" type="string"
  indexed="true" />
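
To make the catch-all pattern concrete, here is a simplified model of how a dynamic field name pattern is matched against an incoming field name. It is a sketch, not Solr's implementation: the function name `matches_dynamic_field` is hypothetical, and it assumes the wildcard appears only at the start or the end of the pattern. The example names `*_s` and `attr_*` are illustrative.

```python
def matches_dynamic_field(pattern, field_name):
    """Simplified model of dynamicField name matching (illustration only)."""
    if pattern == "*":
        # The catch-all used in the first pass: every field name matches.
        return True
    if pattern.startswith("*"):
        # Suffix pattern, e.g. "*_s" matches "title_s".
        return field_name.endswith(pattern[1:])
    if pattern.endswith("*"):
        # Prefix pattern, e.g. "attr_*" matches "attr_color".
        return field_name.startswith(pattern[:-1])
    # No wildcard: only an exact name matches.
    return pattern == field_name
```

With `name="*"` every field is accepted as a string, which is handy for a first pass. Narrower patterns can replace it once the data model settles.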
Prototyping
• Get the data in (index)
• csv, XML, JSON
• post.jar
• URL to search and inspect raw results
• ‘browse’ interface allows developer to
  understand how the search is working
• solrconfig.xml
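
The prototyping loop above can be sketched in a few lines: build an XML update message to index, then build a URL to inspect the raw results. This is a minimal sketch under stated assumptions. The helper names `build_add_xml` and `build_select_url` are hypothetical, the base URL `http://localhost:8983/solr` assumes a default local example install, and the field names are made up. The exact path depends on the Solr version and core name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_add_xml(docs):
    """Build a Solr XML update message (<add><doc><field .../></doc></add>)."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(base_url, query, rows=10):
    """Build a search URL that returns indented JSON for easy inspection."""
    params = {"q": query, "rows": rows, "wt": "json", "indent": "true"}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical document and an assumed local Solr address.
xml_message = build_add_xml([{"id": 1, "title": "Hello"}])
url = build_select_url("http://localhost:8983/solr", "title:Hello", rows=5)
```

The XML message can be saved to a file and sent with post.jar (for example `java -jar post.jar docs.xml`), and the URL can be pasted into a browser to look at the raw response.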
Integration
•   Not covered
•   Content ingestion
•   Presentation of results
•   Up to you…
Demo
