Solr
   Search at the Speed of Light


          JavaZone 2009
           September 10
               Oslo
  Erik Hatcher, Lucid Imagination
erik.hatcher@lucidimagination.com




Solr History

     • Created by Yonik Seeley for CNET
     • Contributed to Apache in January 2006
     • December 2006: Version 1.1 released
     • June 2007: Version 1.2 released
     • September 2008: Version 1.3 released
     • ~September 2009: Version 1.4
http://lucene.apache.org/solr
    © 2008-2009          Lucid Imagination, Inc.
Solr: Big Picture
   [Diagram: Data (Documents, DB) → Solr → Search Results]
Features

 • Lucene power exposed over HTTP
 • Scalability: caching, replication, distributed
      search
 • Faceting
 • And more: spell checking, highlighting,
      clustering, rich document and DB indexing,
      "more like this"


Lucene

 • Fast, scalable search library
 • Lucene index structure
  • Index contains documents
    • documents have fields
      • indexed fields have terms

Inverted Index

 • Commonly used search
      engine data structure
 • Efficient lookup of terms
      across large number of
      documents
 • Usually stores positional
      information to enable
      phrase/proximity queries

   (Figure from "Taming Text" by Grant Ingersoll and Tom Morton)
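To make the structure concrete, here is a toy positional inverted index in plain Java. It is a hypothetical sketch (the class PositionalIndex and its methods are illustrative, and real Lucene indexes use a compressed on-disk format): each term maps to the documents containing it, and each document entry keeps the token positions, which is what allows a phrase check like "quick brown".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy positional inverted index: term -> (docId -> positions). Not Lucene's on-disk format. */
public class PositionalIndex {
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    /** Indexes text with naive lowercase + whitespace tokenization (an assumption for brevity). */
    public void add(int docId, String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Term lookup: ids of documents containing the term, in ascending order. */
    public List<Integer> docs(String term) {
        Map<Integer, List<Integer>> docs = postings.get(term);
        return docs == null ? new ArrayList<>() : new ArrayList<>(docs.keySet());
    }

    /** Two-term phrase query: docs where term2 occurs at the position right after term1. */
    public List<Integer> phrase(String term1, String term2) {
        List<Integer> hits = new ArrayList<>();
        Map<Integer, List<Integer>> first = postings.get(term1);
        Map<Integer, List<Integer>> second = postings.get(term2);
        if (first == null || second == null) return hits;
        for (Map.Entry<Integer, List<Integer>> e : first.entrySet()) {
            List<Integer> next = second.get(e.getKey());
            if (next == null) continue; // term2 absent from this document
            for (int p : e.getValue()) {
                if (next.contains(p + 1)) {
                    hits.add(e.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        PositionalIndex index = new PositionalIndex();
        index.add(1, "The quick brown fox");
        index.add(2, "The brown quick fox");
        System.out.println(index.docs("quick"));            // [1, 2]
        System.out.println(index.phrase("quick", "brown")); // [1]
    }
}
```

Both documents contain both words, so a plain term lookup matches both; only the stored positions let the phrase query tell them apart.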


Analysis Process




Analyzing the analyzer
                    Example phrase

      The quick brown fox jumps over the lazy dog.




WhitespaceAnalyzer
                Simplest built-in analyzer
      The quick brown fox jumps over the lazy dog.




  [The] [quick] [brown] [fox] [jumps] [over] [the]
                    [lazy] [dog.]

SimpleAnalyzer
          Lowercases, splits at non-letter boundaries
      The quick brown fox jumps over the lazy dog.




  [the] [quick] [brown] [fox] [jumps] [over] [the]
                    [lazy] [dog]

StopAnalyzer
              Lowercases and removes stop words


      The quick brown fox jumps over the lazy dog.




 [quick] [brown] [fox] [jumps] [over] [lazy] [dog]




SnowballAnalyzer
                   Stemming algorithm
      The quick brown fox jumps over the lazy dog.




   [the] [quick] [brown] [fox] [jump] [over] [the]
                     [lazi] [dog]
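The first three analyzers above can be approximated in a few lines of plain Java, which makes their differences easy to experiment with. This is a hypothetical sketch: the class AnalyzerSketch, its methods, and the stop list (assumed to mirror Lucene's classic English default) are illustrations, not Lucene's API. Snowball stemming is left out because it needs the full stemmer rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java approximations of three analyzers from the slides; not Lucene's classes. */
public class AnalyzerSketch {
    // Assumed stop list, intended to mirror Lucene's classic English default.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
            "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
            "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"));

    /** Like WhitespaceAnalyzer: split on whitespace only; case and punctuation are kept. */
    public static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Like SimpleAnalyzer: split at non-letters, then lowercase. */
    public static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    /** Like StopAnalyzer: the simple() tokens minus stop words. */
    public static List<String> stop(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : simple(text)) {
            if (!STOP_WORDS.contains(t)) tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String phrase = "The quick brown fox jumps over the lazy dog.";
        System.out.println(whitespace(phrase)); // [The, quick, brown, fox, jumps, over, the, lazy, dog.]
        System.out.println(simple(phrase));     // [the, quick, brown, fox, jumps, over, the, lazy, dog]
        System.out.println(stop(phrase));       // [quick, brown, fox, jumps, over, lazy, dog]
    }
}
```

Note how "dog." keeps its period under whitespace splitting; a query for "dog" would not match that token, which is why analyzer choice must agree at index and query time.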

What's in a token?




Relevance

 •    Term frequency (TF): number of times a term
      appears in a document

 •    Inverse document frequency (IDF): one over the
      number of documents that contain the term (1/df)

 •    Field length normalization: controls the effect
      field length, in number of terms, has on the score

 •    Boost factors: terms, fields, or documents
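The factors above can be written out as small functions. The sketch below follows the formulas of Lucene 2.x's DefaultSimilarity (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(numTerms)), so its idf is a damped logarithm rather than a literal 1/df. The class TermScore is hypothetical, and it ignores coord, queryNorm, and the lossy byte encoding Lucene applies to norms.

```java
/** One query term's score contribution, modeled on Lucene 2.x DefaultSimilarity (simplified). */
public class TermScore {
    /** Term frequency factor: square root of the raw count, so repeats have diminishing returns. */
    public static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** Inverse document frequency: rarer terms get larger weights. */
    public static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** Field length normalization: matches in shorter fields weigh more. */
    public static double lengthNorm(int numTermsInField) {
        return 1.0 / Math.sqrt(numTermsInField);
    }

    /** tf * idf^2 * boost * lengthNorm; ignores coord, queryNorm and norm byte-encoding. */
    public static double contribution(int freq, int docFreq, int numDocs, int fieldLength, double boost) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * boost * lengthNorm(fieldLength);
    }

    public static void main(String[] args) {
        // Rare term matched twice in a 4-term field vs. common term matched once in a 100-term field.
        System.out.println(contribution(2, 9, 1000, 4, 1.0));
        System.out.println(contribution(1, 499, 1000, 100, 1.0));
    }
}
```

The idf appears squared because, in Lucene's formula, it contributes once through the query weight and once through the document side.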



Lucene Scoring
   [Diagram: document vector d1 and query vector q1, separated by angle Θ]
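The angle Θ suggests the vector space model that Lucene's scoring builds on: a document and a query are vectors of term weights, and a smaller angle means a closer match. In its textbook form (Lucene layers coord, queryNorm, and boosts on top, so treat this as the underlying idea rather than Lucene's exact formula):

```latex
\cos\Theta = \frac{\vec{d_1}\cdot\vec{q_1}}{\lVert\vec{d_1}\rVert\,\lVert\vec{q_1}\rVert}
           = \frac{\sum_{t} w_{t,d_1}\, w_{t,q_1}}
                  {\sqrt{\sum_{t} w_{t,d_1}^{2}}\;\sqrt{\sum_{t} w_{t,q_1}^{2}}}
```

Here w_{t,d} is the weight of term t in d, for example a tf × idf product.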




Solr APIs

 • HTTP GET/POST (curl or any other HTTP
      client)
 • JSON
 • SolrJ (embedded or HTTP)
 • solr-ruby
 • Python, PHP, SolrSharp, XSLT

Solr in Production
   [Diagram: Incoming Search Requests → Load Balancer → Solr / Solr Master nodes → Shard Requests → a Load Balancer per shard → 1..n shards, each a Shard Master with several Replicants]




Getting Started:
                 It's This Easy
1. Start Solr

  java -jar start.jar
2. Index your data

  java -jar post.jar *.xml
3. Search

  http://localhost:8983/solr
Configuration
 •    schema.xml

     •    field types and fields

 •    solrconfig.xml

     •    request handler mappings

     •    cache settings: filter, query, document

     •    warming listeners

     •    HTTP cache settings

     •    Lucene index parameters

     •    plugins: spell checking, highlighting


Solr add/update XML
<add><doc>
  <field name="id">MA147LL/A</field>
  <field name="name">Apple 60 GB iPod with Video Playback Black</field>
  <field name="manu">Apple Computer Inc.</field>
  <field name="cat">electronics</field>
  <field name="cat">music</field>
  <field name="features">iTunes, Podcasts, Audiobooks</field>
  <field name="features">Stores up to 15,000 songs, 25,000 photos, or 150 hours of
               video</field>
  <field name="features">2.5-inch, 320x240 color TFT LCD display
                         with LED backlight</field>
  <field name="features">Up to 20 hours of battery life</field>
  <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless,
                         H.264 video</field>
  <field name="features">Notes, Calendar, Phone book, Hold button, Date display,
      Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware,
      USB 2.0 compatibility, Playback speed control, Rechargeable capability,
      Battery level indication</field>
  <field name="includes">earbud headphones, USB cable</field>
  <field name="weight">5.5</field>
  <field name="price">399.00</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
</doc></add>


Indexing Solr XML
 • Via curl:
      curl 'http://localhost:8983/solr/update?commit=true' --data-binary @ipod_video.xml -H 'Content-type:text/xml; charset=utf-8'

 • Via Solr's Java-based post tool:
      java -jar post.jar ipod_video.xml



Indexing CSV


curl 'http://localhost:8983/solr/update/csv?commit=true' --data-binary @books.csv -H 'Content-type:text/plain; charset=utf-8'




Content Streams

 •    Allows the Solr server to fetch local or remote data
      itself. Remote streaming must be enabled in
      solrconfig.xml

 •    http://localhost:8983/solr/update?stream.file=<local Solr path to exampledocs>/ipod_video.xml

 •    &stream.url=<url to content>

 •    Security warning: allows Solr to fetch arbitrary
      server-side files or network URL content



Indexing Rich Documents


curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true&extractOnly=true&wt=ruby&indent=on' -F "myfile=@tutorial.html"




Indexing with SolrJ

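import java.net.URL;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

SolrServer solr =
    new CommonsHttpSolrServer(new URL("http://localhost:8983/solr"));

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "JAVAZONE_09");
doc.addField("title", "JavaZone 2009 SolrJ Example");
solr.add(doc);
solr.commit();     // after a batch, not per document
solr.optimize();   // periodically, when needed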




Indexing with Ruby

require 'solr'

solr = Solr::Connection.new(
  'http://localhost:8983/solr',
  :autocommit => :on)

solr.add(:id => 123,
         :title => 'Solr in Action')

solr.optimize       # periodically, as needed




Data Import Handler


• Indexes relational database, XML data sources,
   e-mail, and more
• Supports full and incremental/delta indexing
• Extensible with custom data sources,
   transformers, etc.
• http://wiki.apache.org/solr/DataImportHandler
DB Indexing



http://localhost:8983/solr/db/dataimport?command=full-import




Example Search Request

 • http://localhost:8983/solr/select?q=query
  • &start=50
  • &rows=25
  • &fq=filter+query
  • &facet=on&facet.field=category
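Because a search request is just an HTTP GET, any client can assemble it. A minimal sketch in plain Java that URL-encodes each parameter; the helper class SelectUrl is hypothetical and not part of SolrJ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds a Solr /select URL from request parameters (illustrative helper, not part of SolrJ). */
public class SelectUrl {
    public static String build(String solrBase, Map<String, String> params) {
        StringBuilder url = new StringBuilder(solrBase).append("/select");
        char sep = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(sep).append(encode(p.getKey())).append('=').append(encode(p.getValue()));
            sep = '&';
        }
        return url.toString();
    }

    // Form-encodes a value: spaces become '+', as in "fq=filter+query".
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order
        params.put("q", "query");
        params.put("start", "50");
        params.put("rows", "25");
        params.put("fq", "filter query");
        params.put("facet", "on");
        params.put("facet.field", "category");
        System.out.println(build("http://localhost:8983/solr", params));
    }
}
```

A Map cannot express repeated parameters such as several fq or facet.field values; a list of key/value pairs would be needed for that.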

Debug Query


 • &debugQuery=true is your friend
 • Includes parsed query, explanations, and
      search component timings in response




Query Parser

 • Controlled by defType parameter
  • &defType=lucene (actually a Solr
          extension of Lucene’s QueryParser)
     • &defType=dismax
 • Local {!..} override syntax

Solr Query Parser

 • http://lucene.apache.org/java/2_4_0/queryparsersyntax.html
      + Solr extensions
 • Kitchen sink parser; includes advanced,
      user-unfriendly syntax
 • Syntax errors throw parse exceptions back
      to the client
 • Example: title:ipod* AND price:[0 TO 100]
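Since malformed syntax produces a parse exception, client code often escapes raw user text before embedding it in a query like the one above. A sketch, assuming the special-character set listed on the Lucene 2.4 query syntax page; the class QueryEscaper is hypothetical, and Lucene's own static QueryParser.escape is the library route.

```java
/** Backslash-escapes Lucene query syntax characters in user input (sketch, not Lucene's code). */
public class QueryEscaper {
    // Assumed special characters: \ + - ! ( ) : ^ [ ] " { } ~ * ? | &
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    public static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) sb.append('\\'); // prefix syntax chars with a backslash
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The user's text is now matched literally instead of being parsed as a range query.
        String userInput = "price:[0 TO 100]";
        System.out.println("title:(" + escape(userInput) + ")");
    }
}
```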
Dismax Query Parser

 • Simplified syntax:
      loose text “quote phrases” -prohibited
      +required
 • Spreads query terms across query fields
      (qf) with dynamic boosting per field, implicit
      phrase construction (pf), boosting function
      (bf), boosting query (bq), and minimum
      match (mm)
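Minimum match (mm) is the least obvious of these knobs. The sketch below covers only its simple forms (a positive or negative integer, or a positive or negative percentage) as described on the Solr wiki; conditional specs such as "3<90%" are not handled, and the class MinShouldMatch plus its clamping behavior are assumptions of this sketch, not Solr's code.

```java
/** Sketch of dismax "mm" for its simple forms; conditional specs like "3<90%" are not handled. */
public class MinShouldMatch {
    /** How many of optionalClauses must match for a spec such as "2", "-1", "75%" or "-25%". */
    public static int required(String spec, int optionalClauses) {
        String s = spec.trim();
        int result;
        if (s.endsWith("%")) {
            int percent = Integer.parseInt(s.substring(0, s.length() - 1));
            // Cast truncates toward zero, so fractional clause counts are dropped.
            int calc = (int) (optionalClauses * percent / 100.0);
            result = percent < 0 ? optionalClauses + calc : calc;
        } else {
            int calc = Integer.parseInt(s);
            // Negative values say how many clauses may be missing.
            result = calc < 0 ? optionalClauses + calc : calc;
        }
        return Math.max(0, Math.min(optionalClauses, result)); // clamp to [0, optionalClauses]
    }

    public static void main(String[] args) {
        for (String spec : new String[] {"2", "-1", "75%", "-25%"}) {
            System.out.println(spec + " of 4 clauses -> " + required(spec, 4));
        }
    }
}
```

With four optional clauses, "-1" and "75%" both require three matches; "-25%" of three clauses still requires all three, because a quarter of a clause truncates to zero.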


Searching with SolrJ


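import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrQuery params = new SolrQuery("author:John");
params.setFields("*,score");
params.setRows(3);
QueryResponse response = server.query(params);
for (SolrDocument document : response.getResults()) {
      System.out.println("Doc: " + document);
}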




Searching with Ruby


require 'solr'

conn = Solr::Connection.new(
    'http://localhost:8983/solr')

conn.query('my query') do |hit|
  puts hit.inspect
end




delete, update, etc
 •    Delete:
     • <delete><id>05991</id></delete>
     •    <delete>
             <query>category:Unused</query>
          </delete>

     •    java -Ddata=args -jar post.jar
          "<delete><query>*:*</query></delete>"

 •    Update: simply <add> a document with the same unique key; it replaces the existing one

 •    Commit: <commit/>

 •    Optimize: <optimize/>
Faceting


• Counts per subset within results
• Facet on: field terms, queries, date
    ranges
• &facet=on
    &facet.field=cat
    &facet.query=price:[0 TO 100]
• http://wiki.apache.org/solr/SimpleFacetParameters
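Field faceting amounts to counting how many matching documents carry each value of a field. A hypothetical plain-Java sketch over an in-memory result set (the class FacetSketch and its document representation are illustrative; Solr computes these counts from the index, not from stored documents):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/** Counts field values across matching documents, like a facet.field request (sketch). */
public class FacetSketch {
    /** A matching document, simplified to field name -> values (fields may be multi-valued). */
    public static Map<String, List<String>> doc(String field, String... values) {
        Map<String, List<String>> d = new HashMap<>();
        d.put(field, Arrays.asList(values));
        return d;
    }

    /** Returns value -> count, highest count first; ties broken alphabetically in this sketch. */
    public static Map<String, Integer> facet(List<Map<String, List<String>>> matches, String field) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, List<String>> d : matches) {
            List<String> values = d.getOrDefault(field, Collections.<String>emptyList());
            for (String v : new LinkedHashSet<>(values)) { // a document counts once per value
                counts.merge(v, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) sorted.put(e.getKey(), e.getValue());
        return sorted;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> matches = Arrays.asList(
                doc("cat", "electronics", "music"),
                doc("cat", "electronics"),
                doc("cat", "music", "books"));
        System.out.println(facet(matches, "cat")); // {electronics=2, music=2, books=1}
    }
}
```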
Spell checking


•    Not enabled by default; see the example config to wire it in

•    http://localhost:8983/solr/spell?q=epod&spellcheck=on&spellcheck.build=true

•    File or index-based dictionaries

•    Supports pluggable distance algorithms: Levenshtein and
     Jaro-Winkler

•    http://wiki.apache.org/solr/SpellCheckComponent
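The Levenshtein option is based on edit distance: the number of single-character insertions, deletions, and substitutions needed to turn one word into another. A sketch of the classic dynamic-programming version, with a hypothetical suggest() helper that picks the closest dictionary word; the spell checker's own distance classes may normalize the result into a similarity score, while this sketch keeps the raw count.

```java
/** Classic two-row dynamic-programming Levenshtein distance, plus a naive suggester (sketch). */
public class EditDistance {
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from "" to b's prefixes
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse arrays: current row becomes previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the dictionary word closest to the input; the first one wins on ties. */
    public static String suggest(String word, String... dictionary) {
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String candidate : dictionary) {
            int d = levenshtein(word, candidate);
            if (d < bestDist) {
                best = candidate;
                bestDist = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(suggest("epod", "ipod", "apple", "video")); // ipod
    }
}
```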


Highlighting


 • http://localhost:8983/solr/select?q=ipod&hl=on&hl.fl=manu,name
 • http://wiki.apache.org/solr/HighlightingParameters
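At its simplest, highlighting wraps the query terms that occur in a stored field value with markup, by default the <em>…</em> pair. A naive sketch (the class HighlightSketch is hypothetical; it matches lowercased word tokens and does none of Solr's fragmenting or analyzer-aware matching):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Wraps matching query terms in a field value with <em> tags (naive sketch, no fragmenting). */
public class HighlightSketch {
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    public static String highlight(String text, Set<String> queryTerms) {
        Matcher m = WORD.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start()); // copy text between tokens unchanged
            String token = m.group();
            if (queryTerms.contains(token.toLowerCase(Locale.ROOT))) {
                out.append("<em>").append(token).append("</em>");
            } else {
                out.append(token);
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("ipod"));
        System.out.println(highlight("Apple 60 GB iPod with Video Playback Black", terms));
        // Apple 60 GB <em>iPod</em> with Video Playback Black
    }
}
```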




More Like This


 • http://localhost:8983/solr/select?q=ipod&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score,name
 • http://wiki.apache.org/solr/MoreLikeThis
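"More like this" works by picking a source document's most distinctive terms and using them as a new query. The sketch below keeps terms that pass thresholds analogous to mlt.mintf and mlt.mindf and ranks them by tf × idf; the class MoreLikeThisSketch, its idf formula (borrowed from Lucene's DefaultSimilarity), and the document-frequency table are assumptions, not Solr's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Chooses a source document's most distinctive terms, in the spirit of MoreLikeThis (sketch). */
public class MoreLikeThisSketch {
    /**
     * @param docTokens analyzed tokens of the source document's mlt.fl fields
     * @param docFreqs  hypothetical term -> document frequency table for the index
     * @param minTf     analogous to mlt.mintf
     * @param minDf     analogous to mlt.mindf
     */
    public static List<String> interestingTerms(List<String> docTokens, Map<String, Integer> docFreqs,
                                                int numDocs, int minTf, int minDf, int maxTerms) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : docTokens) tf.merge(t, 1, Integer::sum);

        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            if (e.getValue() < minTf || df < minDf) continue; // skip terms below the thresholds
            double idf = Math.log(numDocs / (double) (df + 1)) + 1.0;
            scores.put(e.getKey(), e.getValue() * idf);
        }

        List<String> terms = new ArrayList<>(scores.keySet());
        terms.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return new ArrayList<>(terms.subList(0, Math.min(maxTerms, terms.size())));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("apple", "ipod", "ipod", "video");
        Map<String, Integer> df = new HashMap<>();
        df.put("apple", 50);
        df.put("ipod", 2);
        df.put("video", 10);
        System.out.println(interestingTerms(tokens, df, 100, 1, 1, 2)); // [ipod, video]
    }
}
```

The selected terms would then be OR'ed into a query, and the source document excluded from its own results.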


Scaling: Query Throughput

 • Replication
  • slaves poll master for index updates
  • transfers index files from master to slave
  • configuration files can also be transferred
  • entirely Java/HTTP-based in Solr 1.4
          (prior versions used rsync)



Scaling: Collection Size

 • Distribution
  • Index documents across shards
  • query a single server with the shards
          parameter
         • sends requests to each shard
         • aggregates the results into a single response
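The aggregation step is essentially a merge of per-shard ranked lists. A sketch of that idea (the class ShardMerge is hypothetical): for page [start, start + rows) to be correct, each shard has to return at least start + rows hits. The sketch treats scores from different shards as directly comparable, which holds only approximately because each shard computes idf from its own documents, and it does not remove duplicate ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Merges per-shard results into one ranked page, as a distributed-search coordinator does (sketch). */
public class ShardMerge {
    public static final class Hit {
        public final String id;
        public final float score;

        public Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }

        @Override
        public String toString() {
            return id + "=" + score;
        }
    }

    /** Returns hits [start, start + rows) of the merged, score-descending order. */
    public static List<Hit> merge(List<List<Hit>> shardResults, int start, int rows) {
        PriorityQueue<Hit> queue = new PriorityQueue<>((a, b) -> Float.compare(b.score, a.score));
        for (List<Hit> shard : shardResults) queue.addAll(shard);
        List<Hit> page = new ArrayList<>();
        for (int i = 0; i < start + rows && !queue.isEmpty(); i++) {
            Hit h = queue.poll(); // next-best hit across all shards
            if (i >= start) page.add(h);
        }
        return page;
    }

    public static void main(String[] args) {
        List<Hit> shard1 = Arrays.asList(new Hit("a", 3.0f), new Hit("b", 1.0f));
        List<Hit> shard2 = Arrays.asList(new Hit("c", 2.5f), new Hit("d", 0.5f));
        System.out.println(merge(Arrays.asList(shard1, shard2), 0, 3)); // [a=3.0, c=2.5, b=1.0]
    }
}
```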

Solr-powered UI

 • Solritas (from "celeritas"):
      VelocityResponseWriter
     • easily templated output
 • SolrJS: jQuery-based widgets
  • see http://solrjs.solrstuff.org/
 • Blacklight and Flare: RoR plugins

Lucene in Action, 2nd Edition




              http://www.manning.com/lucene
Search at Lucid
http://search.lucidimagination.com/?q=javazone




Who We Are
             • Lucid Imagination is the commercial entity of Apache Lucene/Solr ecosystem
             • Lucid team includes the largest collection of Lucene/Solr committers, contributors and influencers
             • Our mission is to serve as [...] for Lucene/Solr-based search solutions
                  • Help our customers get the most out of Lucene/Solr, [...] most widely used open source search software
Lucid Imagination Technical Team

Unique Combination of Enterprise Search and Lucene Expertise

 • Yonik Seeley: Creator of Solr; Lucene/Solr committer, PMC member
 • Grant Ingersoll: Lucene/Solr committer, Chair, PMC
 • Erik Hatcher: Lucene/Solr committer, PMC member
 • Mark Miller: Lucene/Solr committer, PMC member
 • Sami Siren: Nutch/Tika committer, PMC member
 • Andrzej Bialecki: Lucene/Nutch/Hadoop committer, PMC member
 • Doug Cutting (Advisor): Creator of Lucene, Nutch & Hadoop
 • Marc Krellenstein: Co-founder, CTO/SVP, Northern Light; VP Search, CTO, Elsevier
 • Brian Pinkerton: Developed WebCrawler, the web's first comprehensive search engine; Principal Architect at A9
 • Simon Rosenthal: Solutions architect, Northern Light
 • Jay Hill: Solutions Architect, Wells Fargo
 • Ryan McKinley: Lucene/Solr committer, PMC member
 • Chris Hostetter (Advisor): Lucene/Solr committer, PMC member; Member Apache Software Foundation
Lucid Imagination Business Model
   [Diagram: Free (Download Lucene, Apache.org); Value-add (Upgrade: Certified Distributions, Enterprise Version); Commercial Support, Training, Consulting]
Thank you




              http://www.lucidimagination.com
© 2008-2009                Lucid Imagination, Inc.
                                                                 49
© 2008-2009   Lucid Imagination, Inc.
                                        50

More Related Content

PPT
Lucene Introduction
PDF
Apache HBase - Just the Basics
PPTX
Apache sqoop with an use case
PPTX
Elasticsearch
PDF
Nuxt.JS Introdruction
PDF
PostgreSQL Tutorial For Beginners | Edureka
PDF
Introduction to elasticsearch
PDF
Solid NodeJS with TypeScript, Jest & NestJS
Lucene Introduction
Apache HBase - Just the Basics
Apache sqoop with an use case
Elasticsearch
Nuxt.JS Introdruction
PostgreSQL Tutorial For Beginners | Edureka
Introduction to elasticsearch
Solid NodeJS with TypeScript, Jest & NestJS

What's hot (20)

ODP
Elasticsearch for beginners
PDF
Introduction to elasticsearch
PPT
Cours compilation
PPTX
AEM and Sling
PDF
Improving the performance of Odoo deployments
PDF
Presto: Optimizing Performance of SQL-on-Anything Engine
PDF
Support POO Java Deuxième Partie
PPTX
TAD (1).pptx
PPTX
Sqlmap
PDF
Odoo Performance Limits
DOCX
Oracle 12c RAC On your laptop Step by Step Implementation Guide 1.0
PDF
Quick flask an intro to flask
PDF
Effective CMake
PDF
Mongodb 특징 분석
PPTX
Introduction to Spring Boot
PDF
Introduction to Redux
PDF
Apache Solr crash course
PPTX
Hacking Oracle From Web Apps 1 9
PPTX
Les collections en Java
PDF
Elasticsearch
Elasticsearch for beginners
Introduction to elasticsearch
Cours compilation
AEM and Sling
Improving the performance of Odoo deployments
Presto: Optimizing Performance of SQL-on-Anything Engine
Support POO Java Deuxième Partie
TAD (1).pptx
Sqlmap
Odoo Performance Limits
Oracle 12c RAC On your laptop Step by Step Implementation Guide 1.0
Quick flask an intro to flask
Effective CMake
Mongodb 특징 분석
Introduction to Spring Boot
Introduction to Redux
Apache Solr crash course
Hacking Oracle From Web Apps 1 9
Les collections en Java
Elasticsearch
Ad

Viewers also liked (20)

PDF
Using Apache Solr
PPTX
Building a real time, solr-powered recommendation engine
PDF
Solr for Indexing and Searching Logs
PDF
Introduction to Apache Solr
PDF
Enterprise Search in Practice: A Presentation of Survey Results and Areas for...
PPTX
Solr introduction
PDF
New-Age Search through Apache Solr
PPTX
Enterprise Search Using Apache Solr
ODP
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
PDF
How Solr Search Works
PDF
Spark overview
PPTX
Benchmarking Solr Performance at Scale
PDF
Introduction to Solr
PDF
Apache Solr Search Course Drupal 7 Acquia
PDF
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, Lucidworks
PPTX
Apache Solr-Webinar
PDF
High Performance Solr
PDF
Introduction to Apache Solr
PDF
Apache Spark Overview
PDF
Solr Application Development Tutorial
Using Apache Solr
Building a real time, solr-powered recommendation engine
Solr for Indexing and Searching Logs
Introduction to Apache Solr
Enterprise Search in Practice: A Presentation of Survey Results and Areas for...
Solr introduction
New-Age Search through Apache Solr
Enterprise Search Using Apache Solr
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
How Solr Search Works
Spark overview
Benchmarking Solr Performance at Scale
Introduction to Solr
Apache Solr Search Course Drupal 7 Acquia
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, Lucidworks
Apache Solr-Webinar
High Performance Solr
Introduction to Apache Solr
Apache Spark Overview
Solr Application Development Tutorial
Ad

Similar to Solr: Search at the Speed of Light (20)

PDF
Getting started faster with LucidWorks for Solr
PDF
Rapid prototyping search applications with solr
PDF
Analyze this! tips and tricks on getting the lucene solr analyzer to index an...
PDF
Solr Powered Lucene
PDF
Introduction to Solr
PDF
The Seven Deadly Sins of Solr - By Jay Hill
PDF
The Seven Deadly Sins of Solr - By Jay Hill
PDF
The Seven Deadly Sins of Solr
PDF
Lucene for Solr Developers
PDF
Migrating Fast to Solr
PDF
Oslo Enterprise MeetUp May 12th 2010 - Jan Høydahl
PDF
Rapid prototyping with solr - By Erik Hatcher
PDF
Rapid Prototyping with Solr
PDF
Understanding Lucene Search Performance
PDF
Understanding Lucene Search Performance
PDF
Understanding Lucene Search Performance
PDF
Building a Real-time Solr-powered Recommendation Engine
PDF
An Introduction to Basics of Search and Relevancy with Apache Solr
PDF
Lucene for Solr Developers
PDF
"Solr Update" at code4lib '13 - Chicago
Getting started faster with LucidWorks for Solr
Rapid prototyping search applications with solr
Analyze this! tips and tricks on getting the lucene solr analyzer to index an...
Solr Powered Lucene
Introduction to Solr
The Seven Deadly Sins of Solr - By Jay Hill
The Seven Deadly Sins of Solr - By Jay Hill
The Seven Deadly Sins of Solr
Lucene for Solr Developers
Migrating Fast to Solr
Oslo Enterprise MeetUp May 12th 2010 - Jan Høydahl
Rapid prototyping with solr - By Erik Hatcher
Rapid Prototyping with Solr
Understanding Lucene Search Performance
Understanding Lucene Search Performance
Understanding Lucene Search Performance
Building a Real-time Solr-powered Recommendation Engine
An Introduction to Basics of Search and Relevancy with Apache Solr
Lucene for Solr Developers
"Solr Update" at code4lib '13 - Chicago

More from Erik Hatcher (20)

PDF
Ted Talk
PDF
Solr Payloads
PDF
it's just search
PDF
Lucene's Latest (for Libraries)
PDF
Solr Indexing and Analysis Tricks
PDF
Solr Powered Libraries
PDF
Solr Query Parsing
PDF
Query Parsing - Tips and Tricks
PDF
Solr 4
PDF
Solr Recipes
PDF
Solr Flair
PDF
Introduction to Solr
PDF
Rapid Prototyping with Solr
PDF
Lucene for Solr Developers
PDF
What's New in Solr 3.x / 4.0
PDF
Solr Recipes Workshop
PDF
Rapid Prototyping with Solr
PDF
Lucene for Solr Developers
PDF
code4lib 2011 preconference: What's New in Solr (since 1.4.1)
PDF
Rapid Prototyping with Solr
Ted Talk
Solr Payloads
it's just search
Lucene's Latest (for Libraries)
Solr Indexing and Analysis Tricks
Solr Powered Libraries
Solr Query Parsing
Query Parsing - Tips and Tricks
Solr 4
Solr Recipes
Solr Flair
Introduction to Solr
Rapid Prototyping with Solr
Lucene for Solr Developers
What's New in Solr 3.x / 4.0
Solr Recipes Workshop
Rapid Prototyping with Solr
Lucene for Solr Developers
code4lib 2011 preconference: What's New in Solr (since 1.4.1)
Rapid Prototyping with Solr

Recently uploaded (20)

PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
cuic standard and advanced reporting.pdf
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
Machine learning based COVID-19 study performance prediction
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
Electronic commerce courselecture one. Pdf
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPTX
Big Data Technologies - Introduction.pptx
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
NewMind AI Weekly Chronicles - August'25 Week I
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Digital-Transformation-Roadmap-for-Companies.pptx
The AUB Centre for AI in Media Proposal.docx
The Rise and Fall of 3GPP – Time for a Sabbatical?
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Unlocking AI with Model Context Protocol (MCP)
cuic standard and advanced reporting.pdf
Dropbox Q2 2025 Financial Results & Investor Presentation
Machine learning based COVID-19 study performance prediction
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Electronic commerce courselecture one. Pdf
Advanced methodologies resolving dimensionality complications for autism neur...
Big Data Technologies - Introduction.pptx
Chapter 3 Spatial Domain Image Processing.pdf
MIND Revenue Release Quarter 2 2025 Press Release
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
NewMind AI Weekly Chronicles - August'25 Week I

Solr: Search at the Speed of Light

  • 1. Solr Search at the Speed of Light JavaZone 2009 September 10 Oslo Erik Hatcher, Lucid Imagination erik.hatcher@lucidimagination.com 1
  • 2. Solr History • Created by Yonik Seeley for CNET • Contributed to Apache in January 2006 • December 2006:Version 1.1 released • June 2007:Version 1.2 released • September 2008:Version 1.3 released • ~September 2009:Version 1.4 http://lucene.apache.org/solr © 2008-2009 Lucid Imagination, Inc. 2
  • 3. Solr: Big Picture Data DB Document Document Documents Solr Search Results © 2008-2009 Lucid Imagination, Inc. 3
  • 4. Features • Lucene power exposed over HTTP • Scalability: caching, replication, distributed search • Faceting • And more: spell checking, highlighting, clustering, rich document and DB indexing, "more like this" © 2008-2009 Lucid Imagination, Inc. 4
  • 5. Lucene • Fast, scalable search library • Lucene index structure • Index contains documents • documents have fields • indexed fields have terms © 2008-2009 Lucid Imagination, Inc. 5
  • 6. Inverted Index • Commonly used search engine data structure • Efficient lookup of terms across large number of documents • Usually stores positional information to enable From "Taming Text" by Grant Ingersoll and Tom Morton phrase/proximity queries © 2008-2009 Lucid Imagination, Inc. 6
  • 7. Analysis Process © 2008-2009 Lucid Imagination, Inc. 7
  • 8. Analyzing the analyzer Example phrase The quick brown fox jumps over the lazy dog. © 2008-2009 Lucid Imagination, Inc. 8
  • 9. WhitespaceAnalyzer Simplest built-in analyzer The quick brown fox jumps over the lazy dog. [The] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dog.] © 2008-2009 Lucid Imagination, Inc. 9
  • 10. SimpleAnalyzer Lowercases, splits at non-letter boundaries the quick brown fox jumps over the lazy dog. [the] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dog] © 2008-2009 Lucid Imagination, Inc. 10
  • 11. StopAnalyzer Lowercases and removes stop words The quick brown fox jumps over the lazy dog. [quick] [brown] [fox] [jumps] [over] [lazy] [dog] © 2008-2009 Lucid Imagination, Inc. 11
  • 12. SnowballAnalyzer Stemming algorithm The quick brown fox jumps over the lazi dog. [the] [quick] [brown] [fox] [jump] [over] [the] [lazi] [dog] © 2008-2009 Lucid Imagination, Inc. 12
  • 13. What's in a token? © 2008-2009 Lucid Imagination, Inc. 13
  • 14. Relevance • Term frequency (TF): number of times a term appears in a document • Inverse document frequency (IDF): roughly one over the number of documents in the index that contain the term (1/df), so rarer terms count more • Field length normalization: controls the effect that field length, in number of terms, has on the score • Boost factors: terms, fields, or documents
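To make these factors concrete, here is a sketch of the per-term formulas as we understand them from Lucene 2.x `DefaultSimilarity`: tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / √numTerms. The class is hypothetical and leaves out coord, queryNorm, and the lossy one-byte encoding Lucene applies to norms.

```java
/** Hypothetical sketch of DefaultSimilarity-style scoring factors (Lucene 2.x era). */
class SimilaritySketch {
    /** More occurrences help, with diminishing returns. */
    static double tf(int freq) { return Math.sqrt(freq); }

    /** Rarer terms (small docFreq) get a larger weight. */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /** Shorter fields get a larger weight. */
    static double lengthNorm(int fieldLengthInTerms) {
        return 1.0 / Math.sqrt(fieldLengthInTerms);
    }

    /** One query term's contribution to one document; idf appears on both query and document side. */
    static double termScore(int freq, int docFreq, int numDocs, int fieldLength, double boost) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * boost * lengthNorm(fieldLength);
    }
}
```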
  • 15. Lucene Scoring (diagram: document vector d1 and query vector q1 separated by angle Θ)
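The vector-space picture on this slide corresponds to the cosine of the angle Θ between the query and document term vectors. The second line is the "practical scoring function" as we recall it from the Lucene 2.x `Similarity` documentation; treat it as a reference sketch and verify it against your Lucene version.

```latex
\[
\cos\Theta = \frac{\vec{q}_1 \cdot \vec{d}_1}{\lVert \vec{q}_1 \rVert \, \lVert \vec{d}_1 \rVert}
\]
\[
\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot
  \sum_{t \in q} \Bigl( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^{2} \cdot \mathrm{boost}(t) \cdot \mathrm{norm}(t,d) \Bigr)
\]
```

Here coord rewards documents that match more of the query terms, queryNorm makes scores from different queries roughly comparable without changing the ranking, and norm(t,d) folds together field length normalization and index-time field and document boosts.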
  • 16. Solr APIs • HTTP GET/POST (curl or any other HTTP client) • JSON • SolrJ (embedded or HTTP) • solr-ruby • python, PHP, solrsharp, XSLT
  • 17. Solr in Production (diagram: incoming search requests pass through a load balancer to Solr servers, which send shard requests through per-shard load balancers to replicants; each of shards 1..n has a master that replicates to its replicants)
  • 18. Getting Started: It's This Easy 1. Start Solr: java -jar start.jar 2. Index your data: java -jar post.jar *.xml 3. Search: http://localhost:8983/solr
  • 19. Configuration • schema.xml • field types and fields • solrconfig.xml • request handler mappings • cache settings: filter, query, document • warming listeners • HTTP cache settings • Lucene index parameters • plugins: spell checking, highlighting
  • 20. Solr add/update XML <add><doc> <field name="id">MA147LL/A</field> <field name="name">Apple 60 GB iPod with Video Playback Black</field> <field name="manu">Apple Computer Inc.</field> <field name="cat">electronics</field> <field name="cat">music</field> <field name="features">iTunes, Podcasts, Audiobooks</field> <field name="features">Stores up to 15,000 songs, 25,000 photos, or 150 hours of video</field> <field name="features">2.5-inch, 320x240 color TFT LCD display with LED backlight</field> <field name="features">Up to 20 hours of battery life</field> <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video</field> <field name="features">Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication</field> <field name="includes">earbud headphones, USB cable</field> <field name="weight">5.5</field> <field name="price">399.00</field> <field name="popularity">10</field> <field name="inStock">true</field> </doc></add>
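Writing this message by hand becomes error-prone once values contain `&` or `<`. A minimal Java sketch (hypothetical class name, no XML library) that builds the same `<add><doc>` structure, repeating a field name for multi-valued fields such as `cat` and `features`:

```java
import java.util.*;

/** Hypothetical builder for a Solr <add> message from (name, value) pairs. */
class SolrAddXml {
    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Each entry becomes one <field>; repeating a name yields a multi-valued field. */
    static String addDoc(List<Map.Entry<String, String>> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields) {
            xml.append("<field name=\"").append(escape(f.getKey())).append("\">")
               .append(escape(f.getValue())).append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }
}
```

A list of entries is used instead of a map so that the same field name can appear more than once, matching the repeated `cat` and `features` fields on the slide.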
  • 21. Indexing Solr XML • Via curl: curl 'http://localhost:8983/solr/update?commit=true' --data-binary @ipod_video.xml -H 'Content-type:text/xml; charset=utf-8' • Via Solr's Java-based post tool: java -jar post.jar ipod_video.xml
  • 22. Indexing CSV curl 'http://localhost:8983/solr/update/csv?commit=true' --data-binary @books.csv -H 'Content-type:text/plain; charset=utf-8'
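For Java code that should not shell out to curl, the same XML and CSV update requests can be issued with the JDK's `java.net.http` client (Java 11+). This is a sketch: the class and helper names are hypothetical, and `send` needs a running Solr at the given base URL.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helpers mirroring the curl update commands. */
class SolrUpdatePost {
    /** POSTs a body to an update handler path such as "/update" or "/update/csv", then commits. */
    static HttpRequest update(String solrBase, String handlerPath, String contentType, String body) {
        return HttpRequest.newBuilder(URI.create(solrBase + handlerPath + "?commit=true"))
                .header("Content-Type", contentType)
                .POST(HttpRequest.BodyPublishers.ofString(body))   // UTF-8 encoded
                .build();
    }

    /** Sends the request; requires a running Solr instance. */
    static HttpResponse<String> send(HttpRequest request) throws IOException, InterruptedException {
        return HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

For example, `SolrUpdatePost.send(SolrUpdatePost.update("http://localhost:8983/solr", "/update/csv", "text/plain; charset=utf-8", Files.readString(Path.of("books.csv"))))` mirrors the CSV command, given imports of `java.nio.file.Files` and `java.nio.file.Path`.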
  • 23. Content Streams • Allows the Solr server to fetch local or remote data itself; remote streaming must be enabled in solrconfig.xml • http://localhost:8983/solr/update?stream.file=<local Solr path to exampledocs>/ipod_video.xml • &stream.url=<url to content> • Security warning: allows Solr to fetch arbitrary server-side files or network URLs
  • 24. Indexing Rich Documents curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true&extractOnly=true&wt=ruby&indent=on' -F "myfile=@tutorial.html"
  • 25. Indexing with SolrJ SolrServer solr = new CommonsHttpSolrServer(new URL("http://localhost:8983/solr")); SolrInputDocument doc = new SolrInputDocument(); doc.addField("id", "JAVAZONE_09"); doc.addField("title", "JavaZone 2009 SolrJ Example"); solr.add(doc); solr.commit(); // after a batch, not per document solr.optimize(); // periodically, when needed
  • 26. Indexing with Ruby solr = Connection.new( 'http://localhost:8983/solr', :autocommit => :on) solr.add(:id => 123, :title => 'Solr in Action') solr.optimize # periodically, as needed
  • 27. Data Import Handler • Indexes relational databases, XML data sources, e-mail, and more • Supports full and incremental/delta indexing • Extensible with custom data sources, transformers, etc. • http://wiki.apache.org/solr/DataImportHandler
  • 29. Example Search Request • http://localhost:8983/solr/select?q=query • &start=50 • &rows=25 • &fq=filter+query • &facet=on&facet.field=category
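Parameter values such as `fq` often contain spaces, colons, and brackets, so they must be URL-encoded. A small Java sketch (hypothetical helper) that assembles a `/select` URL from the parameters above; it takes a list of pairs rather than a map because parameters like `fq` and `facet.field` may repeat.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.*;

/** Hypothetical builder for Solr /select URLs. */
class SelectUrl {
    /** Joins URL-encoded key=value pairs after solrBase + "/select?". */
    static String build(String solrBase, List<Map.Entry<String, String>> params) {
        StringJoiner query = new StringJoiner("&", solrBase + "/select?", "");
        for (Map.Entry<String, String> p : params) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }
}
```

With `fq` set to `price:[0 TO 100]`, the encoded parameter becomes `fq=price%3A%5B0+TO+100%5D`.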
  • 30. Debug Query • &debugQuery=true is your friend • Includes parsed query, explanations, and search component timings in response
  • 31. Query Parser • Controlled by defType parameter • &defType=lucene (actually a Solr extension of Lucene’s QueryParser) • &defType=dismax • Local {!..} override syntax
  • 32. Solr Query Parser • http://lucene.apache.org/java/2_4_0/queryparsersyntax.html + Solr extensions • Kitchen sink parser, includes advanced user-unfriendly syntax • Syntax errors throw parse exceptions back to client • Example: title:ipod* AND price:[0 TO 100]
  • 33. Dismax Query Parser • Simplified syntax: loose text “quote phrases” -prohibited +required • Spreads query terms across query fields (qf) with dynamic boosting per field, implicit phrase construction (pf), boosting function (bf), boosting query (bq), and minimum match (mm)
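The `mm` parameter is the least obvious of these. As we understand its documented semantics, a positive integer is an absolute count, a negative integer is how many optional clauses may be missing, and a percentage is applied to the number of optional clauses and truncated. A sketch of just those four forms (hypothetical class; conditional specs such as `3<90%` are not handled):

```java
/** Hypothetical sketch of dismax minimum-should-match for simple mm specs. */
class MinShouldMatch {
    /** How many of optionalClauses must match for specs like "3", "-1", "75%", "-25%". */
    static int calculate(int optionalClauses, String spec) {
        spec = spec.trim();
        int result;
        if (spec.endsWith("%")) {
            int percent = Integer.parseInt(spec.substring(0, spec.length() - 1));
            int calc = (optionalClauses * percent) / 100;   // integer division truncates toward zero
            result = percent < 0 ? optionalClauses + calc : calc;
        } else {
            int n = Integer.parseInt(spec);
            result = n < 0 ? optionalClauses + n : n;
        }
        return Math.max(0, Math.min(optionalClauses, result));   // clamp to [0, optionalClauses]
    }
}
```

For a three-word query, `75%` requires two words to match, while `-25%` still requires all three, because 25% of 3 truncates to zero clauses that may be missing.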
  • 34. Searching with SolrJ SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr"); SolrQuery params = new SolrQuery("author:John"); params.setFields("*,score"); params.setRows(3); QueryResponse response = server.query(params); for (SolrDocument document : response.getResults()) { System.out.println("Doc: " + document); }
  • 35. Searching with Ruby conn = Connection.new( 'http://localhost:8983/solr') conn.query('my query') do |hit| puts hit.inspect end
  • 36. delete, update, etc • Delete: • <delete><id>05991</id></delete> • <delete> <query>category:Unused</query> </delete> • java -Ddata=args -jar post.jar "<delete><query>*:*</query></delete>" • Update: simply <add> doc with same unique key • Commit: <commit/> • Optimize: <optimize/>
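Because an add with an existing unique key replaces the whole stored document, an update behaves like delete-then-add. The in-memory model below is a hypothetical sketch of those semantics only: it assumes `id` is the unique key (as in the example schema) and ignores the fact that Solr makes changes visible only after a commit.

```java
import java.util.*;
import java.util.function.Predicate;

/** Hypothetical model of add/delete semantics keyed on the "id" unique key field. */
class UniqueKeyStore {
    private final Map<String, Map<String, String>> docs = new LinkedHashMap<>();

    /** An add whose id already exists replaces the entire old document (no field-level merge). */
    void add(Map<String, String> doc) { docs.put(doc.get("id"), doc); }

    /** Like <delete><id>...</id></delete>. */
    void deleteById(String id) { docs.remove(id); }

    /** Like <delete><query>...</query></delete>, with the query modeled as a predicate. */
    void deleteByQuery(Predicate<Map<String, String>> query) { docs.values().removeIf(query); }

    Map<String, String> get(String id) { return docs.get(id); }

    int size() { return docs.size(); }
}
```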
  • 37. Faceting • Counts per subset within results • Facet on: field terms, queries, date ranges • &facet=on &facet.field=cat &facet.query=price:[0 TO 100] • http://wiki.apache.org/solr/SimpleFacetParameters
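Conceptually, a field facet counts, for each value of a field, how many documents in the current result set contain it, and a facet query counts the result documents matching an extra query. A hypothetical in-memory sketch of both; Solr computes these against the index, and typically orders field facets by count rather than alphabetically as this `TreeMap` does.

```java
import java.util.*;
import java.util.function.Predicate;

/** Hypothetical in-memory faceting over a result set of multi-valued documents. */
class FacetSketch {
    /** Like facet.field: number of result documents containing each value of the field. */
    static Map<String, Integer> fieldFacet(List<Map<String, List<String>>> results, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, List<String>> doc : results) {
            // A set, so a value repeated within one document counts once for that document.
            for (String value : new HashSet<>(doc.getOrDefault(field, List.of()))) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Like facet.query: number of result documents matching an additional predicate. */
    static long queryFacet(List<Map<String, List<String>>> results,
                           Predicate<Map<String, List<String>>> query) {
        return results.stream().filter(query).count();
    }
}
```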
  • 38. Spell checking • Not enabled by default; see the example config to wire it in • http://localhost:8983/solr/spell?q=epod&spellcheck=on&spellcheck.build=true • File or index-based dictionaries • Supports pluggable distance algorithms: Levenshtein and Jaro-Winkler • http://wiki.apache.org/solr/SpellCheckComponent
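Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions that turn one word into another. Below is the standard dynamic-programming computation in Java. `similarity` normalizes by the longer word's length, which is how we understand Lucene's spell-checker distance to score candidates (an assumption worth checking), and `suggest` is a hypothetical helper, not the SpellCheckComponent API.

```java
import java.util.*;

/** Hypothetical edit-distance helpers for spelling suggestions. */
class Levenshtein {
    /** Two-row dynamic programming: O(|a|*|b|) time, O(|b|) memory. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;   // "" -> first j chars of b: j inserts
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                                      // first i chars of a -> "": i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;        // reuse the arrays
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]: 1 minus distance over the longer length. */
    static float similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1f : 1f - (float) distance(a, b) / longest;
    }

    /** Closest dictionary word within maxEdits, if any. */
    static Optional<String> suggest(String word, List<String> dictionary, int maxEdits) {
        return dictionary.stream()
                .filter(w -> distance(word, w) <= maxEdits)
                .min(Comparator.comparingInt((String w) -> distance(word, w)));
    }
}
```

For the slide's misspelling, `suggest("epod", List.of("ipod", "apple", "video"), 2)` yields `ipod`, one substitution away.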
  • 39. Highlighting • http://localhost:8983/solr/select?q=ipod&hl=on&hl.fl=manu,name • http://wiki.apache.org/solr/HighlightingParameters
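At its core, highlighting marks query terms inside stored field text. A deliberately naive Java sketch (hypothetical helper): it matches whole words case-insensitively and wraps them in caller-supplied tags, such as the `<em>`/`</em>` pair Solr's default formatter uses, but it performs no analysis and no fragment selection.

```java
import java.util.*;
import java.util.regex.*;

/** Hypothetical, analysis-free highlighter for illustration. */
class HighlightSketch {
    private static final Pattern WORD = Pattern.compile("\\w+");

    /** Wraps words whose lowercase form is in queryTerms (expected lowercase) with pre/post. */
    static String highlight(String text, Set<String> queryTerms, String pre, String post) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());               // copy text between words unchanged
            String word = m.group();
            boolean hit = queryTerms.contains(word.toLowerCase(Locale.ROOT));
            out.append(hit ? pre + word + post : word);
            last = m.end();
        }
        return out.append(text, last, text.length()).toString();
    }
}
```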
  • 40. More Like This • http://localhost:8983/solr/select?q=ipod&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score,name • http://wiki.apache.org/solr/MoreLikeThis
  • 41. Scaling: Query Throughput • Replication • slaves poll master for index updates • transfers index files from master to slave • configuration files can also be transferred • entirely Java/HTTP-based in Solr 1.4 (prior versions used rsync)
  • 42. Scaling: Collection Size • Distribution • Index documents across shards • Query a single server with the shards parameter • it sends requests to each shard • and aggregates the results into a single response
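The aggregation step can be sketched as a merge of per-shard ranked lists. This hypothetical Java example assumes each shard has already returned its top `start + rows` hits with scores; it ignores duplicate-key handling and the fact that each shard scores with its own local term statistics.

```java
import java.util.*;

/** Hypothetical merge of ranked hits from several shards. */
class ShardMerge {
    /** One scored hit returned by a shard. */
    static final class Hit {
        final String id;
        final float score;
        Hit(String id, float score) { this.id = id; this.score = score; }
    }

    /** Combines per-shard hit lists and returns the requested page of the merged ranking. */
    static List<Hit> merge(List<List<Hit>> shardResults, int start, int rows) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shard : shardResults) all.addAll(shard);
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());  // highest score first
        int from = Math.min(start, all.size());
        int to = Math.min(start + rows, all.size());
        return new ArrayList<>(all.subList(from, to));
    }
}
```

If shard A returns a1 (0.9) and a2 (0.3) and shard B returns b1 (0.7) and b2 (0.5), then start=1, rows=2 yields b1 followed by b2. Each shard must be asked for `start + rows` hits, not just `rows`, or the merged page can be wrong.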
  • 43. Solr-powered UI • Solritas (from "celeritas"): VelocityResponseWriter • easily templated output • SolrJS: jQuery-based widgets • see http://solrjs.solrstuff.org/ • Blacklight and Flare: RoR plugins
  • 44. Lucene in Action, 2nd Edition http://www.manning.com/lucene
  • 46. Lucid Imagination is the commercial entity of the Apache Lucene/Solr ecosystem • Lucene/Solr committers, contributors and influencers • Our mission is to serve as the Center of Excellence for Lucene/Solr-based search solutions • Help our customers get the most out of Lucene/Solr
  • 47. Lucid Imagination Technical Team • Unique Combination of Enterprise Search and Lucene Expertise
  • 48. Lucid Imagination Business Model (diagram items: Free Download; Apache.org; Enterprise Version; Certified Distributions; Commercial Support; Training; Consulting)
  • 49. Thank you http://www.lucidimagination.com