Rapid Prototyping with Solr

Erik Hatcher, Lucid Imagination
erik.hatcher @ lucidimagination.com, May 25, 2011
Abstract
§  Got data? Let's make it searchable! This interactive
    presentation will demonstrate getting documents into
    Solr quickly, provide some tips on adjusting Solr's
    schema to better match your needs, and finally
    discuss how to showcase your data in a flexible search
    user interface. We'll see how to rapidly leverage
    faceting, highlighting, spell checking, and debugging.
    Even after all that, there will be enough time left to
    outline the next steps in developing your search
    application and taking it to production.




My Background
§  Erik Hatcher
   •  Lucid Imagination
      §  Technical Staff
   •  Co-author
      §  Java Development with Ant / Ant in Action (Manning)
      §  Lucene in Action (Manning)
   •  Apache Software Foundation
      §  Committer – Lucene / Solr
      §  PMC – Lucene TLP
      §  Member




Why prototype?
§  Demonstrate that Solr can handle your data and
    search needs; mitigate risk and explore the
    unknowns
§  It’s quick and easy, with very little time
    investment
§  An immediately functional user interface impresses
    decision makers and target users and
    gets buy-in
  •  The user interface IS the app



Prior Art
§  Hoss’ amazing ISFDB work
   •  http://www.lucidimagination.com/blog/tag/isfdb/
§  Previous “Rapid Prototyping with Solr” presentations
   •  Data.gov Catalog on Solr:
      http://www.lucidimagination.com/blog/2010/11/05/data-gov-on-solr/
   •  Rich text files on Solr:
      http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and-Videos/Rapid-Prototyping-Search-Applications-Solr
   •  CSV (conference attendee data) on Solr:
      http://www.slideshare.net/erikhatcher/rapid-prototyping-with-solr-4312681



Rapid Prototyping using CSV
§  Fired up Solr’s example configuration
§  /update/csv
   •  http://localhost:8983/solr/update/csv?commit=true&stream.file=EuroCon2010.csv&fieldnames=first,last,company,title,country&header=true&f.country.map=Great+Britain:United+Kingdom
§  Tweak configuration
   •  schema: domain-centric field names
   •  solrconfig: /browse request handler
   •  Template adjustments
§  Instant classic search results view, tree map
    visualization of facet data, and random selection of
    contest winners
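The same CSV upload can be assembled programmatically once the prototype is scripted. This is a minimal sketch assuming Ruby's standard `uri` and `net/http` libraries and the default `localhost:8983` example instance; `csv_update_uri` and `SOLR_BASE` are hypothetical names, while the request parameters are exactly those shown above. Letting `URI.encode_www_form` do the escaping avoids hand-encoding values such as `Great Britain:United Kingdom`.

```ruby
require 'uri'
require 'net/http'

# Base URL of the Solr example instance (assumed default; adjust as needed).
SOLR_BASE = 'http://localhost:8983/solr'

# Hypothetical helper: build the /update/csv URL from the slide.
# Params are an array of pairs so their order is preserved.
def csv_update_uri(file, fieldnames, extra = {})
  params = [
    ['commit', 'true'],
    ['stream.file', file],                 # file is read by the Solr server
    ['fieldnames', fieldnames.join(',')],  # overrides the CSV header names
    ['header', 'true']                     # skip the CSV's own header line
  ] + extra.to_a                           # e.g. per-field value mapping
  URI("#{SOLR_BASE}/update/csv?#{URI.encode_www_form(params)}")
end

uri = csv_update_uri('EuroCon2010.csv',
                     %w[first last company title country],
                     'f.country.map' => 'Great Britain:United Kingdom')
puts uri

# Sending it requires a running Solr, so it is left commented out:
# puts Net::HTTP.get_response(uri).code
```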

CSV results




… using rich text files
§  curl "http://localhost:8983/solr/update/extract?stream.file=/docs/file.pdf&literal.id=/docs/file.pdf"
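`stream.file` makes Solr read the file from its own filesystem, which presumes remote streaming is enabled in `solrconfig.xml` and that the file sits on the Solr host. A minimal sketch of both that variant and an alternative that posts the file bytes in the request body (the approach behind `curl --data-binary` with a `Content-type` header) might look like this; the helper names are hypothetical and nothing is actually sent.

```ruby
require 'uri'
require 'net/http'

SOLR_BASE = 'http://localhost:8983/solr'  # assumed example instance

# Variant 1 (as on the slide): Solr fetches the file itself via stream.file;
# literal.id sets the document's unique key to the path.
def extract_by_path_uri(path)
  query = URI.encode_www_form('stream.file' => path, 'literal.id' => path)
  URI("#{SOLR_BASE}/update/extract?#{query}")
end

# Variant 2: send the file's bytes in the POST body, useful when the file
# lives on the client rather than the Solr server.
def extract_by_upload_request(id, bytes, content_type)
  uri = URI("#{SOLR_BASE}/update/extract?#{URI.encode_www_form('literal.id' => id)}")
  req = Net::HTTP::Post.new(uri)
  req['Content-Type'] = content_type
  req.body = bytes
  req
end

puts extract_by_path_uri('/docs/file.pdf')

# Against a running Solr (not executed here):
# req = extract_by_upload_request('/docs/file.pdf', File.binread('/docs/file.pdf'), 'application/pdf')
# Net::HTTP.start(req.uri.host, req.uri.port) { |http| puts http.request(req).code }
```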




… using Data.Gov catalog data
§  /update/csv – again!




Explaining




Suggest




Venn Viz




E-commerce data
§  http://bbyopen.com/
§  Product data via an easy HTTP JSON API




Ingesting the data
require 'json'
require 'solr'   # the solr-ruby gem
#... (max_pages, fetch_page and debug are defined in the elided setup)

1.upto(max_pages) do |page|
  puts "Processing page #{page}"
  json = fetch_page(page)

  response = JSON.parse(json, :symbolize_names => true)
  puts "Total products: #{response[:total]}" if page == 1

  # Solr field name => product attribute (Symbol) or computed value (Proc)
  mapping = {
    :id           => :sku,
    :name_t       => :name,
    :thumbnail_s  => :thumbnailImage,
    :url_s        => :url,
    :type_s       => :type,
    :category_s   => Proc.new { |prod|
                       prod[:categoryPath].collect { |cat| cat[:name] }.join(' >> ') },
    :department_s => :department,
    :class_s      => :class,
    :subclass_s   => :subclass,
    :sale_price_f => :salePrice
  }

  # Index this page's products; :buffer_docs sets the batch size
  Solr::Indexer.new(response[:products], mapping,
                    {:debug => debug, :buffer_docs => 500}).index
end
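The `:buffer_docs => 500` option suggests documents are accumulated and sent in batches rather than one request per product. The sketch below illustrates that buffering idea only; it is not solr-ruby's implementation, and `index_in_batches` and its `flush` block are hypothetical stand-ins for whatever actually posts a batch to Solr.

```ruby
# Hypothetical helper: hand documents off in fixed-size batches.
# The block stands in for the code that would send one batch to Solr.
def index_in_batches(docs, buffer_size, &flush)
  raise ArgumentError, 'buffer_size must be positive' unless buffer_size.positive?
  docs.each_slice(buffer_size) { |batch| flush.call(batch) }
end

sizes = []
index_in_batches(1..1234, 500) { |batch| sizes << batch.size }
p sizes
```

With a buffer of 500, 1,234 documents become three update requests instead of 1,234, trading a little memory for far fewer HTTP round trips.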



solr-ruby’s secret power
§  Solr::Indexer.new(
        source, mapping, options
    ).index
§  “Quacks like a duck” (duck typing)
§  source simply needs to respond to #each
§  mapping simply looks up values with #[]
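To make the duck-typing point concrete, here is a hypothetical, stripped-down mapper written in the spirit of the slide; it is not solr-ruby's actual code. The source only has to respond to `#each` and each record to `#[]`, so an Array of Hashes and a lazy Enumerator of Structs work equally well. The sample products are made up for illustration, and the String-as-constant rule is an assumption added to round out the example.

```ruby
# Map one record to a Solr document. Rules: Symbol = look up that key via #[],
# Proc = compute from the record, String = use as a constant value.
def map_record(record, mapping)
  mapping.each_with_object({}) do |(solr_field, rule), doc|
    value = case rule
            when Symbol then record[rule]
            when Proc   then rule.call(record)
            when String then rule
            else raise ArgumentError, "unsupported mapping rule: #{rule.inspect}"
            end
    doc[solr_field] = value unless value.nil?
  end
end

# The source only needs #each.
def map_source(source, mapping)
  docs = []
  source.each { |record| docs << map_record(record, mapping) }
  docs
end

mapping = {
  id:         :sku,
  name_t:     :name,
  category_s: proc { |r| r[:categoryPath].map { |c| c[:name] }.join(' >> ') },
  source_s:   'sample'
}

# Source 1: an Array of Hashes (illustrative data).
products = [
  { sku: 1001, name: 'Widget', categoryPath: [{ name: 'Home' }, { name: 'Tools' }] }
]

# Source 2: a lazy Enumerator of Structs; Struct#[] accepts member symbols.
Product = Struct.new(:sku, :name, :categoryPath)
lazy_source = Enumerator.new do |y|
  y << Product.new(2002, 'Gadget', [{ name: 'Electronics' }])
end

p map_source(products, mapping)
p map_source(lazy_source, mapping)
```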




… on Prism




What is Prism?
§  Yet another opinionated brainstorm from Erik
§  https://github.com/lucidimagination/Prism
§  Under the covers
    •  Ruby
        §  because it’s beautiful
    •  Sinatra
        §  to be lightweight and have elegant flexible routing
    •  Velocity
        §  because it is easy to learn and use, has powerful features, and
            facilitates edit/refresh work
§  Separate from Solr, Rack-savvy, and allows easy coding of new routes
    and capabilities
§  Designed to work with any Solr instance, and already has
    some basic LucidWorks Enterprise capability
§  Totally a proof-of-concept at this point – just a quick hack

… on Solritas




Solritas?
§  Pronounced: so-LAIR-uh-toss
§  Celeritas is a Latin word, translated as "swiftness" or
    "speed". It is often given as the origin of the symbol c,
    the universal notation for the speed of light -
    http://en.wikipedia.org/wiki/Celeritas
§  Technically it’s the VelocityResponseWriter
    (wt=velocity)
   •  simply passes the Solr response through the Apache
      Velocity templating engine
§  http://wiki.apache.org/solr/VelocityResponseWriter
§  Built into Solr, available instantly out of the box at:
    http://localhost:8983/solr/browse
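Because Solritas is just a response writer, any request can in principle be rendered through Velocity by adding `wt=velocity`; `v.template` names the template (`browse.vm`) and `v.layout` an optional wrapping layout (`layout.vm`). The example `/browse` handler is, as far as we recall, configured with these as defaults, which is why `/solr/browse` needs no extra parameters. The sketch below only builds such a URL; `velocity_uri` is a hypothetical helper, and rendering `browse.vm` through a plain `/select` handler may produce a sparser page if that handler requests no facets.

```ruby
require 'uri'

SOLR_BASE = 'http://localhost:8983/solr'  # assumed example instance

# Hypothetical helper: route a query's response through VelocityResponseWriter.
def velocity_uri(query, handler: 'select', template: 'browse', layout: 'layout')
  params = {
    'q'          => query,
    'wt'         => 'velocity',  # select the Velocity response writer
    'v.template' => template,    # renders <template>.vm
    'v.layout'   => layout       # wraps the output in <layout>.vm
  }
  URI("#{SOLR_BASE}/#{handler}?#{URI.encode_www_form(params)}")
end

puts velocity_uri('ipod')
```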

… on Blacklight




Blacklight?
§  http://projectblacklight.org/
§  Blacklight is a free and open source Ruby on Rails based
    discovery interface (a.k.a. “next-generation catalog”) especially
    optimized for heterogeneous collections. You can use it as a library
    catalog, as a front end for a digital repository, or as a single-search
    interface to aggregate digital content that would otherwise be
    siloed.
§  Production sites:
       •  http://search.lib.virginia.edu/
       •  http://searchworks.stanford.edu/
§  Features:
       •  Authentication
       •  Saved searches
       •  Bookmarks – saved result items
       •  Selected items – for exporting to 3rd party systems
       •  Customizable / extensible UI

Prototyping Tips and Tools
§  Get data into Solr in the simplest possible way
    •  CSV – if it fits, it’s really nice
§  Schema adjustments
    •  <dynamicField name="*" type="string" multiValued="true"/>
    •  <copyField source="*" dest="text"/>
§  Data analysis
    •  Understand what Solr is doing with your fields
    •  Solr’s Schema Browser and /admin/luke request handler
§  UI
    •  /browse – easy tweaking of <solr-home>/conf/velocity/*.vm
       templates
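With a catch-all `<dynamicField name="*" .../>`, fields appear in the index without being declared, so it helps to list what Solr actually created. The sketch below summarizes field names and types from `/admin/luke?wt=json`. It assumes the JSON response carries a top-level `"fields"` object keyed by field name, each entry with a `"type"`; check that shape against your Solr version. The sample JSON is made up for illustration, and `field_types` and `fetch_luke` are hypothetical helpers.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Map each indexed field to its field type, sorted by field name.
# Assumes luke's JSON has "fields" => { name => { "type" => ... } }.
def field_types(luke_json)
  JSON.parse(luke_json)
      .fetch('fields', {})
      .map { |name, info| [name, info['type']] }
      .sort
      .to_h
end

# Fetch from a running Solr (not called below); numTerms=0 skips top-term stats.
def fetch_luke(base = 'http://localhost:8983/solr')
  Net::HTTP.get(URI("#{base}/admin/luke?wt=json&numTerms=0"))
end

# Illustrative response fragment, not real Solr output.
sample = '{"fields":{"text":{"type":"text"},"country":{"type":"string"}}}'
p field_types(sample)
```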




Now what?
§  Script the indexing process: full and
    incremental/delta
§  Work with real users on real needs
§  Integrate into production systems
§  Iterate on schema enhancements and
    configuration tweaks
§  Deploy to staging/production environments and
    work at scale: collection size, real queries and
    volume, hardware and JVM settings
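Scripting indexing usually means supporting both a full rebuild and a cheaper delta run. A minimal sketch of the selection step, assuming each source record carries an `:updated_at` timestamp and that the time of the previous run is saved somewhere; `records_to_index` and the sample records are hypothetical.

```ruby
require 'time'

# Full runs reindex everything; delta runs pick up only records changed
# since the previous run's timestamp.
def records_to_index(records, mode:, last_run: nil)
  case mode
  when :full
    records
  when :delta
    raise ArgumentError, 'delta mode needs last_run' unless last_run
    records.select { |r| r[:updated_at] > last_run }
  else
    raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end

records = [
  { id: 'a', updated_at: Time.parse('2011-05-20T00:00:00Z') },
  { id: 'b', updated_at: Time.parse('2011-05-24T12:00:00Z') }
]
last_run = Time.parse('2011-05-23T00:00:00Z')

p records_to_index(records, mode: :delta, last_run: last_run).map { |r| r[:id] }
```

Note that a timestamp filter like this cannot see records deleted from the source; deletions need separate tracking so they can be removed from the index.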


Test
§  Performance
§  Scalability
§  Relevance
§  Automate all of the above: establish baselines
    and catch regressions
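For relevance, one simple automated baseline is precision at k: the fraction of the top k results that a human judged relevant, tracked per query across runs. This is a minimal sketch of that check, not a prescribed methodology; the queries, document ids, judgments, and baseline scores are all hypothetical, and in practice the current ids would come from live Solr queries.

```ruby
# Precision at k: relevant hits among the top k results, divided by k.
def precision_at_k(actual_ids, relevant_ids, k)
  raise ArgumentError, 'k must be positive' unless k.positive?
  hits = actual_ids.first(k).count { |id| relevant_ids.include?(id) }
  hits.to_f / k
end

# Queries whose current score fell below the stored baseline.
def regressions(baseline, current)
  baseline.select { |query, score| current.fetch(query, 0.0) < score }.keys
end

judgments = { 'ipod' => %w[sku1 sku2 sku3] }  # hypothetical relevance judgments
baseline  = { 'ipod' => 1.0 }                 # hypothetical earlier score
results   = { 'ipod' => %w[sku1 sku9 sku2] }  # hypothetical ids returned now

current = judgments.to_h { |q, rel| [q, precision_at_k(results[q], rel, 3)] }
p current
p regressions(baseline, current)
```

Run from a test suite or CI job, a non-empty regression list can fail the build, which turns relevance tuning into something checked on every schema or configuration change.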




Thanks!



