Solr 4
                   Presented by Erik Hatcher




© Copyright 2012
About: Erik Hatcher

    • “Lucene in Action”, co-author
       -  And also “Java Development with Ant”/“Ant in Action” co-author
    • Open Source
       -  Apache Software Foundation: member; Lucene/Solr committer
          and PMC member
       -  Originator of “Blacklight”, a Solr-powered discovery interface
    • LucidWorks
       -  Co-founder
       -  Recently renamed from Lucid Imagination
       -  Customer Support




    © 2012 LucidWorks
2
Abstract

    Solr 4.0 dramatically improves scalability, performance,
    and flexibility. An overhauled Lucene underneath sports near
    real-time (NRT) capabilities allowing indexed documents to
    be rapidly visible and searchable. Lucene’s improvements
    also include pluggable scoring, much faster fuzzy and
    wildcard querying, and vastly improved memory usage.
    These Lucene improvements automatically make Solr much
    better, and Solr magnifies these advances with “SolrCloud.”
    SolrCloud enables highly available and fault-tolerant clusters
    for large-scale distributed indexing and searching. There are
    many other changes that will be surveyed as well. This talk
    will cover these improvements in detail, comparing and
    contrasting them with previous versions of Solr.

Lucene 4 Improvements

    • Flexible index formats
    • Pluggable scoring
    • String -> BytesRef
    • DWPT (Document Writer Per Thread)
       -  faster, more consistent indexing speed
    • NRT (Near Real-Time)
    • Spatial overhaul
    • FST/FSA
       -  FuzzyQuery over 100x faster
       -  also reduces memory footprint for Terms index
    • DocValues: aka column-stride fields
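DocValues ("column-stride fields") store one value per document for a field, laid out column by column, so sorting or faceting can read a single field's values without loading whole stored documents. The sketch below only illustrates that layout; the class `NumericColumn` is hypothetical and is not Lucene's DocValues API, which uses on-disk, per-segment structures.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical column-stride store: one long per document, indexed by doc ID.
// Illustrates the layout only; Lucene's real DocValues API is different.
class NumericColumn {
    private long[] values = new long[0];
    private int maxDoc = 0;

    // Set the value for docId, growing the column as needed (unset docs read as 0).
    void set(int docId, long value) {
        if (docId >= values.length) {
            values = Arrays.copyOf(values, Math.max(docId + 1, values.length * 2));
        }
        values[docId] = value;
        maxDoc = Math.max(maxDoc, docId + 1);
    }

    long get(int docId) {
        return docId < maxDoc ? values[docId] : 0L;
    }

    // Sort doc IDs by this field: touches only this column, never whole documents.
    Integer[] sortedDocs() {
        Integer[] docs = new Integer[maxDoc];
        for (int i = 0; i < maxDoc; i++) docs[i] = i;
        Arrays.sort(docs, Comparator.comparingLong((Integer doc) -> get(doc)));
        return docs;
    }

    public static void main(String[] args) {
        NumericColumn price = new NumericColumn();
        price.set(0, 30);
        price.set(1, 10);
        price.set(2, 20);
        System.out.println(Arrays.toString(price.sortedDocs())); // [1, 2, 0]
    }
}
```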

Flexible index formats

    • For terms, postings lists, stored fields, term vectors, etc.
    • Several new posting list codecs
       -  Pulsing (inlines postings of low doc freq terms)
       -  Block (packed int blocks)
       -  SimpleText (debugging, transparency)
       -  Bloom (experimental, bloom filter avoids lookups of absent terms)
       -  Appending (for append-only filesystems such as HDFS)
       -  Memory (terms as FST)
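The "packed int blocks" idea behind the Block codec can be sketched as delta-encoding a sorted postings list and then bit-packing every gap with one shared bit width. This is a simplified sketch under stated assumptions: the class `PackedBlock` is hypothetical, it packs a single block sized to its input, and it picks the width from the largest gap; Lucene's actual format (fixed-size blocks, its own PackedInts encoders) differs in detail.

```java
import java.util.Arrays;

// Hypothetical sketch: delta-encode sorted doc IDs, then pack every gap
// with the same bit width (just enough bits for the largest gap).
class PackedBlock {
    final int bitsPerValue;
    final int count;
    final long[] packed;

    PackedBlock(int[] sortedDocIds) {
        count = sortedDocIds.length;
        int[] gaps = new int[count];
        int prev = 0, maxGap = 0;
        for (int i = 0; i < count; i++) {
            gaps[i] = sortedDocIds[i] - prev; // ascending doc IDs, so gaps are >= 0
            prev = sortedDocIds[i];
            maxGap = Math.max(maxGap, gaps[i]);
        }
        bitsPerValue = Math.max(1, 32 - Integer.numberOfLeadingZeros(maxGap));
        packed = new long[(int) (((long) count * bitsPerValue + 63) / 64)];
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bitsPerValue;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            packed[word] |= ((long) gaps[i]) << shift;
            if (shift + bitsPerValue > 64) { // value straddles two longs
                packed[word + 1] |= ((long) gaps[i]) >>> (64 - shift);
            }
        }
    }

    // Unpack the gaps and sum them back up to recover the doc IDs.
    int[] decode() {
        int[] docs = new int[count];
        long mask = (1L << bitsPerValue) - 1;
        int doc = 0;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bitsPerValue;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = packed[word] >>> shift;
            if (shift + bitsPerValue > 64) {
                v |= packed[word + 1] << (64 - shift);
            }
            doc += (int) (v & mask);
            docs[i] = doc;
        }
        return docs;
    }

    public static void main(String[] args) {
        PackedBlock block = new PackedBlock(new int[] {3, 7, 8, 20, 21});
        System.out.println(block.bitsPerValue);              // 4
        System.out.println(Arrays.toString(block.decode())); // [3, 7, 8, 20, 21]
    }
}
```

Small gaps between frequent terms' doc IDs are what make this compact: five doc IDs fit in 20 bits here instead of 160.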




Pluggable scoring

    • Decoupled from traditional vector space (TF/IDF)
    • Additional index statistics
       -  number of tokens for a term or field
       -  number of postings for a field
       -  number of documents with a posting for a field
    • Several built-in alternatives:
       -  BM25
       -  DFR – divergence from randomness
       -  Information-based models
    • “norms” are no longer limited to a single byte
       -  Similarity implementations can use any DocValues type to store
          norms
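To show how the new statistics feed an alternative model, here is a textbook BM25 score for one query term. "Number of documents with a posting for a field" plays the role of `docCount`, and "number of tokens for a field" divided by it gives the average field length. This is a sketch, not Lucene's `BM25Similarity`: it assumes the common defaults k1 = 1.2 and b = 0.75 and ignores boosts and the lossy length encoding a real implementation applies.

```java
// Textbook BM25 for a single query term (assumed defaults k1 = 1.2, b = 0.75).
// Not Lucene's implementation: boosts and encoded norms are ignored here.
class Bm25Sketch {
    static final double K1 = 1.2;
    static final double B = 0.75;

    // idf = ln(1 + (N - n + 0.5) / (n + 0.5)),
    // N = docs with the field (docCount), n = docs containing the term (docFreq).
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Term frequency saturates via K1; longer-than-average fields are penalized via B.
    static double score(int termFreq, int fieldLength, double avgFieldLength,
                        long docFreq, long docCount) {
        double lengthNorm = K1 * (1 - B + B * fieldLength / avgFieldLength);
        return idf(docFreq, docCount) * termFreq * (K1 + 1) / (termFreq + lengthNorm);
    }

    public static void main(String[] args) {
        long totalTokensInField = 1_000_000; // "number of tokens for a field"
        long docCount = 10_000;              // "documents with a posting for a field"
        double avgLen = (double) totalTokensInField / docCount;
        System.out.println(score(3, 100, avgLen, 50, docCount));
        System.out.println(score(3, 400, avgLen, 50, docCount)); // longer field, lower score
    }
}
```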


String -> BytesRef

    • How many bytes does a Java String require?
       -  BytesRef is now used to avoid this overhead
       -  Think of the internal structure as a big buffer with pointers
    • Garbage collection much more efficient
       -  big blocks rather than zillions of small ones
    • How much reduction? 10%? 20%?
       -  No. Way more than that
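"A big buffer with pointers" can be sketched as storing every term's UTF-8 bytes back to back in one shared `byte[]` and identifying each term by an offset and length, so a large vocabulary costs a few arrays instead of one `String` object (plus its own char array) per term. The `TermPool` class is hypothetical; Lucene's `BytesRef` is, roughly, a (bytes, offset, length) view into such shared buffers.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical TermPool: all term bytes live in one shared byte[];
// each term is an int id pointing into parallel offset/length arrays.
class TermPool {
    private byte[] buffer = new byte[64];
    private int used = 0;
    private int[] offsets = new int[16];
    private int[] lengths = new int[16];
    private int size = 0;

    // Append a term as UTF-8 and return its id.
    int add(String term) {
        byte[] utf8 = term.getBytes(StandardCharsets.UTF_8);
        if (used + utf8.length > buffer.length) {
            buffer = Arrays.copyOf(buffer, Math.max(buffer.length * 2, used + utf8.length));
        }
        if (size == offsets.length) {
            offsets = Arrays.copyOf(offsets, size * 2);
            lengths = Arrays.copyOf(lengths, size * 2);
        }
        System.arraycopy(utf8, 0, buffer, used, utf8.length);
        offsets[size] = used;
        lengths[size] = utf8.length;
        used += utf8.length;
        return size++;
    }

    // Decode to a String only when one is actually needed.
    String get(int id) {
        return new String(buffer, offsets[id], lengths[id], StandardCharsets.UTF_8);
    }

    // Compare two terms in place (unsigned byte order), without creating Strings.
    int compare(int a, int b) {
        int la = lengths[a], lb = lengths[b];
        for (int i = 0; i < Math.min(la, lb); i++) {
            int x = buffer[offsets[a] + i] & 0xFF;
            int y = buffer[offsets[b] + i] & 0xFF;
            if (x != y) return x - y;
        }
        return la - lb;
    }

    int bytesUsed() { return used; }

    public static void main(String[] args) {
        TermPool pool = new TermPool();
        int lucene = pool.add("lucene");
        int solr = pool.add("solr");
        pool.add("zo\u00eb");                              // the accented letter takes 2 bytes
        System.out.println(pool.bytesUsed());               // 14
        System.out.println(pool.compare(lucene, solr) < 0); // true
        System.out.println(pool.get(solr));                 // solr
    }
}
```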




NRT: Near Real-Time

    • Per-segment
       -  FieldCache only needs to load values for new segments
    • Soft commit
       -  Faster: does not fsync
       -  Can soft commit very rapidly, as low as every second
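Per-segment caching is what keeps rapid reopens cheap: segments are immutable once written, so after a soft commit the new searcher reuses cached values for every unchanged segment and loads only the new one. A minimal sketch, assuming a hypothetical `PerSegmentCache` keyed by segment name (Lucene's FieldCache keys on segment readers, not names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-segment cache: a value loaded for an immutable segment stays
// valid until that segment disappears (for example, after a merge).
class PerSegmentCache<V> {
    private final Map<String, V> bySegment = new HashMap<>();
    private final Function<String, V> loader;
    private int loads = 0;

    PerSegmentCache(Function<String, V> loader) { this.loader = loader; }

    // One value per segment; only segments not seen before are loaded.
    List<V> valuesFor(List<String> segments) {
        List<V> out = new ArrayList<>();
        for (String seg : segments) {
            V v = bySegment.get(seg);
            if (v == null) {
                v = loader.apply(seg);
                bySegment.put(seg, v);
                loads++;
            }
            out.add(v);
        }
        // Drop entries for segments the current searcher no longer has.
        bySegment.keySet().retainAll(segments);
        return out;
    }

    int loads() { return loads; }

    public static void main(String[] args) {
        PerSegmentCache<String> cache = new PerSegmentCache<>(seg -> "values of " + seg);
        cache.valuesFor(Arrays.asList("_0", "_1", "_2"));
        System.out.println(cache.loads()); // 3
        // A soft commit adds segment _3; the reopened searcher shares _0.._2.
        cache.valuesFor(Arrays.asList("_0", "_1", "_2", "_3"));
        System.out.println(cache.loads()); // 4
    }
}
```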




Lucene 4: there’s more

    • AutomatonQuery
       -  term matching a provided finite-state automaton
    • Term offsets
       -  optionally encoded into the postings lists and can be retrieved
          per-position
    • DirectSpellChecker
       -  finds possible corrections directly against the main search index
          without requiring a separate index
    • DWPT
       -  Flushing a new segment now runs concurrently with indexing
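FuzzyQuery and DirectSpellChecker both look for indexed terms within a small edit distance of the input. The speedup comes from walking only the parts of the term dictionary that can still match, instead of computing an edit distance against every term. The sketch below shows that pruning idea with a swapped-in, simpler technique: a hypothetical in-memory trie (`FuzzyTrie`) walked with one row of the Levenshtein table per node. Lucene itself intersects a precompiled Levenshtein automaton with its FST-based terms index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical term dictionary as a trie. Fuzzy lookup keeps one Levenshtein
// DP row per node and prunes any branch whose row already exceeds maxEdits.
class FuzzyTrie {
    private static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        String term; // non-null if a term ends at this node
    }

    private final Node root = new Node();

    void add(String term) {
        Node n = root;
        for (char c : term.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.term = term;
    }

    // Returns matching terms in sorted order (TreeMap children are visited in order).
    List<String> search(String query, int maxEdits) {
        List<String> hits = new ArrayList<>();
        int[] firstRow = new int[query.length() + 1];
        for (int i = 0; i <= query.length(); i++) firstRow[i] = i;
        for (Map.Entry<Character, Node> e : root.children.entrySet()) {
            walk(e.getValue(), e.getKey(), query, firstRow, maxEdits, hits);
        }
        return hits;
    }

    private void walk(Node node, char c, String query, int[] prevRow,
                      int maxEdits, List<String> hits) {
        int cols = query.length() + 1;
        int[] row = new int[cols];
        row[0] = prevRow[0] + 1;
        int rowMin = row[0];
        for (int i = 1; i < cols; i++) {
            int insert = row[i - 1] + 1;
            int delete = prevRow[i] + 1;
            int replace = prevRow[i - 1] + (query.charAt(i - 1) == c ? 0 : 1);
            row[i] = Math.min(insert, Math.min(delete, replace));
            rowMin = Math.min(rowMin, row[i]);
        }
        if (node.term != null && row[cols - 1] <= maxEdits) hits.add(node.term);
        // If every cell exceeds maxEdits, no term below this node can match.
        if (rowMin <= maxEdits) {
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                walk(e.getValue(), e.getKey(), query, row, maxEdits, hits);
            }
        }
    }

    public static void main(String[] args) {
        FuzzyTrie dict = new FuzzyTrie();
        for (String t : new String[] {"lucene", "solr", "solar", "sole", "sold", "soul"}) dict.add(t);
        System.out.println(dict.search("solr", 1)); // [solar, sold, sole, solr]
    }
}
```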




Indexing performance (Wikipedia 4KB docs)

     • http://people.apache.org/~mikemccand/lucenebench/indexing.html




QPS (primary key lookup)

     • http://people.apache.org/~mikemccand/lucenebench/PKLookup.html




FuzzyQuery

     • http://people.apache.org/~mikemccand/lucenebench/Fuzzy2.html




Solr 4 Highlights

     •  SolrJ streaming response
     •  Pivot facets
     •  New relevancy function queries
        -  termfreq, tf, docfreq, idf, norm, maxdoc, numdocs, exists, if, and,
           or, xor, not, def, and the true and false constants
     •  DirectSpellChecker support
     •  Improved document response: DocTransformer, function
        calculations
     •  Pseudo-join
     •  New admin UI: Including SolrCloud cluster visualizations
     •  Transaction log
     •  Several new update processors, including a “script” one
     •  Spatial overhaul
     •  Content-type savvy /update handler
     •  SolrCloud
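The new relevancy functions expose the raw statistics (termfreq, docfreq, maxdoc) as well as the similarity's view of them (tf, idf), so they can be returned as pseudo-fields, for example `fl=id,termfreq(text,'solr'),tf(text,'solr')` (field and term hypothetical). The sketch below assumes the field uses Lucene's classic default similarity, whose formulas are tf = sqrt(freq) and idf = 1 + ln(maxDoc / (docFreq + 1)); a field configured with another Similarity, such as BM25, would compute tf and idf differently.

```java
// Classic TF/IDF building blocks, assuming Lucene's default (classic) similarity.
class ClassicTfIdf {
    // tf(): square root of the raw term frequency that termfreq() reports.
    static float tf(int termFreq) {
        return (float) Math.sqrt(termFreq);
    }

    // idf(): 1 + ln(maxDoc / (docFreq + 1)), built from docfreq() and maxdoc().
    static float idf(long docFreq, long maxDoc) {
        return (float) (Math.log(maxDoc / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        System.out.println(tf(4));          // 2.0
        System.out.println(idf(9, 1000));   // rare term: higher idf
        System.out.println(idf(499, 1000)); // common term: lower idf
    }
}
```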

Per-segment faceting improvement

     • Field-cache, per segment
        -  Test index: 10M documents, 18 segments, single-valued field
     • facet.method=fcs
     • Result set = 100 docs, 100,000 unique terms
        -  static index: fc=3 ms, fcs=244 ms
        -  quickly changing index: fc=1388 ms, fcs=267 ms
     • Result set = 1,000,000 docs, 100 unique terms
        -  static index: fc=26 ms, fcs=34 ms
        -  quickly changing index: fc=741 ms, fcs=94 ms
     • Data from Yonik’s Lucene Revolution 2011 faceting talk
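The trade-off in these numbers follows from where the counting happens. Index-wide (top-level) structures are fastest on a static index but must be rebuilt whenever the index changes, while per-segment counting only builds data for new segments and pays a small cost to merge counts by term text. A minimal sketch of that per-segment count-then-merge step, assuming hypothetical `SegmentField` and `PerSegmentFacets` classes (not Solr's implementation) and a single-valued field with per-segment term ordinals:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-valued field within one segment: a sorted term list plus,
// for each document, an ordinal into that list (-1 = no value).
class SegmentField {
    final String[] terms;
    final int[] ordByDoc;

    SegmentField(String[] terms, int[] ordByDoc) {
        this.terms = terms;
        this.ordByDoc = ordByDoc;
    }
}

class PerSegmentFacets {
    // Count matches per ordinal inside each segment, then merge counts by term text.
    static Map<String, Integer> facet(List<SegmentField> segments, List<int[]> matchesPerSegment) {
        Map<String, Integer> merged = new TreeMap<>();
        for (int s = 0; s < segments.size(); s++) {
            SegmentField seg = segments.get(s);
            int[] counts = new int[seg.terms.length];
            for (int doc : matchesPerSegment.get(s)) {
                int ord = seg.ordByDoc[doc];
                if (ord >= 0) counts[ord]++;
            }
            for (int ord = 0; ord < counts.length; ord++) {
                if (counts[ord] > 0) merged.merge(seg.terms[ord], counts[ord], Integer::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        SegmentField a = new SegmentField(new String[] {"blue", "red"}, new int[] {0, 1, 1, -1});
        SegmentField b = new SegmentField(new String[] {"green", "red"}, new int[] {1, 0, 1});
        Map<String, Integer> counts = facet(Arrays.asList(a, b),
                Arrays.asList(new int[] {0, 1, 2, 3}, new int[] {0, 2}));
        System.out.println(counts); // {blue=1, red=4}
    }
}
```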



Solr 3.x scalability

     • Capabilities:
        -  Replication
        -  Distributed search
     • Limitations:
        -  Documents only available after (expensive) “hard” commit,
           replication, and warming delays
        -  Configuration is labor-intensive, manually maintained, and
           coordinated
        -  Manual sharding: no automatic distributed indexing
        -  Failure recovery difficult if master goes down




SolrCloud: Solr 4’s scalability

     • Sharded leaders and replicas
     • ZooKeeper used for cluster management
     • Distributed indexing
        -  Automatically distributes updates to appropriate shard
        -  Facilitates Near Real-Time (NRT) searching
     • Distributed search
        -  Automatically distributes queries to a replica of each shard
     • Robust, automatic update recovery
     • Real-time /get
        -  Leverages transaction log
     • No single point of failure
     • Large scale NRT using soft commits
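"Automatically distributes updates to the appropriate shard" can be sketched as hash-range routing: the 32-bit hash space is split into one contiguous range per shard, and a document goes to the shard whose range contains the hash of its unique key. This is an illustrative sketch, not Solr's router. The class `HashRangeRouter` is hypothetical, and the hash (Java's `String.hashCode()` passed through MurmurHash3's 32-bit finalization mix) is only a stand-in for the MurmurHash3 hash Solr uses, so it will not route documents to the same shards Solr would.

```java
import java.util.Arrays;

// Hypothetical router: split the signed 32-bit hash space into numShards
// contiguous ranges; a document goes to the shard whose range holds hash(id).
class HashRangeRouter {
    private final int[] rangeStart; // inclusive lower bound of each shard's range

    HashRangeRouter(int numShards) {
        rangeStart = new int[numShards];
        long span = (1L << 32) / numShards;
        for (int i = 0; i < numShards; i++) {
            rangeStart[i] = (int) (Integer.MIN_VALUE + i * span);
        }
    }

    // Stand-in hash: String.hashCode() scrambled with MurmurHash3's fmix32 step.
    static int hash(String id) {
        int h = id.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    int shardFor(String id) {
        int h = hash(id);
        for (int i = rangeStart.length - 1; i >= 0; i--) {
            if (h >= rangeStart[i]) return i; // last range starting at or below h
        }
        return 0; // not reached: rangeStart[0] is Integer.MIN_VALUE
    }

    public static void main(String[] args) {
        HashRangeRouter router = new HashRangeRouter(4);
        int[] perShard = new int[4];
        for (int i = 0; i < 1000; i++) perShard[router.shardFor("doc" + i)]++;
        System.out.println(Arrays.toString(perShard)); // roughly even; exact counts depend on the hash
    }
}
```

Because routing depends only on the id, any node can forward an update to the right shard without a central coordinator, and updates to an existing document land on the shard that already holds it.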

SolrCloud details

     • “Leaders” and “replicas”
        -  Leaders are automatically elected
     • A leader is just a replica with some coordination
       responsibilities for the other replicas of its shard
     • If a leader goes down, one of the associated replicas is
       elected as the new leader
     • New nodes are automatically assigned a shard and
       role, and replicate/recover as needed
     • CloudSolrServer: a ZooKeeper-aware SolrJ client
     • Replication in Solr 4
        -  Used for new and recovering replicas
        -  Or for traditional master/slave configuration
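Automatic leader election of this kind is commonly built on the standard ZooKeeper recipe, which SolrCloud's election is understood to follow: each replica registers an ephemeral, sequentially numbered node; the lowest live number is the leader; when a node's session ends its entry vanishes and the next lowest takes over. The in-memory `ShardElection` class below is a hypothetical simulation of that recipe with no ZooKeeper involved.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of the ZooKeeper leader-election recipe for one shard:
// each replica joins with an increasing sequence number (like an ephemeral
// sequential znode), and the lowest live sequence number is the leader.
class ShardElection {
    private final TreeMap<Long, String> candidates = new TreeMap<>();
    private long nextSeq = 0;

    // A replica joins the election and receives the next sequence number.
    long join(String replica) {
        long seq = nextSeq++;
        candidates.put(seq, replica);
        return seq;
    }

    // The replica's session ends: its entry disappears, like an ephemeral znode.
    void leave(String replica) {
        candidates.values().remove(replica);
    }

    String leader() {
        Map.Entry<Long, String> first = candidates.firstEntry();
        return first == null ? null : first.getValue();
    }

    public static void main(String[] args) {
        ShardElection shard1 = new ShardElection();
        shard1.join("node1");
        shard1.join("node2");
        shard1.join("node3");
        System.out.println(shard1.leader()); // node1
        shard1.leave("node1");               // node1 goes down
        System.out.println(shard1.leader()); // node2
        shard1.join("node1");                // node1 returns, now behind node2 and node3
        System.out.println(shard1.leader()); // node2
    }
}
```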

NoSQL

     • Update durability
        -  A transaction log ensures that even uncommitted documents are
           never lost.
     • Real-time Get
        -  The ability to quickly retrieve the latest version of a document,
           without the need to commit or open a new searcher
     • Versioning and Optimistic Locking
        -  combined with real-time get, this allows read-update-write
           functionality that ensures no conflicting changes were made
           concurrently by other clients.
     • Atomic updates
        -  the ability to add, remove, change, and increment fields of an
           existing document without having to send in the complete
           document again.
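Optimistic locking and atomic updates can be sketched together: every write gets a new version, a client may send the version it last read (via real-time get), and a mismatch is rejected instead of silently overwriting someone else's change. The checks below follow the `_version_` rules described in Solr's documentation for optimistic concurrency (a value greater than 1 must match exactly, 1 means the document must exist, a negative value means it must not exist, 0 means no check); verify them against your Solr version. The `VersionedStore` class and its in-memory storage are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store illustrating versioned writes, real-time get,
// and an atomic "inc" update that does not resend the whole document.
class VersionedStore {
    static final class Doc {
        final Map<String, Object> fields;
        final long version;
        Doc(Map<String, Object> fields, long version) { this.fields = fields; this.version = version; }
    }

    static final class ConflictException extends RuntimeException {
        ConflictException(String msg) { super(msg); }
    }

    private final Map<String, Doc> docs = new HashMap<>();
    private long clock = 1; // issued versions start at 2, since 1 has a special meaning

    // expected > 1: must equal the current version; 1: must exist;
    // < 0: must not exist; 0: no check.
    private void check(String id, long expected) {
        Doc cur = docs.get(id);
        if (expected > 1 && (cur == null || cur.version != expected))
            throw new ConflictException("version mismatch for " + id);
        if (expected == 1 && cur == null)
            throw new ConflictException(id + " does not exist");
        if (expected < 0 && cur != null)
            throw new ConflictException(id + " already exists");
    }

    long add(String id, Map<String, Object> fields, long expectedVersion) {
        check(id, expectedVersion);
        long v = ++clock;
        docs.put(id, new Doc(new HashMap<>(fields), v));
        return v;
    }

    // Atomic "inc": change one numeric field; the rest of the document is kept.
    long increment(String id, String field, long delta, long expectedVersion) {
        check(id, expectedVersion);
        Doc cur = docs.get(id);
        Map<String, Object> copy = new HashMap<>();
        if (cur != null) copy.putAll(cur.fields);
        long old = copy.containsKey(field) ? ((Number) copy.get(field)).longValue() : 0L;
        copy.put(field, old + delta);
        long v = ++clock;
        docs.put(id, new Doc(copy, v));
        return v;
    }

    // Real-time get: the latest version, with no commit or new searcher needed.
    Doc get(String id) { return docs.get(id); }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        Map<String, Object> book = new HashMap<>();
        book.put("copies", 3);
        long v1 = store.add("book1", book, -1);         // -1: must not exist yet
        store.increment("book1", "copies", 1, v1);
        System.out.println(store.get("book1").fields); // {copies=4}
        try {
            store.increment("book1", "copies", 1, v1);  // stale version
        } catch (ConflictException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```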


Some numbers

     • On a Wikipedia index (11M documents)
        -  Time to perform the first query with sorting (no warmup queries):
           Solr 3.x: 13 seconds, Solr 4: 6 seconds.
        -  Memory consumption: Solr 3.x: 1,040 MB, Solr 4: 366 MB. Yes,
           almost a 2/3 reduction in memory use. And that’s the entire
           program size, not counting memory used just to start Solr and
           Jetty running.
        -  Number of objects on the heap: Solr 3.x: 19.4M, Solr 4: 80K. No,
           that’s not a typo. There are over two orders of magnitude fewer
           objects on the heap in Solr 4 (trunk at the time)!
     • From an Erick Erickson blog entry (see Links slide)




Links

     • Lucene/Solr: lucene.apache.org
     • “Lucene in Action”: www.manning.com/lucene
     • Blacklight
        -  projectblacklight.org
        -  Examples: search.lib.virginia.edu and searchworks.stanford.edu
     • SearchHub.org
        -  Community/public content
        -  http://searchhub.org/dev/2012/04/06/memory-comparisons-between-solr-3x-and-trunk/




About LucidWorks

     • LucidWorks Search
        -  Lucene/Solr 4 powered
        -  On-premise or hosted (Amazon EC2 and Azure)
        -  Rich connector framework for SharePoint, web crawling, etc.
        -  Built-in security support
     • LucidWorks Big Data
        -  Scalable classification, machine learning, analytics
     • Lucene/Solr commercial support
     • Consulting
     • Training
     • http://www.lucidworks.com

